My Thoughts on the Serverless Announcements at re:Invent 2018

Posted by Yan Cui on December 20, 2018

re:Invent has come and gone, and it’s another year of exciting announcements. Unsurprisingly, serverless was hot on the lips of everyone at the conference.


There were a number of updates for AWS Lambda, the most important of which were:

  • Layers
  • Custom runtime support
  • ALB support
  • WebSockets for API Gateway and Lambda
  • Firecracker

These have been covered extensively by other blogs, so we will touch on them only briefly. However, serverless is so much more than just Function-as-a-Service (FaaS), so let’s talk about some of the other serverless announcements. I for one was very impressed by the database and AI announcements at re:Invent this year.

But first, let’s quickly catch you up on the Lambda announcements.

The Lambda announcements

Layers

Lambda Layers lets you share arbitrary code and data across multiple functions, even across multiple regions and accounts. It provides an easy way to share libraries and static data across your organization and addresses a common pain point for Lambda users.


I think it’s a great way to distribute dependencies such as the MaxMind database — third-party dependencies with a stable API but frequent data changes, so the updates are considered fairly safe. But I wouldn’t consider using Lambda Layers as a replacement for existing package managers such as NPM. Most dependency updates contain behaviour or API changes, and need to go through the usual security scanning and integration testing before they should be deployed to the live AWS environment.
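To make this concrete, here is a minimal sketch of what publishing a layer and attaching it to a function could look like with the AWS SDK for JavaScript. The layer name, zip file and function name are all hypothetical, and in practice you would more likely do this from your deployment pipeline than from application code.

```typescript
// A sketch of publishing a layer (e.g. containing the MaxMind database) and
// attaching it to an existing function. All names here are hypothetical.
import { Lambda } from "aws-sdk";
import { readFileSync } from "fs";

const lambda = new Lambda();

async function publishAndAttach() {
  // 1. publish a new layer version from a local zip of the shared data/libraries
  const layer = await lambda
    .publishLayerVersion({
      LayerName: "maxmind-geoip",
      Content: { ZipFile: readFileSync("maxmind-layer.zip") },
      CompatibleRuntimes: ["nodejs8.10"],
    })
    .promise();

  // 2. point the function at the new layer version
  await lambda
    .updateFunctionConfiguration({
      FunctionName: "geo-lookup",
      Layers: [layer.LayerVersionArn!],
    })
    .promise();
}

publishAndAttach().catch(console.error);
```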

Custom runtime

Custom runtime support allows you to bring your own language runtime to Lambda, so your functions are no longer limited to the officially supported languages.


This is a very cool feature for Lambda, especially for developers who are itching to use their favourite language with Lambda. But as Paul Johnston rightly pointed out, creating your own custom runtime should be considered a “last resort”.
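For the curious, a custom runtime boils down to a bootstrap executable that loops against the Lambda Runtime API: fetch the next invocation, run your handler, then post the result back. Below is a rough sketch of that loop in Node; the handler logic is hypothetical, and a real runtime would also need to report errors and initialisation failures.

```typescript
// A rough sketch of the Runtime API loop a custom runtime's bootstrap implements.
// The handler logic below is hypothetical; error reporting is omitted for brevity.
import * as http from "http";

const base = `http://${process.env.AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime`;

// tiny helper around Node's http module, good enough for small JSON payloads
function call(
  method: string,
  path: string,
  body?: string
): Promise<{ headers: http.IncomingHttpHeaders; body: string }> {
  return new Promise((resolve, reject) => {
    const req = http.request(`${base}${path}`, { method }, (res) => {
      let data = "";
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => resolve({ headers: res.headers, body: data }));
    });
    req.on("error", reject);
    if (body) req.write(body);
    req.end();
  });
}

async function main() {
  for (;;) {
    // 1. long-poll the Runtime API for the next invocation
    const next = await call("GET", "/invocation/next");
    const requestId = next.headers["lambda-runtime-aws-request-id"] as string;

    // 2. run the "handler" against the event payload
    const event = JSON.parse(next.body);
    const result = { echoed: event }; // hypothetical handler logic

    // 3. post the result back so Lambda can return it to the caller
    await call("POST", `/invocation/${requestId}/response`, JSON.stringify(result));
  }
}

main();
```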

ALB support

ALB support for Lambda allows you to mix Lambda-based HTTP handlers with other container or EC2-based handlers under the same load balancer. This simplifies how you manage your infrastructure and makes it easier for you to choose the right tool for the job. However, from the cost perspective it doesn’t appear to offer any advantage over API Gateway.
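One thing to keep in mind is that a function behind an ALB receives a slightly different event and must return the ALB target-group response shape, rather than the API Gateway proxy format. A minimal sketch (the handler body is just an illustration):

```typescript
// A sketch of a Lambda handler behind an ALB: note the statusDescription field
// and the explicit isBase64Encoded flag expected by the ALB target-group format.
export const handler = async (event: any) => {
  console.log("path:", event.path, "method:", event.httpMethod);

  return {
    statusCode: 200,
    statusDescription: "200 OK",
    isBase64Encoded: false,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: "hello from Lambda behind an ALB" }),
  };
};
```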


WebSocket for API Gateway and Lambda

WebSocket support for API Gateway gives you another way to implement real-time applications with Lambda. Previously you had to choose between AppSync and IoT Core; both are still valid choices in the context of GraphQL and IoT respectively. However, this new feature gives you the option of using plain WebSockets with a REST API, and removes a lot of the extra complexity that comes with using AppSync and IoT Core.
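To push a message back to a connected client, your function calls the new API Gateway Management API with the client’s connection ID. Here is a minimal sketch of an echo handler; the wiring is my assumption of a typical setup rather than code from the announcement.

```typescript
// A sketch of a WebSocket message handler that echoes the frame back to the caller.
// The route and payload are hypothetical; the Management API endpoint is derived
// from the incoming request context.
import { ApiGatewayManagementApi } from "aws-sdk";

export const handler = async (event: any) => {
  const { domainName, stage, connectionId } = event.requestContext;

  const client = new ApiGatewayManagementApi({
    endpoint: `https://${domainName}/${stage}`,
  });

  // push a message back to the connected client identified by its connection ID
  await client
    .postToConnection({
      ConnectionId: connectionId,
      Data: `echo: ${event.body}`,
    })
    .promise();

  return { statusCode: 200, body: "" };
};
```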

Firecracker

There was a lot of excitement around the Firecracker announcement at re:Invent, and rightly so, as I think it will have a huge impact on Lambda and Fargate in the future. Once Firecracker is rolled out to the entire Lambda fleet it should have a noticeable impact on both cold start and warm performance. More importantly, because Firecracker changes how networking is done at the hypervisor level, it will have an even bigger impact on functions running inside VPCs. Marc Brooker hinted at a massive reduction in VPC cold start times for Lambda during the SRV409 session I attended.

Database Announcements

Compared to previous years, I have been especially impressed with the database-related announcements at this year’s re:Invent. To be honest, I felt AWS had been falling behind Azure and GCP in terms of its database offerings in recent years. DynamoDB, in particular, had started to look dated compared to the likes of Azure Cosmos DB and Google Spanner.

DynamoDB

I was really happy to see that the DynamoDB team has addressed some long-standing complaints from its customers, and given DynamoDB a much-needed makeover. There were two big announcements for DynamoDB:

DynamoDB Transactions

With DynamoDB Transactions, you can make coordinated, all-or-nothing changes to multiple items both within and across tables. They provide atomicity, consistency, isolation, and durability (ACID) in DynamoDB.

Previously, transactions in DynamoDB had to be orchestrated from the client, which introduced additional complexity into our applications. It was also impossible to guarantee data consistency across the whole transaction, as conditional checks were carried out individually. Lastly, both the update and the rollback operations were individually subject to throttling limits. This meant you had to take great care with error handling, or risk leaving your tables in an inconsistent state if rollbacks could not be completed in their entirety. This often necessitated the use of patterns such as the Saga pattern.

The new DynamoDB Transactions feature addresses all of the problems above, and makes it easy for you to maintain data consistency in your application. I cannot overstate how much of a game changer this is for DynamoDB users.
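As an illustration, here is a minimal sketch of the classic debit-and-credit example using the new transactWrite operation in the AWS SDK for JavaScript. The table and attribute names are made up; the point is that either both updates succeed or neither does, and the condition check is evaluated as part of the same transaction.

```typescript
// A sketch of a debit-and-credit transfer using DynamoDB Transactions.
// Table and attribute names are made up for illustration.
import { DynamoDB } from "aws-sdk";

const dynamodb = new DynamoDB.DocumentClient();

async function transfer(fromId: string, toId: string, amount: number) {
  await dynamodb
    .transactWrite({
      TransactItems: [
        {
          Update: {
            TableName: "accounts",
            Key: { accountId: fromId },
            // the condition is evaluated atomically with every other item in the transaction
            ConditionExpression: "balance >= :amount",
            UpdateExpression: "SET balance = balance - :amount",
            ExpressionAttributeValues: { ":amount": amount },
          },
        },
        {
          Update: {
            TableName: "accounts",
            Key: { accountId: toId },
            UpdateExpression: "SET balance = balance + :amount",
            ExpressionAttributeValues: { ":amount": amount },
          },
        },
      ],
    })
    .promise();
}
```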

DynamoDB On-Demand Pricing

On-Demand Pricing is a new pricing model for DynamoDB where you no longer have to worry about provisioned throughput. Instead, you only pay for the read and write request units that you actually use, plus the usual data storage costs.

Notice that these are measured in request units, so the size of the payload matters. One read request unit represents a read for an item up to 4KB. Reading an 8KB item would therefore consume two read request units. Similarly, one write request unit represents a write for an item up to 1KB.
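In other words, the arithmetic is a straightforward round-up to the next 4KB boundary for reads and the next 1KB boundary for writes, per item:

```typescript
// Request-unit arithmetic as described above: billed per item, rounded up to
// the next 4KB boundary for reads and the next 1KB boundary for writes.
function readRequestUnits(itemSizeKB: number): number {
  return Math.ceil(itemSizeKB / 4);
}

function writeRequestUnits(itemSizeKB: number): number {
  return Math.ceil(itemSizeKB);
}

console.log(readRequestUnits(8));    // 2 read request units for an 8KB item
console.log(writeRequestUnits(2.5)); // 3 write request units for a 2.5KB item
```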

At $1.25 per million write request units and $0.25 per million read request units, it’s possible for the new pricing model to be considerably more expensive. However, it’s important to consider the following factors:

  • You can’t predict your throughput precisely, and the burst capacity is often not enough to deal with sudden spikes in throughput.
  • The built-in auto-scaling mechanism does not react quickly enough to spikes in throughput, nor does it scale up aggressively enough (more details here).
  • The built-in retry behaviour in AWS SDKs can often exacerbate these spikes.
  • You have to employ defensive coding techniques to handle the ProvisionedThroughputExceededException all over the place.
  • The impact of DynamoDB throttling on user experience is often more expensive to repair than the cost of over-provisioning the tables to begin with.

Given these contributing factors, most teams end up provisioning throughput at many times their actual peak. In the worst case, I have seen a DynamoDB table with 3,000 provisioned write units and a peak consumption of ~150 write units per second.

This is why, in practice, switching to On-Demand Pricing is unlikely to cause your DynamoDB costs to increase dramatically. It also frees you from worrying about DynamoDB throttling, and saves you the engineering effort you would otherwise have to put into mitigating it. If you are using DynamoDB then I think you should default to On-Demand Pricing unless you have a strong reason not to.
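To put some rough numbers on the worst case above: the $1.25 per million write request units comes from the on-demand pricing quoted earlier, while the provisioned-capacity price of roughly $0.00065 per write capacity unit per hour (us-east-1) is my assumption, so treat this as a back-of-the-envelope sketch rather than a precise bill.

```typescript
// Back-of-the-envelope: a table provisioned at 3,000 WCU versus paying on-demand
// for the ~150 writes/second it actually consumes. The provisioned price of
// $0.00065 per WCU-hour is an assumption; $1.25 per million writes is the
// on-demand price quoted above.
const HOURS_PER_MONTH = 730;
const SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600;

const provisionedMonthly = 3000 * 0.00065 * HOURS_PER_MONTH;   // ~ $1,424
const onDemandWrites = 150 * SECONDS_PER_MONTH;                // ~ 394 million writes
const onDemandMonthly = (onDemandWrites / 1_000_000) * 1.25;   // ~ $493

console.log({ provisionedMonthly, onDemandMonthly });
```

Even with every single write billed individually, the on-demand table in this example works out at roughly a third of the cost of the over-provisioned one.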

Aurora

The big announcement for Aurora was the introduction of the new data API for Aurora Serverless. It addresses the issue that Aurora Serverless still required persistent connections and was therefore difficult to use from Lambda functions.

While I was initially excited about the announcement, analysis by Jeremy Daly revealed some significant issues with the current implementation:

  • Average response time for a query is around 200ms!
  • Response format is too verbose.
  • No support for parameterised statements.
  • No support for IAM roles.

As such, I don’t think the new data API is ready for production use just yet. Hopefully these issues are addressed by the time the feature becomes generally available.
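For context, here is roughly what querying Aurora Serverless through the preview Data API looks like from a Lambda function, assuming the RDSDataService.executeSql operation and placeholder ARNs. As per the issues above, the SQL has to be assembled as a plain string because parameterised statements are not supported yet.

```typescript
// A sketch of querying Aurora Serverless via the preview Data API (placeholder ARNs).
// Note the query is a plain string: the preview offers no parameterised statements.
import { RDSDataService } from "aws-sdk";

const dataApi = new RDSDataService();

export const handler = async () => {
  const result = await dataApi
    .executeSql({
      dbClusterOrInstanceArn: "arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",
      awsSecretStoreArn: "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret",
      database: "mydb",
      sqlStatements: "SELECT id, name FROM users LIMIT 10",
    })
    .promise();

  // the response format is verbose; in practice you would flatten it before returning
  return result;
};
```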

Amazon Timestream

Time series data is essentially a sequence of data points stored in time order, such as the temperature in London, the price of MSFT stock, or the location of a self-driving car. Time series databases have long been the bedrock of application monitoring solutions such as Prometheus.

I have worked in several business domains where I had to use relational databases to store time series data, and scalability and performance were always an issue. With Amazon Timestream we have a fast, scalable, fully managed, pay-per-use time series database at our disposal.

The service is still in preview, so I don’t have any hands-on experience to report. Based on the information on its product page, it will have a number of built-in analytics functions such as smoothing, approximation and interpolation. It also integrates with the new Amazon Forecast service, presumably as the time series datastore for Forecast. I for one am very excited to see how this service evolves and whether it lives up to the hype.

Amazon Quantum Ledger Database (QLDB)

You have probably heard the joke that “Blockchain is just a very slow database”, and there is certainly some truth to that. But if you take away the distributed consensus aspect of blockchain, the underlying ledger technology can be very useful in its own right. The new Quantum Ledger Database (QLDB) is an immutable, append-only, fully managed database where transactions can be cryptographically verified. It also has a SQL-like syntax, so it should be easy for developers to migrate from existing relational databases.

QLDB can be very useful for workloads with strong audit requirements, such as financial transactions for banks, or patient care records in the healthcare industry. It can also be useful for recording control plane changes in a system (I’m making a distinction here from end-user facing business applications), which is exactly how AWS uses this technology internally. Any time an EC2 instance is started or stopped, that control plane change is logged in their internal version of QLDB.

While I’m sure it would be useful for building blockchains, given that AWS offers a managed blockchain service that supports both Hyperledger and Ethereum, you probably wouldn’t need to build one from scratch yourself.

Conclusions

That’s it for our roundup of the most noteworthy Lambda and database announcements at re:Invent this year. There were many other serverless-related announcements that we weren’t able to cover here.

My overall feeling towards this year’s serverless announcements is one of huge optimism. AWS continues to move further up the value chain in terms of its ML offerings, with even more managed AI services that address specific business needs. These services will have a massive impact on how quickly we deliver business value and address common user needs, using robust technologies that Amazon itself is built upon. I’m also excited by the improvements to the Lambda platform and the future improvements that Firecracker can bring.

But the most important updates that can make an immediate impact on so many applications today, in my opinion, are the new DynamoDB features. DynamoDB is such a core service and the de facto database option for many AWS customers. The new DynamoDB Transactions and On-Demand Pricing features address two of the most common issues DynamoDB customers experience. Both of these are game changers.

With that, I’m looking forward to re:Invent 2019 already! Happy holidays everyone!