Not so FaaS

Struggling to migrate your workloads to FaaS? Read this guide to explore when you shouldn’t

Posted by Yan Cui on January 14, 2019

Serverless is everywhere these days, and Function as a Service (FaaS) platforms, such as AWS Lambda, are the most prominent examples of serverless. While it’s easy to get carried away with all the excitement around them, we should remember that for every successful headline-grabbing adoption story, there is an unsuccessful attempt to adopt this new paradigm.

Unfortunately, many companies struggle to migrate their existing workloads to FaaS. There are still many feature gaps in the FaaS platforms, and those attempting to use them often misunderstand their limitations.

In this article, we will explore some scenarios in which FaaS is, at least for now, not a great fit.

FaaS Is Not a Silver Bullet

FaaS is a powerful paradigm, and it can drastically simplify the way developers build and run cloud applications. But, as with all things in technology, it has trade-offs and is not the right solution for every problem.

Too often, newcomers to a paradigm become infatuated with its possibilities, and when all you have is a hammer, everything starts to look like a nail. This happened when we discovered NoSQL—all of a sudden, MongoDB and Redis were used to solve every database challenge, often with disastrous results. It also happened when we learned about functional programming—all of a sudden, it was paramount that code be functional and pure, even when the performance trade-offs were unacceptable.

Sadly, these attempts often end the same way. At some point, developers hit the platform limits and fail. When this happens, many simply blame the paradigm for their failures and give up. Those who persevere slowly come to appreciate the paradigm’s trade-offs and work out where it actually fits.


When to Avoid FaaS

FaaS adoption has been gathering pace over the last two years. Unfortunately, history is already repeating itself, as stories of failed attempts to adopt FaaS are starting to emerge.

Here are some common reasons why these adoptions didn’t work:

  • Failure to understand platform limitations (cold starts, max invocation duration, scaling thresholds, etc.)
  • Failure to understand the cost implications when operating at scale
  • Failure to understand the operational constraints imposed by other dependencies, such as database choices, as well as language and library support (for example, do you need a specialized library that is only available in a particular language?)

Let’s consider four scenarios in which FaaS is generally not a good fit.

When Consistent and High Performance Is Needed

Cold starts are one of the most common complaints in the FaaS world. When the platform needs to (internally) spawn a new container in response to an increase in traffic, the first invocation on that container will be much slower than average. In the case of AWS Lambda, a number of factors can significantly impact the duration of a cold start: language runtime, memory size, the amount of initialization logic (including what the application dependencies do in userspace), and whether the function has VPC access.

Developers can reduce the cold start time in a number of ways. The most effective is to write functions in Node.js, Python, or Go (as these languages all have low start-up overhead) and to avoid using VPCs. However, even an optimized function will likely experience a 300-500ms cold start in production. This is good enough for most web applications but is unacceptable for applications where consistently strong performance is required. For example, real-time multiplayer games often require 99th percentile latencies below 100ms.
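To see why initialization logic matters so much, consider how a Python Lambda function is executed: module-scope code runs once per container, as part of the cold start, while the handler runs on every invocation. The sketch below (with a made-up handler and a trivial stand-in for expensive initialization) illustrates the split:

```python
import json
import time

# Module scope runs once per container, as part of the cold start.
# Anything expensive here (importing big libraries, opening database
# connections, fetching config) adds directly to cold-start latency.
_init_start = time.time()
CONFIG = {"greeting": "hello"}  # stand-in for expensive initialization
INIT_MS = (time.time() - _init_start) * 1000


def handler(event, context):
    # The handler runs on every invocation; warm invocations skip the
    # module-scope work above entirely.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"{CONFIG['greeting']} {name}",
                            "init_ms": INIT_MS}),
    }
```

The practical takeaway is to keep module-scope work lean and, where possible, initialize dependencies lazily so that only the invocations that need them pay the price.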

That said, specialist platforms such as Binaris, which focus on performance and cost reduction, can bridge some of these gaps. Binaris deploys an HTTP endpoint for each function out of the box, removing the need for an API Gateway in many cases. This eliminates another layer in the application that can introduce additional latency overhead.

When Persistent Connections Are Required

Many workloads require a persistent connection to a “server.” For example, a developer may want a persistent WebSocket connection to send real-time push notifications to connected mobile clients or may want to implement the subscriptions feature of GraphQL. In these cases, a connection to the server should be long-lived to avoid the overhead of creating a new connection. This should set off alarm bells right away, as functions are short-lived by design. In the case of AWS Lambda, a function invocation can run for no more than 15 minutes.

While it’s possible to work around this timeout limit by making the function recursive (having it invoke itself before it times out; see the sketch after this list), or by exploiting internal implementation details to bypass the timeout, neither is a recommended approach because:

  • Recursive functions are notoriously prone to accidental infinite recursion, which can have a big impact on cost.
  • Platform implementation details are not set in stone, and they can change without notice.
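For completeness, here is roughly what the recursive workaround looks like in Python. Everything below is a hedged sketch: do_some_work, the payload shape, and MAX_DEPTH are hypothetical, while the self-invocation via boto3 and the remaining-time check are real Lambda APIs. Note the explicit depth guard, without which a buggy stop condition can recurse indefinitely and run up the bill:

```python
import json
import os

import boto3

lambda_client = boto3.client("lambda")

# Hypothetical safety cap on how many times the function may re-invoke itself.
MAX_DEPTH = 10


def handler(event, context):
    depth = event.get("depth", 0)

    # Do slices of the long-running work, leaving headroom before the timeout.
    while context.get_remaining_time_in_millis() > 10_000:
        if do_some_work(event):  # hypothetical unit of work; True when finished
            return {"done": True, "depth": depth}

    # Out of time but not out of work: re-invoke ourselves asynchronously.
    if depth + 1 >= MAX_DEPTH:
        raise RuntimeError("recursion cap reached; aborting to protect cost")

    lambda_client.invoke(
        FunctionName=os.environ["AWS_LAMBDA_FUNCTION_NAME"],
        InvocationType="Event",  # async, so this invocation can end cleanly
        Payload=json.dumps({**event, "depth": depth + 1}),
    )
    return {"done": False, "depth": depth}


def do_some_work(event):
    # Placeholder for one resumable unit of the actual workload.
    return True
```

Even with the guard, this pattern still rebuilds any connection state on every hop, which defeats the point of a persistent connection in the first place.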

When the Throughput Is Consistently High

One of the often-touted benefits of serverless is that you don’t pay for idle. Rather, with managed FaaS platforms, such as AWS Lambda, you only pay when your function runs. This can lead to significant cost savings if applications don’t have to run all the time (e.g., a cron job). With managed FaaS platforms, the cost is also much more granular. Developers pay for each invocation plus data transfer, as opposed to paying for up-time, which always carries some wastage since servers typically run at below 65% utilization.

However, if an application experiences sustained high throughput, the cost of running it in FaaS can be significantly higher. Again, specialist platforms such as Binaris offer a much lower cost per invocation (currently a tenth of what AWS Lambda charges), which can offset the cost disparity at scale to some degree.
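To put rough numbers on that claim, here is a back-of-the-envelope calculation using AWS Lambda’s list prices at the time of writing (about $0.20 per million requests plus $0.0000166667 per GB-second); the traffic profile itself is made up for illustration:

```python
# Back-of-the-envelope Lambda cost for a sustained-throughput workload.
REQ_PRICE = 0.20 / 1_000_000        # USD per request
GB_SECOND_PRICE = 0.0000166667      # USD per GB-second

requests_per_second = 1_000         # hypothetical sustained load
avg_duration_s = 0.2                # 200 ms per invocation
memory_gb = 0.5                     # 512 MB function

monthly_requests = requests_per_second * 60 * 60 * 24 * 30
request_cost = monthly_requests * REQ_PRICE
compute_cost = monthly_requests * avg_duration_s * memory_gb * GB_SECOND_PRICE

print(f"requests: ${request_cost:,.0f}/month")   # ~ $518/month
print(f"compute:  ${compute_cost:,.0f}/month")   # ~ $4,320/month
print(f"total:    ${request_cost + compute_cost:,.0f}/month")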

That said, it’s important to consider the other cost savings that come with a managed FaaS platform such as AWS Lambda. You get decent scalability and resilience out of the box, and you no longer need the skillsets in your organization to look after the underlying compute infrastructure. Given the rising cost of engineers, the savings in personnel cost can be significant, even for a small start-up.

To understand the Total Cost of Ownership (TCO) of your approach, you need to take into account the following:

  • Personnel cost for developing, supporting, and improving the solution
  • Operational cost - servers, networking, storage, and so on
  • Opportunity cost - engineering time spent on infrastructure that could otherwise be invested in building and improving the product


Imagine a scenario in which running a high-throughput application on FaaS costs $10,000/month more than running it in containers; should you move to containers instead? If you don’t have experience with running containerized applications at scale, and it would cost you $12,000/month to bring in that expertise, then a FaaS-based solution would still be cheaper! However, if you already have the required expertise, then you should definitely consider moving the application into containers instead.
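As a trivial sketch of that break-even reasoning (all figures are the hypothetical ones from the scenario above):

```python
# Hypothetical figures from the scenario above, in USD/month.
faas_premium = 10_000          # extra infrastructure cost of FaaS vs containers
container_expertise = 12_000   # cost of hiring/contracting container expertise

have_container_expertise = False  # flip this if the skills are already in-house

extra_personnel_cost = 0 if have_container_expertise else container_expertise
container_tco_delta = extra_personnel_cost - faas_premium

if container_tco_delta > 0:
    print(f"FaaS is cheaper by ${container_tco_delta:,}/month")
else:
    print(f"Containers are cheaper by ${-container_tco_delta:,}/month")
```

The point is not the arithmetic itself but that the comparison must include personnel and opportunity costs, not just the infrastructure bill.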

When the Built-In Redundancy Is Not Enough

AWS Lambda offers multi-AZ functionality out of the box. This provides a good baseline resilience with no effort and no extra cost. Furthermore, you can extend that resilience beyond a single region by building multi-region, active-active APIs. However, not every serverless application is an API, and it’s not always economically feasible, or even possible, to replicate the data for your API.

One example is a data processing pipeline in which each record may be processed only once globally, or one with hundreds of terabytes of data that would be too expensive to replicate across multiple regions. In another scenario, user data cannot leave its country of origin due to legal requirements such as GDPR. If developers cannot build multi-region redundancy into their applications, they are constrained by the built-in redundancy of AWS Lambda and the other services used with it.

Consider a Kinesis-triggered function: AWS Lambda manages a polling layer that reads the stream and forwards batches of events to the subscriber function. If an issue arises in that intermediate layer, there is no real course of action other than waiting for AWS to fix it. We cannot easily move our business logic into a container or VM and continue working, because we don’t have access to the internal state of the pollers. This lack of portability is a concern for applications that have a strict up-time requirement.
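To make the constraint concrete, here is roughly what a Kinesis-triggered handler looks like in Python: all the function ever sees is a batch of decoded records. The polling layer, its checkpoints, and its retry state live entirely inside AWS Lambda (the process function is a hypothetical stand-in for business logic):

```python
import base64
import json


def handler(event, context):
    # Lambda's managed poller reads the Kinesis shards, tracks its own
    # checkpoints, and hands us batches of records. None of that state
    # is visible or portable from here.
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded; JSON payloads assumed.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        process(payload)  # hypothetical business logic


def process(payload):
    print(payload)
```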

Conclusion

As FaaS platforms continue to evolve, it’s likely that more and more use cases will be catered to, and the rough edges will be smoothed out. In the meantime, FaaS boasts many desirable properties—the event-driven model, pay-per-invocation pricing, granular control of functionality and security, and so on.

Still, at the end of the day, the priority should be to deliver a great user experience and ensure that applications are economically viable. If a certain FaaS platform cannot deliver what you’re looking for, don’t try to shoehorn your solution into the platform. That would be completely backward. Instead, consider your goal and the constraints you have to work within (e.g., latency, cost, and resilience) and work from there to decide if FaaS is the right solution for the job.