2017 was a massive year for serverless adoption, and 2018 shows no signs of slowing down. A recent analysis of AWS customers found that serverless adoption is growing 2.5x faster than container adoption.
“Among Amazon Web Services users, container adoption grew 246 percent during the fourth quarter of 201… In the fourth quarter of 2017, serverless adoption grew by 667 percent among the sites tracked…” —ZDNet
As an early adopter, I have learnt a thing or two about migrating an existing system to serverless and the challenges you face along the way.
Here is my 10-step guide to going all in with serverless!
1. Understand the problem
In the immortal words of Simon Sinek, always start with why.
What is the business problem that you are trying to solve by moving to serverless? Is slow feature delivery losing you customers and market initiatives? Are scalability and stability problems damaging your brand? Are you looking to reduce the cost of running your system? Or do you want to reduce the ops overhead on your developers to help them focus on creating business value?
Whatever it is, understand the business impact you want to create by moving to serverless. This helps you have more productive discussions with stakeholders and set realistic expectations.
Once you have identified the goal for moving to serverless, make it explicit and understood by the whole team. This shared understanding will guide future decisions when you need to make tradeoffs.
2. Identify low-risk business areas to start
Recognize that innovation and change require learning. Learning requires taking risks and making mistakes.
Minimise business risk and avoid expensive mistakes. Start by migrating low-risk, non-critical business processes.
Avoid big-bang migration wherever possible. It might seem easier on paper, but making wholesale changes in one go is often risky. It is also more difficult to root out migration problems when many things change at once.
Migration projects are marathons and carry a degree of risk by design. You need support from the stakeholders to succeed, so avoid unnecessary risks and keep them on your side.
3. Prototype, learn, repeat
Serverless technologies like AWS Lambda offer a lot of composability. With that composability come choices and trade-offs.
For example, you can implement pub-sub using Lambda with SNS, Kinesis Streams or DynamoDB Streams. Which event source should you choose? Your choice can have a profound impact on scalability, parallelism, resilience and cost.
Unfortunately, there is no single “best” answer here, and there are many factors to consider.
Instead, build proof-of-concepts to learn and validate your assumptions. These proof-of-concept projects are your playground, where you can learn fast and fail cheaply. Do not confuse them with production code! Delete them after you have extracted maximum learning from them.
4. Continuous delivery
The sooner you apply continuous delivery (CD) to your serverless project the better.
Choose a tried and tested deployment framework. Resist the urge to create your own; it's precisely the type of heavy lifting we want to avoid.
The Serverless Framework is the most popular and takes care of most of the plumbing for you. It supports many cloud providers, and gives you many good practices out of the box. Its killer feature is the powerful plugin system that makes it very extensible. There are already many community-led plugins available.
Your CD pipeline should be captured as code and version controlled.
Builds should be reproducible. Dependencies, including transitive dependencies, should be locked down to exact versions. If minor or patch version updates can creep in between two builds, then the build is not reproducible!
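One way to catch unpinned dependencies early is a small check in the CD pipeline. The sketch below assumes a Python project with a pip-style requirements file; the function name `find_unpinned` is made up for illustration, and in practice your package manager's lockfile is usually the better tool.

```python
import re

# A requirement counts as pinned only when it uses "==" with a concrete
# version (no ranges such as ">=2.0" and no wildcards such as "1.2.*").
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\]]+==[A-Za-z0-9.!+]+$")


def find_unpinned(requirements_text):
    """Return the requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Failing the build whenever this returns a non-empty list keeps version drift out of the pipeline.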
5. Automated Testing
The serverless paradigm has a different risk profile to its serverful counterpart. You need to adjust your thinking regarding testing accordingly.
The constrained execution environment restricts how complex a function can be. The management of concurrency has been lifted into the platform. And you no longer need complex web frameworks.
As a result, your code has become simpler than ever.
But complexity doesn't just disappear; it moves around. It is now in the configuration and security of your functions, and in how they interact with external dependencies. These are the things that are most likely to break your application now. What we test should reflect this change in risk profile.
Unit tests should no longer be the workhorse of your testing strategy. They have lower return on investment (ROI) than ever before.
Instead we should focus on integration and acceptance tests.
Integration tests should exercise our functions locally and talk to the real downstream systems. The purpose of these tests is to test our code against our dependencies. Reserve the use of mocks and stubs for testing error handling in our code. Talk to the real downstream systems otherwise, to validate our assumptions of their behaviour.
Acceptance tests should exercise the system end-to-end without calling into its internal code. That means, if you’re testing an API then the tests should talk to the system via its HTTP interface.
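As an illustration of reserving stubs for error handling, here is a minimal sketch. It assumes an API Gateway-style event and a hypothetical `get_order` dependency that is passed into the handler; none of these names come from a real codebase. The happy path would still be tested against the real downstream system.

```python
import json


def make_handler(get_order):
    """Build a Lambda-style handler around an injected downstream call."""
    def handler(event, context):
        order_id = event["pathParameters"]["orderId"]
        try:
            order = get_order(order_id)
        except TimeoutError:
            # The failure mode we want to exercise with a stub.
            return {"statusCode": 503, "body": json.dumps({"message": "please retry"})}
        return {"statusCode": 200, "body": json.dumps(order)}
    return handler


def timing_out_stub(order_id):
    """Stub that simulates a downstream timeout, which is hard to trigger for real."""
    raise TimeoutError("simulated downstream timeout")


def test_returns_503_when_downstream_times_out():
    handler = make_handler(timing_out_stub)
    response = handler({"pathParameters": {"orderId": "42"}}, None)
    assert response["statusCode"] == 503
```

Injecting the dependency keeps the stub confined to the one test that needs it.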
6. Build observability into the system
Ship your logs to a log aggregation service where they can be easily searched.
Use structured logging with JSON. Complement log messages with contextual data that is useful for debugging and finding related logs. For example, include the order ID as an attribute in every log message related to an order.
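A structured logger can be very small. The sketch below is one possible shape, not a recommendation of a specific library; the field names (`level`, `message`, `timestamp`, `order_id`) are assumptions chosen for the example.

```python
import json
import sys
import time


def log(level, message, stream=None, **context):
    """Write one JSON log line, merging contextual attributes into the entry."""
    entry = {"level": level, "message": message, "timestamp": time.time()}
    entry.update(context)
    line = json.dumps(entry, default=str)  # default=str copes with dates, UUIDs, etc.
    (stream or sys.stdout).write(line + "\n")
    return line


# Usage: every log message about an order carries the order ID.
# log("INFO", "payment accepted", order_id="1234")
```

Because each line is a JSON object, the log aggregation service can index `order_id` and return every message for a given order.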
Disable debug logging in production. Instead, sample debug logs for, say, one out of every thousand function invocations.
Log an error message with the invocation event as an attribute for failed invocations. This lets you capture and replay failed invocations.
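Capturing the invocation event on failure could be done with a wrapper around the handler. This is a sketch under assumptions: the wrapper name and the log field names are invented for the example, and `write` defaults to `print` only because Lambda forwards standard output to its logs.

```python
import functools
import json


def log_failed_invocations(handler, write=print):
    """Wrap a handler so that any failure is logged together with its event."""
    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception as error:
            write(json.dumps({
                "level": "ERROR",
                "message": "invocation failed",
                "error": repr(error),
                "invocation_event": event,  # enough to replay the invocation later
            }, default=str))
            raise  # let the platform still see the failure
    return wrapper
```

Be careful with events that contain sensitive data; you may need to redact fields before logging them.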
Record custom, application-level metrics.
Create dashboards and alarms for key performance indicators. Connect alarms to an alerting & incident management system such as OpsGenie or PagerDuty.
Capture and forward correlation IDs through both synchronous and asynchronous event sources, and include the captured correlation IDs in every log message. This way, you can find all logs related to a user action even when the action spans multiple functions. You need this for debugging complex interactions inside a serverless application.
Also, sample debug logs for the entire call chain. Make the sampling decision at the edge, and pass the decision along as a correlation ID. The receiving function should respect this decision and enable debug logging for its invocation as well.
Capture performance traces for your function invocations. For example, by using AWS X-Ray with Lambda.
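The capture-and-forward logic, including the sampling decision, might look like the sketch below for HTTP headers. The `x-correlation-` prefix and the `debug-log-enabled` key are naming conventions assumed for this example, not something the platform defines.

```python
import random
import uuid

PREFIX = "x-correlation-"                    # assumed naming convention
ID_KEY = PREFIX + "id"
DEBUG_KEY = PREFIX + "debug-log-enabled"     # carries the sampling decision


def capture_correlation_ids(headers, sample_rate=0.001, rng=random.random):
    """Collect correlation IDs from incoming headers, filling gaps at the edge."""
    ids = {k.lower(): v for k, v in headers.items() if k.lower().startswith(PREFIX)}
    if ID_KEY not in ids:
        # No upstream ID: this function is the edge, so start a new chain.
        ids[ID_KEY] = str(uuid.uuid4())
    if DEBUG_KEY not in ids:
        # Only the edge makes the sampling decision; downstream functions reuse it.
        ids[DEBUG_KEY] = "true" if rng() < sample_rate else "false"
    return ids


def debug_enabled(ids):
    """Return True when debug logging was switched on for this call chain."""
    return ids.get(DEBUG_KEY) == "true"
```

To forward the IDs, add the returned dictionary to the headers of every outgoing HTTP request. For asynchronous sources such as SNS or Kinesis, the same values would travel in message attributes or in the payload.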
7. Security
Follow the principle of least privilege. Each function should have only the permissions it needs, nothing more, nothing less.
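A simple automated check can flag the most common violation: wildcard permissions. The sketch below scans an IAM policy document, represented as a Python dictionary, for `*` in `Action` or `Resource`. The function name and the example table ARN are made up for illustration.

```python
def find_wildcards(policy):
    """Return (statement index, field, value) for each wildcard in Allow statements."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = statement.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if value == "*" or (field == "Action" and value.endswith(":*")):
                    findings.append((index, field, value))
    return findings


# A narrowly scoped policy: one action on one (hypothetical) table.
LEAST_PRIVILEGE = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
    }],
}
```

A wildcard is not always wrong, but each finding should be a conscious decision rather than a default.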
Apply account level isolation. Each environment should have a separate account, so as to contain a security breach.
Use git commit hooks to stop account credentials from leaking.
Sensitive data such as API keys and credentials should never be checked into source control in plain text.
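The core of such a commit hook is a pattern scan. The sketch below looks for strings shaped like AWS access key IDs (20 characters starting with `AKIA` or `ASIA`). It is deliberately narrow and assumes the staged changes are passed in as text; a dedicated secret-scanning tool covers many more credential formats.

```python
import re

# AWS access key IDs are 20 upper-case alphanumeric characters; long-term
# keys start with "AKIA" and temporary ones with "ASIA".
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return the 1-based line numbers that contain something key-shaped."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ACCESS_KEY_ID.search(line):
            hits.append(number)
    return hits
```

A pre-commit hook could feed the output of `git diff --cached` into this function and exit with a non-zero status when it returns any line numbers, which blocks the commit.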
Use automated services such as Snyk to continuously scan dependencies for known vulnerabilities.
Sanitize and validate user input as well as function output. Sanitizing function output stops unintended data from leaking. If multiple functions are chained together then it also prevents problems from being passed down the chain.
If you know a function is no longer used, delete it. Even unused functions continue to exist as an attack surface. They are also more likely to become vulnerable over time due to lack of patching.
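Both halves can be sketched in a few lines. The field names and limits below are hypothetical and stand in for whatever your own schema requires; a schema validation library would normally replace the hand-written checks.

```python
ALLOWED_OUTPUT_FIELDS = {"order_id", "status", "total"}  # hypothetical allowlist


def validate_input(payload):
    """Return a list of validation errors; an empty list means the input is acceptable."""
    errors = []
    order_id = payload.get("order_id")
    if not isinstance(order_id, str) or not order_id.isalnum():
        errors.append("order_id must be an alphanumeric string")
    quantity = payload.get("quantity")
    if (not isinstance(quantity, int) or isinstance(quantity, bool)
            or not 1 <= quantity <= 100):
        errors.append("quantity must be an integer between 1 and 100")
    return errors


def sanitize_output(record):
    """Keep only allowlisted fields so internal data cannot leak by accident."""
    return {key: value for key, value in record.items() if key in ALLOWED_OUTPUT_FIELDS}
```

An allowlist for output is safer than a blocklist: a field added to the database later stays hidden until someone decides to expose it.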
8. Continuous learning
You’re in production, congratulations! But don’t stop there. Keep experimenting, learn and iterate on your designs.
Share your learnings with other teams and with the broader developer community. Help establish a virtuous circle of learning and improvement for everyone.
Identify common patterns and cross-cutting concerns. Look for ways to standardize how you deal with these concerns.
Using middleware engines such as middy is a good way for you to standardize how you do things. For example, how you handle errors, or how to capture and forward correlation IDs.
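middy itself is a Node.js library, so the sketch below is not its API. It is a small Python illustration of the same middleware idea, loosely modelled on before, after and error hooks, with all names invented for the example.

```python
def with_middleware(handler, middlewares):
    """Wrap a handler with 'before', 'after' and 'on_error' hooks."""
    def wrapped(event, context):
        request = {"event": event, "context": context, "response": None, "error": None}
        try:
            for middleware in middlewares:
                if "before" in middleware:
                    middleware["before"](request)
            request["response"] = handler(request["event"], request["context"])
            for middleware in reversed(middlewares):
                if "after" in middleware:
                    middleware["after"](request)
        except Exception as error:
            request["error"] = error
            request["response"] = None
            for middleware in reversed(middlewares):
                if "on_error" in middleware:
                    middleware["on_error"](request)
            if request["response"] is None:
                raise  # no middleware handled the error
        return request["response"]
    return wrapped


def respond_with_500(request):
    """Shared error handling: turn any unhandled error into a generic 500."""
    request["response"] = {"statusCode": 500, "body": "internal error"}


ERROR_TO_500 = {"on_error": respond_with_500}
```

Each team then wraps its handlers with the same list of middlewares, so error handling and similar concerns behave identically across functions.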
Start building a platform to provide features all your teams would want to use.
10. Automate all the things!
Use the power of serverless to automate ops and security monitoring.
For example, you can adopt ChatOps using AWS Lambda with Slack integration.
You can use CloudTrail, CloudWatch Events and Lambda to alert on suspicious activities. For example, console logins at unusual times of the day or from locations where you have no employees. Or alert on EC2 activities in remote regions that you're not using. From what I hear, attackers are most likely to mine bitcoins with stolen credentials in Sao Paulo and Tokyo.
As I can testify from my own experience of migrating to serverless, the journey is actually a lot of fun. There are technical challenges to overcome, for sure. But these challenges are all solvable, and the platforms are getting better all the time.
The most difficult challenge most teams face is in adjusting their mindset: how we test our applications, how we operate them in production, and how we build resilience into them.
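The detection logic for those two examples could be sketched as below. It assumes the function receives a CloudWatch Events event whose `detail` field holds the CloudTrail record; the list of active regions and the office hours are placeholders, and the alerting step itself is left out.

```python
from datetime import datetime

ACTIVE_REGIONS = {"eu-west-1", "us-east-1"}   # placeholder: regions you actually use
OFFICE_HOURS_UTC = range(7, 20)               # placeholder: 07:00 to 19:59 UTC


def suspicious_reasons(record):
    """Return the reasons, if any, why a CloudTrail record looks suspicious."""
    reasons = []
    region = record.get("awsRegion")
    if record.get("eventName") == "ConsoleLogin":
        when = datetime.strptime(record["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        if when.hour not in OFFICE_HOURS_UTC:
            reasons.append("console login outside office hours")
    if record.get("eventSource") == "ec2.amazonaws.com" and region not in ACTIVE_REGIONS:
        reasons.append(f"EC2 activity in unused region {region}")
    return reasons


def handler(event, context):
    """Lambda entry point: CloudWatch Events wraps the CloudTrail record in 'detail'."""
    reasons = suspicious_reasons(event["detail"])
    # Sending the alert (chat message, pager, etc.) is omitted from this sketch.
    return reasons
```

Checking the login location would need an extra lookup on the record's `sourceIPAddress`, which is omitted here.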
Depending on your starting point, you might face very different challenges during migration.
A serverless architecture is almost always a microservices architecture too. It means people migrating from a monolithic system would have to migrate to a new execution environment as well as a new style of architecture.
In the next post, we will shine the spotlight on this migration from monolith to serverless. We'll discuss strategies for breaking down the monolith, and how to organize your codebase as you move things into independently deployable functions.