Automating Observability Metrics in Serverless

Published in IOpipe Blog, Oct 3, 2018

We talk a lot about observability in the context of serverless applications. When your code runs on a remote server that you know little to nothing about, it’s important to be able to observe and visualize how the function code itself runs.

There is a plethora of data to collect in the realm of observability. We’ve attempted to categorize it into the three pillars of metrics, tracing, and logging, but there’s still a lot in each of those pillars.

Not to mention that instrumenting your code for observability can be difficult — it can be tricky to know what data will actually tell you how your functions and app are performing, and adding many lines of code to get the data can be daunting.

In this piece, we’ll look at the process of automating a piece of observability data — HTTP/S traces. We’ll discuss why tracing adds a good amount of depth to your observability picture, look at why tracing HTTP/S requests is a good example, and finally talk about how it’s worth thinking about automating more of your observability metrics in general.

Tracing: an integral piece of the observability puzzle

Tracing is one of the many ways to gather observability data from your serverless app, but I find it to be one of the more informative ones. This is because you can compare the time a piece of your function takes on many different levels — you can compare the time one task takes within your function, and you can compare traces from past invocations and see if a task has become more or less performant over time.
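To make the two kinds of comparison concrete, here is a minimal sketch of manual tracing in Python. It is not the IOpipe agent's API: the `trace` helper, the `TRACES` store, and the `handler` function are all hypothetical names chosen for illustration. Each task inside the function is timed separately, and durations accumulate across invocations so they can be compared later.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store: task name -> list of durations in milliseconds.
# A real agent would report these to a backend instead of keeping them locally.
TRACES = {}


@contextmanager
def trace(name):
    """Time the wrapped block and record its duration under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        TRACES.setdefault(name, []).append(elapsed_ms)


def handler(event, context=None):
    """Stand-in for a serverless function with two separately traced tasks."""
    with trace("parse"):
        numbers = [int(n) for n in event["numbers"]]
    with trace("sum"):
        total = sum(numbers)
    return {"total": total}
```

After several invocations, `TRACES["parse"]` holds one duration per call, which is the raw material for both comparisons: task against task within one invocation, and the same task across invocations.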

Properly implemented tracing can also help pinpoint performance issues as they arise: you can look at a set of traces and immediately spot a troublemaker.

But again, it can be hard to know what to trace, and instrumenting your code for traces you’re not sure of can be a real issue. This is why automating certain types of tracing can be a great way to cut down on the work needed to collect relevant observability data without instrumenting your entire serverless application.

How observability metrics like tracing can benefit from automation

Figuring out what to trace is easily one of the most difficult parts of tracing serverless functions. Luckily, we can group many tasks into categories, and generalize which categories need tracing. If you can narrow it down further by isolating the conditions that mean a certain type of task is running, you can find the spots where that task runs.

If you can generalize a type of task that you will want tracing data for, and you can name the conditions that signal that the task is running, then you can automate tracing for that task and lighten the work of generating observability metrics. Let’s look at an example of this kind of task: HTTP/S requests.

Automating the tracing of HTTP/S requests

In terms of runtime, HTTP/S requests are one of the most volatile tasks in serverless functions, because there are so many variables that can affect how long a request takes. They are also on the easier side to isolate in terms of determining when a request is being made. These two facts added together make HTTP/S requests a prime candidate for automated tracing.
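One way to picture this kind of automation is to wrap the function that sends requests, so that every call is timed without touching the call sites. The sketch below does that for Python's standard `urllib.request.urlopen`. It is an illustration under assumptions, not how the IOpipe agents work: `trace_http`, `install`, and the `HTTP_TRACES` list are hypothetical names. The "condition" that identifies the task is simply that the URL scheme is `http` or `https`.

```python
import functools
import time
import urllib.request
from urllib.parse import urlsplit

# Hypothetical sink for finished traces; a real agent would ship these elsewhere.
HTTP_TRACES = []


def trace_http(send):
    """Wrap a callable whose first argument is a URL string or urllib Request."""

    @functools.wraps(send)
    def wrapper(target, *args, **kwargs):
        # urllib.request.Request objects expose the URL as `full_url`.
        url = getattr(target, "full_url", target)
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            # Not an HTTP/S request, so pass it through untraced.
            return send(target, *args, **kwargs)
        start = time.perf_counter()
        try:
            return send(target, *args, **kwargs)
        finally:
            # Recorded even when the request raises, e.g. on a timeout.
            HTTP_TRACES.append({
                "host": parts.hostname,
                "scheme": parts.scheme,
                "duration_ms": (time.perf_counter() - start) * 1000.0,
            })

    return wrapper


def install():
    """Patch urllib.request.urlopen once so later calls through it are traced."""
    if not getattr(urllib.request.urlopen, "_auto_traced", False):
        patched = trace_http(urllib.request.urlopen)
        patched._auto_traced = True
        urllib.request.urlopen = patched
```

Calling `install()` once at startup is enough for code that calls `urllib.request.urlopen(...)` through the module. Code that ran `from urllib.request import urlopen` before the patch keeps a reference to the original function and would not be traced, which is one reason a production agent needs to hook in at a lower level than this sketch does.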

How tracing HTTP/S calls benefits your serverless application

Tracing data on HTTP/S requests can help diagnose many problems within your serverless application:

If the HTTP/S calls are to other serverless functions, you can follow them back to the source of the performance issue and manually trace that function until you find the problem.
If the HTTP/S calls are internal and the requests time out, even occasionally, it might be time to roll those functions together in order to cut down on overall runtime.
If the HTTP/S calls are to third-party services, traces make it easier to spot a problem caused by that service being down: if the traces for those calls suddenly skyrocket or time out, you can pinpoint the problem to the call to the third-party service.
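As a rough illustration of spotting a trace that "suddenly skyrockets", the sketch below compares each host's latest request duration against the median of its past durations. The function name, the data shapes, and the threshold factor of 3 are all assumptions made for this example; a sensible threshold depends on the service.

```python
from statistics import median


def find_slow_hosts(history, latest, factor=3.0):
    """Return hosts whose latest duration exceeds `factor` times their median.

    history: {host: [duration_ms, ...]} collected from past invocations
    latest:  {host: duration_ms} from the most recent invocation
    """
    slow = []
    for host, duration in latest.items():
        past = history.get(host)
        if not past:
            continue  # no baseline yet, so there is nothing to compare against
        if duration > factor * median(past):
            slow.append(host)
    return sorted(slow)
```

Given a history where a hypothetical `payments.example.com` usually answers in about 110 ms, a latest duration of 900 ms would be flagged, while a host with no history would be skipped until a baseline exists.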

Auto-Tracing: an automated collection of observability data

As we build more and more serverless functions that are invoked more and more often, observability metrics need to keep up, and automating some of it can certainly help ease the burden. That’s why we’re ecstatic to announce the release of Auto-Tracing on the IOpipe platform for Node.js and Python. We’re starting with automatically tracing HTTP/S calls, but we’ll be adding more in the future.

To see which agents have Auto-Tracing capability and how to enable it, check out this post.

Excited to try this out? We have a 21-day free trial! And if you’d like to learn more, come join us in our community Slack.