Stratus3D

A blog on software engineering by Trevor Brown

Guide to Tracing in Erlang

I’ve spent a lot of time debugging Elixir and Erlang applications over the last 5 years and decided to share what I’ve learned here. I started writing this blog post in April of 2017 and wasn’t inspired to finish it until watching Jeffery Utter’s Debugging Live Systems on the BEAM talk about a month ago. If you’d prefer a more concise presentation of Erlang tracing I suggest you watch that talk instead of reading this.

The first thing to understand when debugging Erlang applications is that Erlang’s concurrency greatly influences how debugging is done. While Erlang does have a traditional debugger its breakpoints pause process execution and may cause other processes in the system to crash if they are waiting on a message from the process that has been paused. Tracing on the other hand can be done under any circumstances and provides just as much information as a typical debugger. Tracing is also significantly more flexible than breakpoint debugging as it doesn’t interfere with process execution and can gather data from multiple processes at once.

Traditional Debugging in Erlang

There are times when setting a few breakpoints is all that’s needed to diagnose a problem. Erlang comes with a debugger module that allows you to set breakpoints that pause execution and allow you to inspect the scope. The debugger includes a GUI that gives you full control over the debugging process.

In this blog post I am using my simple arithmetic module as an example. To debug a module you need to compile the modules you want to debug with the debug_info flag. Use erlc +debug_info to compile modules with debug info on. I ran erlc +debug_info arithmetic.erl. After compiling start the debugger by running debugger:start(). in an Erlang shell:

erl
Erlang/OTP 18 [erts-7.2] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.2  (abort with ^G)
1> debugger:start().

The debugger GUI should appear.

screenshot of debugger gui
Figure 1. Debugger GUI

In the GUI navigate to Module > Interpret…​ and choose the module you want to debug from the file dialog. Modules that have not been compiled with debug_info will be grayed out. Once the module is loaded you should see it listed on the "Module" menu. In the submenu for the module click on View. A window with the module source code should appear.

screenshot of debugger source view
Figure 2. Debugger Source View

Double click on any line you want to add a breakpoint too and then hop back in the Erlang shell you started the debugger from and run the code you want to debug. In my case I wanted to debug arithmetic:multiply/2 so I ran:

2> arithmetic:multiply(1,2).

The call will block and a debugger will list the call with a status of break in the main window. Double clicking on the listing will bring up the module source code with variable values and the debugger controls at the bottom. From here you can inspect the values and go to the next breakpoint by clicking the "Continue" button.

screenshot of debugger breakpoint
Figure 3. Debugger Breakpoint

The debugger has all the other features you would expect from a typical debugger. To learn more about it check the debugging page in the Erlang documentation.

There are a couple limitations to debugging in Erlang like this. First is that the module you want to run the debugger on must be compiled with debug_info flag (e.g. erlc +debug_info). Second is that the code being debugged will block when reaching a breakpoint. In many languages this isn’t a problem because everything is sequential and the rest of the program will wait until the user tells the debugger to continue. However in Erlang this can be problematic if another process in the application is waiting on a message from the blocked process. Unaware that the other process is blocked by the debugger it will timeout and may crash if it doesn’t get a message from the blocked process in time. Timeouts often result in one or both processes being killed or crashing. Tracing doesn’t block execution, so it is possible to use it to debug more complex scenarios involving multiple processes.

Tracing Basics

When it comes to tracing in Erlang it’s easy to get overwhelmed by everything. There are several different ways to run traces using the tools that come with Erlang, and then there are numerous other open-source tracing libraries available. We are going to first focus on the most basic Erlang tracing functions - the functions defined in the erlang module. Then we will look at the dbg tracer and one third-party open-source tracing library.

erlang:trace/3 and erlang:trace_pattern/2-3 functions

erlang:trace/3 is the tracing function that all other tracing tools are built on. We are only going to scratch the surface of what is possible with this function. The function takes only 3 arguments but there are many different options that can be passed in as the third argument, and there are several different values that are valid for the first argument.

erlang:trace(PidPortSpec, OnOffBoolean, FlagList) -> integer()

The first argument is either a literal pid, port ID, or atom specifying the processes and ports to trace:

PidPortSpec =
    pid() |
    port() |
    all | processes | ports | existing | existing_processes |
    existing_ports | new | new_processes | new_ports

The documentation explains what all of these options do, but I’ll highlight a few here:

  • all - trace all processes and ports.

  • processes - only trace processes.

  • new - only trace new processes and ports.

  • existing - only trace existing processes and ports.

Only processes are traced. All traces must select the processes to be traced before collecting any trace data. If you want to trace a function call that could be made from any process, you’ll need to trace all processes to collect the trace data for that.

tracing selection illustration
Figure 4. Visual cheatsheet of the possible PidPortSpec values

The second argument is simply a boolean flag that either turns tracing on or off for the specified pids or ports. Set it to true to turn tracing on and false to turn it off. The third argument is a list of flags that control how the tracing is done:

all | send | 'receive' | procs | ports | call | arity |
    return_to | silent | running | exiting | running_procs |
    running_ports | garbage_collection | timestamp |
    cpu_timestamp | monotonic_timestamp |
    strict_monotonic_timestamp | set_on_spawn |
    set_on_first_spawn | set_on_link | set_on_first_link |
    {tracer, pid() | port()} |
    {tracer, module(), term()}

There are a lot of options here and most of these options can be combined. The documentation contains a full explanation of each of these options but I’ll highlight a few of the important ones here:

  • all - all trace flags except tracer and cpu_timestamp. This option is a good default.

  • send, 'receive' - trace sends and receives for messages.

  • garbage_collection - trace garbage collection events.

  • set_on_spawn and set_on_link - extend tracing to any process that is spawned or linked to from a process that is already traced.

  • {tracer, Pid} - specifies the process to send the trace messages to. This is useful when you want to have a process running to handle trace messages and want to do something with them besides sending them to the shell process.

  • {tracer, Module, State} - specifies a tracer function that should be invoked with the trace messages. See erl_tracer docs for how to write a tracer module.

All events you want to record must be selected using trace flags.

tracing events illustration
Figure 5. Visual cheatsheet of some of the possible FlagList values

erlang:trace/3 may be used on it’s own but for tracing function calls erlang:trace_pattern/2-3 must be invoked to tell Erlang what functions you want to trace. erlang:trace/3 is used to specify what processes and ports are to be traced, and erlang:trace_pattern/3 specifies what functions are to be traced. Only when a traced process invokes a traced function does tracing occur. The first argument to this function is a tuple containing the module, function, and arity of the functions that should be traced. Here are a few example arguments:

% All functions in `Module` named `Function` of any arity
{Module,Function,'_'}

% All functions in module `Module`
{Module,'_','_'}

% All functions in all modules
{'_', '_', '_'}

The second argument may be true or false to turn tracing on or off for the trace pattern or a match spec. We come back to match specs shortly. The third argument is a list of options for trace patterns. There are only two I want to list here:

  • global turn on tracing only for function calls explicitly stating the module name (e.g. arithmetic:multiply(3, 4))

  • local turn on tracing for all types of functions calls regardless of how the function is invoked (e.g. multiply(3, 4))

See the docs if you want to read the documentation on the complete list of options.

Example Trace with erlang:trace/3 and erlang:trace_pattern/3

Now let’s setup tracing for one of the functions used by the arithmetic module I showed earlier:

% Turn on call tracing for all processes
1> erlang:trace(all, true, [call]).
40

% Trace all function calls in the arithmetic module
2> erlang:trace_pattern({arithmetic, '_', '_'}, true, [local]).
6

% Run the code in another process
3> spawn(fun() -> arithmetic:multiply(4, 3) end).
<0.126.0>

% Turn tracing off
> erlang:trace(all, false, [call]).
40

% Because we didn't specify a tracer process all trace messages got sent to the shell
> flush().
Shell got {trace,<0.126.0>,call,{arithmetic,multiply,[4,4]}}
Shell got {trace,<0.126.0>,call,{arithmetic,'-multiply/2-fun-0-',[1,0,4]}}
Shell got {trace,<0.126.0>,call,{arithmetic,'-multiply/2-fun-0-',[2,4,4]}}
Shell got {trace,<0.126.0>,call,{arithmetic,'-multiply/2-fun-0-',[3,8,4]}}

You can see the shell received a trace message for every function invoked in the arithmetic module. Because we passed in the local flag to the trace pattern even the anonymous function calls inside the lists:foldl/3 loop were reported. You can also see function calls to other modules were not reported. We don’t see the call to lists:foldl/3 or to lists:seq/2 because the trace pattern only specified the arithmetic module. When the flush() function is invoked in the shell it prints Shell got followed by the message received. Different trace events generate messages with slightly different formats, but all trace messages are tuples with the first item being the atom trace. Since we are only tracing function calls you can see all trace messages have the same format:

{trace, PidOfInvocation, call, {Module, FunctionCalled, ArgumentsInvokedWith}}

The Erlang documentation for the erlang:trace/3 function lists all trace message formats.

Another Trace with erlang:trace/3 and erlang:trace_pattern/3

To demonstrate a few more features of erlang:trace/3 I’m going to use a slightly more complicated example. Instead of the arithmetic.erl module I’ve been using in this blog post I’m going to use a variation of the module that starts a server process solves basic arithmetic. While simple, the module is still fairly lengthy so I won’t show it here, but you can view the module here.

Since we are now tracing an Erlang server, let’s record message send and message receive events as well as function calls in the module. Let’s also process trace messages in a custom tracer process so we can control what we do with the trace data. Here is the code for a primitive tracer process that prints human readable trace messages:

-module(my_tracer).

-export([start/0]).

start() ->
  spawn(fun loop/0).

loop() ->
  receive
      % Print a human readable log line for all trace messages received
      {trace, Pid, send, Msg, To} ->
          % Handle message send message
          io:format("Process ~p sent message ~p to process ~p~n", [Pid, Msg, To]);
      {trace, Pid, 'receive', Msg} ->
          % Handle message receive message
          io:format("Process ~p received message ~p~n", [Pid, Msg]);
      {trace, Pid, return_from, {Module, Function, Arity}, ReturnValue} ->
          % Handle function call return message
          io:format("Function ~p:~p/~p returned ~p in process ~p~n", [Module, Function, Arity, ReturnValue, Pid]);
      {trace, Pid, call, {Module, Function, Arguments}} ->
          % Handle a call trace message
          io:format("Function ~p:~p invoked with arguments ~p in process ~p~n", [Module, Function, Arguments, Pid]);
    Msg ->
      io:format("Received unexpected message ~p~n", [Msg])
  end.

You can see the tracer is expecting messages for sends and receives, but there is also a return_from trace message that it handles. In the previous example we only traced invocation of functions, so we were able to see the arguments they were invoked with, but we couldn’t see the return value of the function call. We tell the tracer to trace function returns as well, but it’s only possible to do so by invoking erlang:trace_pattern/3 with a match spec. I’m not going to go into details of match specs in this blog post, but they are basically patterns that can be used to match function calls the tracer should trace. The most vague match spec that instructs the tracer to capture function returns is [{'_', [], [{return_trace}]}]. We’ll use this along with the tracer above to capture return values.

Now let’s use this tracer on a call to the arithmetic server:

% Start my tracer process
1> Pid = my_tracer:start().

% Start the arithmetic server (I don't want to trace this part)
2> ServerPid = arithmetic_server:start().

% Start tracing calls, and message sends and receives
3> erlang:trace(ServerPid, true, [call, send, 'receive', {tracer, Pid}]).

% Trace all function calls in the arithmetic server module
4> erlang:trace_pattern({arithmetic_server, '_', '_'}, [{'_', [], [{return_trace}]}], [local]).

% Exercise the arithmetic server
5> arithmetic_server:multiply(ServerPid, 4, 3).
Process <0.89.0> received message {<0.85.0>,{multiply,4,3}}
Function arithmetic_server:multiply invoked with arguments [4,3] in process <0.89.0>
12
Function arithmetic_server:'-multiply/2-fun-0-' invoked with arguments [1,0,4] in process <0.89.0>
Function arithmetic_server:'-multiply/2-fun-0-'/3 returned 4 in process <0.89.0>
Function arithmetic_server:'-multiply/2-fun-0-' invoked with arguments [2,4,4] in process <0.89.0>
Function arithmetic_server:'-multiply/2-fun-0-'/3 returned 8 in process <0.89.0>
Function arithmetic_server:'-multiply/2-fun-0-' invoked with arguments [3,8,4] in process <0.89.0>
Function arithmetic_server:'-multiply/2-fun-0-'/3 returned 12 in process <0.89.0>
Function arithmetic_server:multiply/2 returned 12 in process <0.89.0>
Process <0.89.0> sent message 12 to process <0.85.0>
Function arithmetic_server:loop invoked with arguments [] in process <0.89.0>

% Turn tracing off
6> erlang:trace(all, false, [call, send, 'receive']).

Instead of sending traces as messages to the process that invoked erlang:trace/3, the tracer sent the message to the tracer process I specified, which then immediately printed them as they were received. You can see that return values are printed when a function returns. This allows us to know exactly what the arguments and the return value were for each function call. You can see in the example above the anonymous function inside the lists:foldl/3 function is traced, allowing us to see the input arguments and the return value for each loop.

erlang:trace/3 Disadvantages

erlang:trace/3 and erlang:trace_pattern/3 are powerful tracing functions that make it possible to trace function calls, garbage collections, message passing, and other things in the Erlang VM. There isn’t much that cannot be done with the API these functions expose. However, there are some pretty obvious drawbacks: Downsides to the erlang:trace/3 API:

  • The interface is complicated with a lot of options. As you’ve seen in these examples, simply tracing function calls requires at least two function calls, with three arguments to each. It’s a whole lot of ceremony for such a simple trace.

  • The interface is unintuitive. In addition to being complicated, certain features are extremely unintuitive. There are two clear examples of this in the traces I showed: First is turning off tracing. You need to invoke erlang:trace/3 with some of the same values for the third argument as when the trace was started, otherwise tracing won’t be turned off. The second is that return traces can only be enabled via the match specs passed to erlang:trace_pattern/3 and not via a trace flag to erlang:trace/3.

  • Not production safe. It’s easy to crash an Erlang node by simply tracing too many things. It’s also possible to create deadlock by tracing a function that is used by the tracer process. I’ve made simple mistakes and crashed nodes in development using these functions.

dbg

dbg is the only other tracing module that is included with Erlang that I’m going to cover in this article. dbg is a text-based tracer that typically just outputs trace data to the shell. It is part of the runtime_tools application so if you want to use it on your applications you’ll need to make sure they include runtime_tools.

dbg is very flexible just like erlang:trace/3, but the interface is somewhat different. Rather than being just two functions that support a wide variety of arguments, dbg provides fairly large set of functions that each only take a few simple arguments.

Example Trace with dbg

Let’s perform the first trace we did with erlang:trace/3, but this time with dbg.

% Start the tracer
1> dbg:tracer().
{ok,<0.87.0>}

% Turn on call tracing for all processes
2> dbg:p(all, [c]).
{ok,[{matched,nonode@nohost,42}]}

% Trace all function calls in the arithmetic module
3> dbg:tpl({arithmetic, '_', '_'}, []).
{ok,[{matched,nonode@nohost,6}]}

% Exercise the code
4> arithmetic:multiply(4, 3).
(<0.85.0>) call arithmetic:multiply(4,3)
(<0.85.0>) call arithmetic:'-multiply/2-fun-0-'(1,0,4)
(<0.85.0>) call arithmetic:'-multiply/2-fun-0-'(2,4,4)
(<0.85.0>) call arithmetic:'-multiply/2-fun-0-'(3,8,4)
12

% Turn tracing off
5> dbg:stop_clear().
ok

As you can see it’s similar to using erlang:trace/3, but there is another function call. The only big difference is that we first invoke dbg:tracer/0 to start dbg’s tracer process.

The dbg:p(all, [c]) function call is equivalent to erlang:trace(all, true, [call]) and ultimately causes dbg to invoke that same function.

The dbg:tpl({arithmetic, '', ''}, []) function call is equivalent to erlang:trace_pattern({arithmetic, '', ''}, true, [local]). and tells the tracer to trace all global and local function calls to any function in the arithmetic module.

One thing to note here is that dbg prints traces as human readable log lines. Unlike using erlang:trace/3 trace events are not sent as messages, but are formatted and printed as text. For debugging on the shell, this is often more convenient.

Another Trace with dbg

Let’s try tracing messages, function calls and function returns like we did with erlang:trace/3. Just like with the previous example with dbg, the dbg function calls correspond with the erlang:trace/3 and erlang:trace_pattern/3 function calls. The only difference is the API is a little simpler.

1> dbg:tracer().
{ok,<0.87.0>}

% Start the arithmetic server (I don't want to trace this part)
2> ServerPid = arithmetic_server:start().
<0.91.0>

% Start tracing calls, and message sends and receives
3> dbg:p(ServerPid, [m, c]).
{ok,[{matched,nonode@nohost,1}]}

% Trace all function calls and function returns in the arithmetic_server module
4> dbg:tpl({arithmetic_server, '_', '_'}, [{'_',[],[{return_trace}]}]).
{ok,[{matched,nonode@nohost,12}]}

% Exercise the arithmetic server
5> arithmetic_server:multiply(ServerPid, 4, 3).
(<0.90.0>) << {<0.85.0>,{multiply,4,3}}
(<0.90.0>) call arithmetic_server:multiply(4,3)
12
(<0.90.0>) call arithmetic_server:'-multiply/2-fun-0-'(1,0,4)
(<0.90.0>) returned from arithmetic_server:'-multiply/2-fun-0-'/3 -> 4
6> (<0.90.0>) call arithmetic_server:'-multiply/2-fun-0-'(2,4,4)
(<0.90.0>) returned from arithmetic_server:'-multiply/2-fun-0-'/3 -> 8
(<0.90.0>) call arithmetic_server:'-multiply/2-fun-0-'(3,8,4)
(<0.90.0>) returned from arithmetic_server:'-multiply/2-fun-0-'/3 -> 12
(<0.90.0>) returned from arithmetic_server:multiply/2 -> 12
(<0.90.0>) <0.85.0> ! 12
(<0.90.0>) call arithmetic_server:loop()

% Turn tracing off
5> dbg:stop_clear().

As you can see it is largely the same. We still have to pass in a match spec with {return_trace} in the match body to tell dbg we want to trace function returns, just like with erlang:trace_pattern/3.

dbg Disadvantages

dbg is a powerful tracing library that can be used to trace anything and provides an interface that is easier to use than erlang:trace/3 and erlang:trace_pattern/3 but it is still fairly esoteric. The short function names can be confusing and hard to remember. It shares two of the same drawbacks that erlang:trace/3 has:

  • The interface is unintuitive. The function names can be hard to remember and turning on function return tracing still requires that a match spec be passed in with a function body of {return_trace}.

  • Not production safe. Just like with erlang:trace/3, it is easy to crash an Erlang node by tracing too many things.

Other tracing modules included with Erlang

There several other modules that ship with Erlang that offer various tracing related features that I’m not covering here; they may be the topic of future blog posts. Here is a list of them:

  • et - Event Tracer. Used for recording arbitrary events. Can generate a sequence from the recorded events.

  • ttb - Trace Tool Builder. Helps build traces for distributed systems. Relies on dbg and provides some nice shortcuts for starting traces and running them on distributed systems.

  • dyntrace - Contributed module that provides an interface for dynamic tracing using external tools like dtrace.

  • seq_trace - Module for tracing information flowing between processes via message passing.

  • erl_tracer - Erlang tracer behavior that you can implement in your own modules.

Tracing Libraries

There are a lot of third-party open-source debugging and tracing libraries for Erlang. Here are a few a few of the most notable:

  • recon_trace - A module that handles tracing of function calls in a production safe manner for a single Erlang node.

  • redbug - A tool for interacting with Erlang trace functions. Has some safety features built in, such as only using a safe subset of features, terminating the trace if it feels overloaded, and automatically exiting after a certain number of traces or certain period of time.

  • erlyberly - A debugger for Erlang, Elixir, and LFE with a GUI that uses Erlang tracing BIFs.

Of these three, the only one I think worth covering in this blog post is the Recon library’s recon_trace module.

Recon

Recon is an open-source library developed by Fred Hebert that contains various modules for diagnosing production issues with Erlang systems. One of the modules it contains is recon_trace. The documentation sums it up nicely:

recon_trace is a module that handles tracing in a safe manner for single Erlang nodes, currently for function calls only. Functionality includes:

  • Nicer to use interface (arguably) than dbg or trace BIFs.

  • Protection against dumb decisions (matching all calls on a node being traced, for example)

  • Adding safe guards in terms of absolute trace count or rate-limitting

  • Nicer formatting than default traces

The interface for recon_trace is much easier to use than the interfaces of erlang:trace/3 and dbg. The trace output is easier to read than dbg’s default trace output. recon_trace is suitable for tracing nodes in production, with the limitation that it can only run a trace on a single node and only traces function calls. recon_trace also automatically stops tracing after a specified number of calls, and any trace you start must specify the maximum number of calls to trace. This ensures you don’t overload an Erlang node by tracing a function that is invoked often.

Because recon is a library that doesn’t ship with Erlang you’ll need to add it as a dependency to the applications you want to use it in. If you may want to use recon to debug issues in production you will want to add it as a dependency before compiling the production release of your application. Otherwise you’ll encounter an issue in production and won’t have the recon library available to help debug the issue.

If you are using rebar3, adding it is as easy as specifying it in your rebar.config file:

{deps, [recon]}.

And then run rebar3 get-deps to download it.

Tracing with Recon

Because recon_trace only supports tracing function calls, we can’t trace message sends and receives from the arithmetic_server like we were able to with erlang:trace/3 and dbg. But we can still trace functions calls and with recon_trace it’s much simpler. All we do to start tracing is invoke recon_trace:calls/2,3 with a MFA tuple (with either function arity or a match spec as the third value, a maximum number of traces, and optionally a list of options:

% Turn on call and return tracing for the next 20 calls for all processes. For both global and local function calls.
1> recon_trace:calls({arithmetic, '_', fun(_) -> return_trace() end}, 20, [{scope, local}]).
6

% Exercise the code
2> arithmetic:multiply(4, 3).
20:32:49.814072 <0.146.0> arithmetic:multiply(4, 3)
12
20:32:49.825579 <0.146.0> arithmetic:'-multiply/2-fun-0-'(1, 0, 4)
20:32:49.825802 <0.146.0> arithmetic:'-multiply/2-fun-0-'/3 --> 4
20:32:49.826031 <0.146.0> arithmetic:'-multiply/2-fun-0-'(2, 4, 4)
20:32:49.826208 <0.146.0> arithmetic:'-multiply/2-fun-0-'/3 --> 8
20:32:49.826342 <0.146.0> arithmetic:'-multiply/2-fun-0-'(3, 8, 4)
20:32:49.826465 <0.146.0> arithmetic:'-multiply/2-fun-0-'/3 --> 12
20:32:49.826658 <0.146.0> arithmetic:multiply/2 --> 12

% Stop tracing, we don't necessarily have to do this
3> recon_trace:clear().

Conclusion

Erlang itself provides low level functions and numerous tools for debugging application code. We saw that traditional breakpoint debugging is in fact possible in Erlang, but is limited in what it can do due to Erlang’s concurrent nature. We also saw how erlang:trace/3 and erlang:trace_pattern/2,3 allow us to trace almost any event in the Erlang virtual machine, and how to select processes for tracing. We also looked at the dbg library, which provides a different interface for running traces. Finally we looked at recon and the recon_trace module which provides production safe tracing functionality and a more streamlined tracing interface.

All of the tracing examples showed just a few of the things that are possible with the Erlang VM’s low level tracing functions. The VM’s tracing features are incredibly powerful and provide everything we need to inspect a running system and figure out what is going wrong.

If you found this blog post useful you may want to check out the free book, Erlang in Anger, by Fred Hebert, the creator of recon. It covers a lot of production debugging techniques and is geared towards those who may not have experience with the Erlang VM. Also consider subscribing to my blog as I am planning on writing more articles on Erlang and Elixir debugging and tracing in the future.

References