
Tower: Universal and Agnostic Elixir Exception Tracker

Gonzalo Rodriguez

November 07, 2024

TL;DR: We built and open sourced a new Elixir package called Tower that provides Elixir Exception Tracking with many pre-built reporting options available, including E-mail, Slack, local ErrorTracker and popular 3rd party services.

Motivation

The Elixir programming language is well known for making it easy to build applications that, almost naturally, deal with and tolerate faults.

This is mainly a consequence of how BEAM VM programs (and in turn Erlang and Elixir programs), by design, run as a collection of small and isolated processes supervised by other special processes called supervisors that act when their supervised processes (children) exit abnormally (mostly restarting them based on different strategies).

However, that doesn't mean that one would want to completely ignore any exception in processes running as part of an Elixir application.

When something goes wrong in one or more processes, they terminate abnormally and a supervisor takes care of restarting them with clean state. It is truly great that this mechanism makes your application tolerant to the exception: the failure is contained and isolated, and it doesn't bring your whole running application down.

But you still want to be informed about whatever went wrong, so that at a later point you can understand, troubleshoot and fix whatever needs fixing, to prevent it from happening again if possible, or to prevent worse consequences.

So I think exception tracking and reporting is just as important for Elixir applications as it is for applications written in other programming languages.

Listening for exceptions

When it comes to writing code that listens for all exceptions occurring in a running Elixir application, it's not as trivial as it can be in other programming languages that are naturally single-process. Writing a simple "try-rescue" wrapper around an application "entry point" won't work, because there is no such thing as a main entry point or main function.

Any Elixir application is highly concurrent. Several processes are started concurrently at boot time by the BEAM VM. Not only do the main application supervisor's children, defined in the application.ex file, run concurrently, but several of the dependencies listed under deps in mix.exs may also define their own "application", which Elixir runs concurrently alongside yours.
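To make the point concrete, here is a minimal sketch (plain Elixir, no dependencies) of an exception raised in another process slipping past a surrounding try-rescue. The names `outcome`, `pid` and `ref` are only illustrative. The calling process never sees an exception; it only learns about the crash because it explicitly monitors the spawned process.

```elixir
outcome =
  try do
    # The function below runs in a *different* process, so the raise
    # happens outside of this try block's reach.
    {pid, ref} = spawn_monitor(fn -> raise "boom in another process" end)

    # The monitor delivers a :DOWN message when the other process exits.
    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> {:observed_exit, reason}
    after
      1_000 -> :timeout
    end
  rescue
    # Never reached: nothing was raised in the calling process.
    exception -> {:rescued, exception}
  end

IO.inspect(outcome, label: "outcome")
```

Without the monitor, the calling code would simply carry on, unaware that anything failed.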

Then how?

For web applications using Plug, implementing the Plug.ErrorHandler behaviour can be enough to handle all exceptions happening in processes executing plug calls, i.e. executing the source code that responds to web requests, like a Phoenix controller action function. However, it won't be aware of exceptions happening elsewhere: in background processes like Oban jobs, in any concurrent process that is part of your supervision tree or your dependencies' supervision trees, or even in unlinked tasks (started with Task.start/1) spawned from a controller function itself.

Whether these omissions are a deal-breaker depends on your needs.

They are for me in most use cases, if not all. I wouldn't feel comfortable running an Elixir application in production without being informed of all exceptions, except the ones I intentionally silenced. I'd rather be informed by an error tracker first than by a customer.
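As an illustration of the Plug-only approach, here is a small sketch built on `Plug.Router` and `Plug.ErrorHandler`. The `DemoRouter` module, the `/boom` route, the response body and the `~> 1.14` version requirement are made up for the example. It exercises the router with `Plug.Test` instead of a real web server.

```elixir
Mix.install([{:plug, "~> 1.14"}])

defmodule DemoRouter do
  use Plug.Router
  use Plug.ErrorHandler

  plug :match
  plug :dispatch

  get "/boom" do
    # Simulates a bug in request-handling code.
    raise "boom while handling #{conn.request_path}"
  end

  match _ do
    send_resp(conn, 404, "not found")
  end

  @impl Plug.ErrorHandler
  def handle_errors(conn, %{kind: kind, reason: reason, stack: _stack}) do
    # A real application would forward kind/reason/stack to its tracker here.
    IO.puts("request failed with #{kind}: #{inspect(reason)}")
    send_resp(conn, conn.status, "Something went wrong")
  end
end

conn = Plug.Test.conn(:get, "/boom")

outcome =
  try do
    DemoRouter.call(conn, DemoRouter.init([]))
  rescue
    # Plug.ErrorHandler re-raises after calling handle_errors/2.
    exception -> {:reraised, exception}
  end

IO.inspect(Plug.Test.sent_resp(conn), label: "response sent by handle_errors/2")
```

Note that `handle_errors/2` only runs for failures inside the plug pipeline of the process serving the request, which is exactly the limitation described above.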

A second and more general-purpose approach is implementing a logger handler.

For quite some time there wasn't a general and solid solution for this in Elixir, until a substantial change landed in Elixir 1.15: the full integration of Elixir's Logger with Erlang's :logger (introduced in Erlang/OTP 21.0 as the replacement for the deprecated :error_logger).

The details probably deserve their own separate article, but in summary it means that one can easily define what is called a "logger handler" and listen to all, or a filtered subset of, the events logged in the running application, which include, among other events, exceptions in processes.

# lib/my_app/my_exception_listener.ex

defmodule MyExceptionListener do
  # Called by :logger for every event at or above the handler's level.
  def log(%{meta: %{crash_reason: crash_reason}} = _log_event, _config) do
    # Do something with log events special metadata value `crash_reason`:
    # https://hexdocs.pm/logger/main/Logger.html#module-metadata
  end

  # Ignore events without a crash_reason, so the handler itself never crashes.
  def log(_log_event, _config), do: :ok
end

# lib/my_app/application.ex

def start(_type, _args) do
  :logger.add_handler(MyExceptionListener, MyExceptionListener, %{level: :error})

  # rest of your application start
end

This will listen to any exception "crashing" any running Elixir process.
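To see such a handler pick up a crash end to end, here is a minimal, self-contained sketch that can be run as a script on Elixir 1.15 or later. The `CrashListener` module, the `:crash_listener` handler id and the `notify` config key are names made up for the example. The handler forwards the crash reason to the script's own process so there is something to inspect.

```elixir
# Make sure Elixir's Logger (and its :logger integration) is running.
{:ok, _apps} = Application.ensure_all_started(:logger)

defmodule CrashListener do
  # :logger calls log/2 with the log event and the handler's config map.
  def log(%{meta: %{crash_reason: {reason, _stacktrace}}}, %{config: %{notify: pid}}) do
    send(pid, {:crash_reported, reason})
  end

  # Any other event at this level is ignored.
  def log(_log_event, _config), do: :ok
end

:ok =
  :logger.add_handler(:crash_listener, CrashListener, %{
    level: :error,
    config: %{notify: self()}
  })

# An unlinked task that crashes; nothing in this process rescues it.
{:ok, _pid} = Task.start(fn -> raise "boom" end)

outcome =
  receive do
    {:crash_reported, reason} -> {:crash_reported, reason}
  after
    1_000 -> :nothing_reported
  end

IO.inspect(outcome, label: "handler saw")
```

The default console handler still prints the crash as usual; the extra handler simply receives the same event.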

Raised exceptions that are manually handled (via rescue or catch) by application code or by your dependencies' code, and that aren't re-raised, won't actually "crash" the process, so they won't land in the logger unless the handling code manually sends a log message that includes the crash_reason in its metadata (Broadway, for example, does this).

An example of these "silent" exceptions is those occurring inside Oban job execution. For these situations, the library hopefully emits :telemetry events with the exception information, which is the case for Oban. That means that, in addition to a logger handler, you may need a few Telemetry event handlers as add-ons for special cases.
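A telemetry add-on of that kind could look roughly like the sketch below. It assumes Oban's `[:oban, :job, :exception]` event carries `:kind`, `:reason` and `:stacktrace` in its metadata (check the Oban.Telemetry docs for your version). The `ObanExceptionListener` module, the handler id and the `notify` key are hypothetical. Since Oban itself isn't loaded here, the event is emitted by hand purely to exercise the handler.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule ObanExceptionListener do
  @event [:oban, :job, :exception]

  # Attaches handle_event/4 to the job exception event.
  def attach(notify_pid) do
    :telemetry.attach(
      "oban-exception-listener",
      @event,
      &__MODULE__.handle_event/4,
      %{notify: notify_pid}
    )
  end

  # Forwards the exception details; a real handler would report them instead.
  def handle_event(@event, _measurements, metadata, %{notify: pid}) do
    %{kind: kind, reason: reason, stacktrace: stacktrace} = metadata
    send(pid, {:job_exception, kind, reason, stacktrace})
  end
end

:ok = ObanExceptionListener.attach(self())

# Simulated event standing in for what Oban would emit when a job raises.
:telemetry.execute(
  [:oban, :job, :exception],
  %{duration: 0},
  %{kind: :error, reason: %RuntimeError{message: "job failed"}, stacktrace: []}
)

outcome =
  receive do
    {:job_exception, kind, reason, _stacktrace} -> {kind, reason}
  after
    1_000 -> :nothing_received
  end

IO.inspect(outcome, label: "telemetry handler saw")
```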

Cool. We found a way to listen to any exception occurring in any concurrent process in the whole running application.

Tower

All of the above work to listen for exceptions is, in a nutshell, what Tower does. It is a package we built for ourselves at Mimiquate and open sourced.

At its core it is a universal Elixir exception tracker that wraps the exception data into a well-defined Elixir struct called Tower.Event and passes it along to any number of reporters. That is why we also say it is agnostic to the reporting destination: it's not coupled to exception monitoring services, e-mail, Slack or any other.
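The idea of wrapping exception data in a struct and fanning it out to reporters can be sketched in a few lines of plain Elixir. This is not Tower's actual code or API: every name under `Sketch` below is hypothetical, and the struct carries only three illustrative fields.

```elixir
defmodule Sketch.Event do
  # A single, well-defined shape that every reporter can rely on.
  @enforce_keys [:kind, :reason, :stacktrace]
  defstruct [:kind, :reason, :stacktrace]
end

defmodule Sketch.Reporter do
  # Contract that each reporting destination implements.
  @callback report_event(event :: struct()) :: term()
end

defmodule Sketch.StderrReporter do
  @behaviour Sketch.Reporter

  @impl true
  def report_event(%Sketch.Event{} = event) do
    IO.puts(:stderr, Exception.format(event.kind, event.reason, event.stacktrace))
  end
end

defmodule Sketch.InboxReporter do
  @behaviour Sketch.Reporter

  @impl true
  def report_event(%Sketch.Event{} = event) do
    # Stands in for e-mail, Slack or a third-party service.
    send(self(), {:reported, event})
    :ok
  end
end

defmodule Sketch.Tracker do
  # Builds one event and hands the same event to every reporter.
  def report_exception(exception, stacktrace, reporters) do
    event = %Sketch.Event{kind: :error, reason: exception, stacktrace: stacktrace}
    Enum.each(reporters, fn reporter -> reporter.report_event(event) end)
    event
  end
end

try do
  raise ArgumentError, "bad input"
rescue
  exception ->
    reporters = [Sketch.StderrReporter, Sketch.InboxReporter]
    Sketch.Tracker.report_exception(exception, __STACKTRACE__, reporters)
end
```

Because the tracker only knows the reporter contract, adding or swapping a destination means changing the list of reporters, not the code that captures exceptions.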

Reporting exceptions

In order to report to one or any number of destinations, you can depend on separate Elixir packages that we also wrote and published, or write your own.

Example Scenarios

Scenario A

You want to report to Sentry.

Include tower_sentry as a dependency of your Elixir project.

Set a few Sentry-specific config settings.

Set config :tower, reporters: [TowerSentry]

Automatic reporting of exceptions will “just work”.

You can also report manually by calling Tower.report_exception anywhere you like in your application code, like so:

try do
  # possibly raising code
rescue
  exception ->
    Tower.report_exception(exception, __STACKTRACE__)
end
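Put together, the Scenario A setup might look like the fragment below. It is only a sketch: the open-ended version requirement is a placeholder, `dsn` is a Sentry SDK setting rather than a Tower one, and the `SENTRY_DSN` environment variable name is an assumption, so check the tower_sentry and Sentry documentation for the settings your versions expect.

```elixir
# mix.exs
defp deps do
  [
    # Placeholder requirement; pin to the release you actually want.
    {:tower_sentry, ">= 0.0.0"}
  ]
end

# config/config.exs
import Config

config :tower, reporters: [TowerSentry]

# config/runtime.exs
import Config

# The environment variable name is an assumption for this example.
config :sentry, dsn: System.get_env("SENTRY_DSN")
```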

Scenario B

You need or want to switch service from Sentry to Honeybadger.

Replace tower_sentry with tower_honeybadger in your Elixir project dependencies.

Replace config :tower, reporters: [TowerSentry] with config :tower, reporters: [TowerHoneybadger]

Remove Sentry-specific configs.

Add Honeybadger-specific config, like setting API Key.

Automatic reporting of exceptions continues to “just work” without any changes.

Manual report calls to Tower.report_exception throughout your application code remain unchanged.

Scenario C

You are reporting to Sentry and you want to test ErrorTracker while continuing to report to Sentry.

Add tower_error_tracker.

Set a few ErrorTracker-specific config settings.

Update config :tower, reporters: [TowerSentry] to config :tower, reporters: [TowerSentry, TowerErrorTracker].

Automatic reporting of exceptions continues to “just work” without any changes.

Manual report calls to Tower.report_exception throughout your application code remain unchanged.

Now you will have exceptions reported to both Sentry and ErrorTracker.

Useful links

Source code in GitHub: mimiquate/tower

Package in hex.pm: tower