Tower: Universal and Agnostic Elixir Exception Tracker

Gonzalo Rodriguez

7-9 minutes - November 07, 2024

TL;DR: We built and open sourced a new Elixir package called Tower that provides Elixir Exception Tracking with many pre-built reporting options available, including E-mail, Slack, local ErrorTracker and popular 3rd party services.

Motivation

Elixir is well known for making it easy to build applications that, almost naturally, tolerate faults fairly well.

This is mainly a consequence of how Erlang (and thus Elixir) programs are, by design, run as a collection of small, isolated processes. Special processes called supervisors watch over them and act when one of their "child" processes exits abnormally, mostly by restarting it according to a configured strategy.

However, that doesn't mean you should ignore errors in the processes of a running Elixir application.

Something goes wrong in one or more processes, they terminate abnormally, and a supervisor takes care of restarting them with a clean state. It is truly great that this mechanism makes your application tolerant to the error: the failure was contained and isolated, and it didn't bring your whole application down.
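As a minimal sketch of that restart behaviour, the script below supervises a small counter process, kills it, and checks that a fresh process with clean state takes its place. The module name Sketch.Counter is made up for illustration, and the polling loop is only there to make the script deterministic.

```elixir
defmodule Sketch.Counter do
  # Hypothetical worker used only for this illustration.
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> 0 end, name: __MODULE__)
  def increment, do: Agent.update(__MODULE__, &(&1 + 1))
  def value, do: Agent.get(__MODULE__, & &1)
end

{:ok, _supervisor} = Supervisor.start_link([Sketch.Counter], strategy: :one_for_one)

Sketch.Counter.increment()
old_pid = Process.whereis(Sketch.Counter)

# Simulate an abnormal exit and wait until the old process is really gone.
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# Poll until the supervisor has registered the replacement process.
wait_for_restart = fn wait_for_restart ->
  case Process.whereis(Sketch.Counter) do
    pid when is_pid(pid) and pid != old_pid ->
      pid

    _not_restarted_yet ->
      Process.sleep(10)
      wait_for_restart.(wait_for_restart)
  end
end

new_pid = wait_for_restart.(wait_for_restart)

# The replacement starts from the initial state again.
restarted_value = Sketch.Counter.value()
```

Note that nothing in this script tells you why the process died. The application keeps running, which is exactly why the failure is easy to miss.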

But you still want to be informed about whatever went wrong, so that you can understand, troubleshoot and fix whatever needs fixing, either to prevent it from happening again or to prevent worse consequences.

So exception tracking and reporting is just as important for Elixir applications as it is for applications written in other programming languages.

Listening to exceptions

When it comes to writing code that listens for every exception occurring in a running Elixir application, it's not as trivial as it can be in programming languages that are naturally single-threaded. Writing a simple "try-rescue" wrapper around an application "entry point" won't work, because there is no such thing as a single main entry point or main function.

Any Elixir application is highly concurrent. Many processes are started concurrently at boot time. Not only do the children of the main application supervisor, defined in the application.ex file, run concurrently, but many of the dependencies listed in the mix.exs project deps may also define their own "application", which Elixir runs concurrently alongside yours.
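A small sketch of why a wrapping try-rescue is not enough: the exception below is raised in a separate, unlinked process, so the rescue clause in the calling process never sees it. The only reason the caller learns about the crash here is that it explicitly monitors the task, which is done purely for the demonstration.

```elixir
result =
  try do
    # The task waits for :go so the monitor is in place before it crashes.
    {:ok, pid} =
      Task.start(fn ->
        receive do
          :go -> raise "boom in another process"
        end
      end)

    ref = Process.monitor(pid)
    send(pid, :go)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> {:not_rescued, reason}
    after
      1_000 -> :timeout
    end
  rescue
    # Never reached: the exception was raised in a different process.
    exception -> {:rescued, exception}
  end
```

The result is a `{:not_rescued, reason}` tuple, where the reason carries the exception and its stacktrace as the exit reason of the task.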

Then how?

For web applications using Plug, implementing the Plug.ErrorHandler behaviour can be useful to handle all the exceptions happening in processes executing plug calls, i.e. the code responding to web requests, such as a Phoenix controller action. However, it won't be informed about exceptions happening elsewhere: in background processes like Oban jobs, in any concurrent process that is part of your supervision tree or your dependencies' supervision trees, or even in an unlinked task (started with Task.start/1) spawned from the plug call itself.

Whether these omissions are a deal-breaker depends on your needs.

They are for me. I wouldn't feel comfortable running an application in production without being informed of all exceptions, except the ones I intentionally silenced. I'd rather be informed by an error tracker first than by a customer.

A second and more general-purpose approach is implementing a logger handler.

For a long time this wasn't a general and solid solution in Elixir, until a substantial change landed in Elixir 1.15: the full integration of Elixir's Logger with Erlang's :logger, which was introduced in Erlang/OTP 21.0 and replaced the deprecated :error_logger.

The details probably deserve their own article, but in summary it means that you can easily define what is called a "logger handler" and listen to all, or a filtered subset of, the events logged in the running applications, which include exceptions in any abnormally terminating process.

# lib/my_app/my_exception_listener.ex

defmodule MyExceptionListener do
  # Called by :logger for every event at or above the handler's level.
  def log(%{meta: %{crash_reason: crash_reason}} = _log_event, _config) do
    # Do something with the log event's special metadata value `crash_reason`:
    # https://hexdocs.pm/logger/main/Logger.html#module-metadata
  end

  # Ignore events without `crash_reason`; an unmatched event would crash the handler.
  def log(_log_event, _config), do: :ok
end

# lib/my_app/application.ex

def start(_type, _args) do
  :logger.add_handler(MyExceptionListener, MyExceptionListener, %{level: :error})

  # rest of your application start
end

This will listen to any exception "crashing" any running Elixir process.
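To see the idea end to end, here is a hedged, runnable sketch. The module name Sketch.CrashListener, the handler id and the `notify` config key are assumptions made for this example; the handler simply forwards the crash reason to the process that installed it, which stands in for whatever reporting you would really do.

```elixir
# Make sure Elixir's Logger application (and its translators) is running.
{:ok, _apps} = Application.ensure_all_started(:logger)

defmodule Sketch.CrashListener do
  # Forward the crash reason to the pid stored in the handler config.
  def log(%{meta: %{crash_reason: {reason, _stacktrace}}}, %{config: %{notify: pid}}) do
    send(pid, {:crash_reported, reason})
  end

  # Ignore every other log event.
  def log(_log_event, _config), do: :ok
end

:ok =
  :logger.add_handler(:sketch_crash_listener, Sketch.CrashListener, %{
    level: :error,
    config: %{notify: self()}
  })

# An unlinked task that crashes; nothing in this process rescues it.
{:ok, _pid} = Task.start(fn -> raise "boom" end)

reported =
  receive do
    {:crash_reported, reason} -> reason
  after
    1_000 -> :timeout
  end

:ok = :logger.remove_handler(:sketch_crash_listener)
```

Logger handlers run in the process that emitted the log event, so a real handler should do as little work as possible there and hand the event off quickly.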

Raised exceptions that are handled manually (via rescue or catch) by application code or by your dependencies' code, and that aren't re-raised, won't actually "crash" a process, so they won't reach the logger, unless the handling code explicitly sends a log message that includes the crash_reason in its metadata (e.g. here is how Broadway does it).
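The difference can be sketched as follows. The first rescued exception is swallowed and never reaches the handler, while the second one is logged with `crash_reason` metadata and does. Sketch.RescueListener, the handler id and the `notify` key are illustrative names, not part of any library.

```elixir
require Logger

{:ok, _apps} = Application.ensure_all_started(:logger)

defmodule Sketch.RescueListener do
  def log(%{meta: %{crash_reason: {reason, _stacktrace}}}, %{config: %{notify: pid}}) do
    send(pid, {:reported, reason})
  end

  def log(_log_event, _config), do: :ok
end

:ok =
  :logger.add_handler(:sketch_rescue_listener, Sketch.RescueListener, %{
    level: :error,
    config: %{notify: self()}
  })

# Returns whatever the handler forwarded, if anything.
next_report = fn ->
  receive do
    {:reported, reason} -> reason
  after
    100 -> :nothing_reported
  end
end

# 1. Rescued and swallowed: no process crashes and nothing is logged.
try do
  raise ArgumentError, "swallowed"
rescue
  _exception -> :ignored
end

silent = next_report.()

# 2. Rescued, but logged with crash_reason so that handlers still see it.
try do
  raise ArgumentError, "handled but reported"
rescue
  exception ->
    Logger.error(Exception.message(exception), crash_reason: {exception, __STACKTRACE__})
end

explicit = next_report.()

:ok = :logger.remove_handler(:sketch_rescue_listener)
```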

One example of these "silent" exceptions is those occurring inside Oban job execution. In such situations, hopefully the library emits :telemetry events with the exception information, which is the case for Oban. That means that, in addition to a logger handler, you may need a few Telemetry event handlers as add-ons for special cases.
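A Telemetry add-on for this case might look like the sketch below. It assumes that the :telemetry and Oban packages are present in your project and that Oban's job exception event is named `[:oban, :job, :exception]` with `:kind`, `:reason` and `:stacktrace` in its metadata; check the Oban.Telemetry documentation for the version you run. The module name and the idea of notifying a pid are illustrative only.

```elixir
defmodule Sketch.ObanExceptionListener do
  # Hypothetical handler id; any unique term works.
  @handler_id "sketch-oban-exception-listener"

  # Call once at application start. Requires the :telemetry package.
  def attach(notify_pid) do
    :telemetry.attach(
      @handler_id,
      [:oban, :job, :exception],
      &__MODULE__.handle_event/4,
      notify_pid
    )
  end

  # Invoked by :telemetry in the process that executed the job.
  def handle_event([:oban, :job, :exception], _measurements, metadata, notify_pid) do
    %{kind: kind, reason: reason, stacktrace: stacktrace} = metadata

    # Stand-in for real reporting: forward the exception data to a process.
    send(notify_pid, {:job_exception, kind, reason, stacktrace})
    :ok
  end
end
```

The handler function can be exercised without Oban by calling `handle_event/4` directly with a hand-built metadata map, which is handy in tests.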

Cool. We found a way to listen to any exception occurring in any concurrent process in the whole runtime or supervision trees of the different running applications.

Tower

Everything discussed so far is, in a nutshell, what Tower does. It is a package we built for ourselves at Mimiquate and open sourced.

At its core it is a universal exception tracker that bundles the exception data into a well-defined Elixir struct called Tower.Event and passes it along to any number of reporters. That is why we say it is agnostic to the reporting destination: it isn't coupled to exception monitoring services, e-mail, Slack or any other destination.
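The general shape of that design can be sketched in a few lines of plain Elixir. This is not Tower's implementation or API: the Sketch.* modules, the struct fields and the `report_event/1` function are hypothetical names chosen only to illustrate the "one event struct, many reporters" idea.

```elixir
defmodule Sketch.Event do
  # Hypothetical event struct; the real Tower.Event may carry different fields.
  @enforce_keys [:kind, :reason, :stacktrace]
  defstruct [:kind, :reason, :stacktrace]
end

defmodule Sketch.Tracker do
  # Bundle the exception into one struct and fan it out to every reporter.
  def report_exception(exception, stacktrace, reporters) do
    event = %Sketch.Event{kind: :error, reason: exception, stacktrace: stacktrace}
    Enum.each(reporters, fn reporter -> reporter.report_event(event) end)
    event
  end
end

defmodule Sketch.ConsoleReporter do
  def report_event(%Sketch.Event{reason: reason}) do
    IO.puts("[console] " <> Exception.message(reason))
  end
end

defmodule Sketch.MailboxReporter do
  # Stands in for an e-mail or chat destination.
  def report_event(%Sketch.Event{} = event), do: send(self(), {:reported, event})
end

event =
  try do
    raise "boom"
  rescue
    exception ->
      Sketch.Tracker.report_exception(
        exception,
        __STACKTRACE__,
        [Sketch.ConsoleReporter, Sketch.MailboxReporter]
      )
  end
```

Because the tracker only knows about the event struct and a list of modules, adding or swapping a destination means changing that list, not the code that captures exceptions.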

Reporting exceptions

To report to one or any number of destinations, you can depend on separate Elixir packages that we built, or write your own.

Example Scenarios

Scenario A

You want to report to Sentry.

  • Include tower_sentry as a dependency of your Elixir project.
  • Set a few Sentry-specific config settings.
  • Set config :tower, reporters: [TowerSentry]
  • Automatic reporting of exceptions will “just work”.

You can also report manually by calling Tower.report_exception anywhere you like in your application code, like so:

try do
  # possibly raising code
rescue
  exception ->
    Tower.report_exception(exception, __STACKTRACE__)
end

Scenario B

You need or want to switch service from Sentry to Honeybadger.

  • Replace tower_sentry with tower_honeybadger in your Elixir project dependencies.
  • Replace config :tower, reporters: [TowerSentry] with config :tower, reporters: [TowerHoneybadger]
  • Remove Sentry-specific configs.
  • Add Honeybadger-specific config, such as the API key.
  • Automatic reporting of exceptions continues to “just work” without any changes.
  • Manual report calls to Tower.report_exception throughout your application code remain unchanged.

Scenario C

You are reporting to Sentry and you want to test ErrorTracker while continuing to report to Sentry.

  • Add tower_error_tracker.
  • Set a few ErrorTracker-specific config settings.
  • Update config :tower, reporters: [TowerSentry] to config :tower, reporters: [TowerSentry, TowerErrorTracker].
  • Automatic reporting of exceptions continues to “just work” without any changes.
  • Manual report calls to Tower.report_exception throughout your application code remain unchanged.
  • Now you will have exceptions reported to both Sentry and ErrorTracker.
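Put side by side, the three scenarios differ only in the reporters list. The excerpt below is a sketch of a config/config.exs file; service-specific settings (credentials, API keys and so on) are left out on purpose, since they are documented by each reporter package.

```elixir
# config/config.exs (excerpt)
import Config

# Scenario A: report to Sentry only.
# config :tower, reporters: [TowerSentry]

# Scenario B: switch to Honeybadger.
# config :tower, reporters: [TowerHoneybadger]

# Scenario C: report to Sentry and ErrorTracker at the same time.
config :tower, reporters: [TowerSentry, TowerErrorTracker]
```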

Useful links

  • GitHub: mimiquate/tower
  • Hex.pm: tower