Intro

This is it: the final post in the Event Sourcing with Elixir series! It will be light on code, I promise, consisting of final notes about running Commanded in production and a broader look at why DDD and Event Sourcing matter.

Notes on running Commanded apps in production

Our library of choice for this series is great, but there are just a few things I have to mention if you mean to use it in production.

Aggregate lifespans

As you know by now, our Aggregates are Erlang processes, and Commanded keeps them around once created. This is problematic if you have a lot of them, as I’ve discovered with an app I deployed using it! 😄 Here’s a graph of that app’s memory usage:

Memory Consumption (aka the Sawtooth of Death)

The dips are releases in which the app was restarted. Luckily, there’s a way around this with lifespans. For every aggregate, we can control what happens to its process after handling an event or command. The documentation’s example:

defmodule BankAccountLifespan do
  @behaviour Commanded.Aggregates.AggregateLifespan

  def after_event(%MoneyDeposited{}), do: :timer.hours(1)
  def after_event(%BankAccountClosed{}), do: :stop
  def after_event(_event), do: :infinity

  def after_command(%CloseAccount{}), do: :stop
  def after_command(_command), do: :infinity

  def after_error(:invalid_initial_balance), do: :timer.minutes(5)
  def after_error(_error), do: :stop
end

Then it’s a simple matter of including it in the router file:

defmodule BankRouter do
  use Commanded.Commands.Router

  dispatch [OpenAccount, CloseAccount],
    to: BankAccount,
    lifespan: BankAccountLifespan, # ◀️◀️◀️
    identity: :account_number
end

We can see here that after handling a BankAccountClosed event there's really no reason to keep the process around in memory, so we return :stop. For other events you can start a timer, or simply keep the process around forever if you so choose. Tuning your aggregate lifespans to your business logic will help you keep your app's memory usage under control.

But what happens when the process is stopped? Any time in the future that your code touches a stopped aggregate, Commanded will restart it, replaying every event it has ever received to rebuild its state. This is fine for aggregates with a few events, but once you have long event streams you might notice a delay when dispatching commands to those aggregates. Which brings us to our next point.

Aggregate snapshots

If the event streams for your aggregates are starting to go over the 50 or 100 events mark, depending on how expensive the events are to apply, you might want to consider snapshotting.

Snapshotting example (credit: Richer data-history-event-sourcing)

In this example, a dump of the aggregate’s internal state is stored in its event stream after 23 events. From that point on, if Commanded needs to restart this aggregate when its process has been stopped, it will only load the snapshot and replay events 24 and 25, instead of all 25 events. It’s easy to enable this in Commanded. Start with the configuration:

# config/config.exs

config :commanded, ExampleAggregate,
  snapshot_every: 10,
  snapshot_version: 1

The default is to serialize aggregate state as JSON. For that to work, be sure to derive the Jason.Encoder protocol in your aggregate modules:

defmodule ExampleAggregate do
  @derive Jason.Encoder # ◀️◀️◀️
  defstruct [:name, :datetime]
end

You can also roll your own serialization; check the docs for more info.
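As a rough sketch of what rolling your own might look like, here's a serializer that wraps the built-in JSON serializer with :zlib compression. This assumes a serialize/1 and deserialize/2 contract like that of Commanded's JsonSerializer; the module name and compression idea are purely illustrative, so check the serializer behaviour docs for the exact callbacks your version expects:

```elixir
defmodule MyApp.CompressedSerializer do
  # Hypothetical example: delegate to the built-in JSON serializer,
  # then compress the resulting binary before storage.
  alias Commanded.Serialization.JsonSerializer

  def serialize(term) do
    term |> JsonSerializer.serialize() |> :zlib.compress()
  end

  def deserialize(binary, config \\ []) do
    binary |> :zlib.uncompress() |> JsonSerializer.deserialize(config)
  end
end
```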

Snapshotting is a great concept, but it can also be a source of problems, since snapshots have to be rebuilt whenever you change the shape of your aggregate's internal state. Any time you add or remove a field, you must increment the snapshot_version value in the configuration so that previous snapshots are not loaded into the new state shape. You might also run into performance problems if your aggregate's state is very large and you snapshot too frequently.

On GDPR

Under the EU's GDPR regulation, users are entitled to dumps of the data you hold on them and, in addition, have the right to "be forgotten" by your system. In practice, there are a few cases in which you may retain user data for long-term storage; legal advice is highly recommended in this matter.

Event Sourcing’s immutability and storage of every past event seem to make GDPR compliance an impossibility, but there’s a clever way around it: encryption.

If we encrypt every sensitive piece of information (PII) we hold on a user, we can later simply delete the key. We won't have to actively seek out and delete events (and end up with broken streams): the data will still be present, but in practice no longer readable. Of course, your business logic will need to handle these unreadable events and either halt or display something like "Deleted Account" where a user name can no longer be read. This approach is usually called "crypto-shredding". Here are some links to give you more info on it:
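To sketch the idea, here each user gets their own encryption key, and deleting that key "shreds" their PII while leaving the event stream intact. The module and the way keys are passed around are hypothetical; the :crypto calls are standard Erlang/OTP (22+) AES-256-GCM:

```elixir
defmodule MyApp.Shredder do
  # Hypothetical crypto-shredding helper (not part of Commanded).
  @cipher :aes_256_gcm

  # one key per user, stored separately from the event store
  def new_key, do: :crypto.strong_rand_bytes(32)

  def encrypt(plaintext, key) do
    iv = :crypto.strong_rand_bytes(12)

    {ciphertext, tag} =
      :crypto.crypto_one_time_aead(@cipher, key, iv, plaintext, "", true)

    # the iv and tag must be stored alongside the ciphertext
    {iv, tag, ciphertext}
  end

  def decrypt({iv, tag, ciphertext}, key) when is_binary(key) do
    case :crypto.crypto_one_time_aead(@cipher, key, iv, ciphertext, "", tag, false) do
      :error -> {:error, :unreadable}
      plaintext -> {:ok, plaintext}
    end
  end

  # key deleted (user "forgotten"): the data can never be read again
  def decrypt(_encrypted, nil), do: {:error, :unreadable}
end
```

Projections can then pattern match on `{:error, :unreadable}` and render a placeholder such as "Deleted Account".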

Takeaways from this series as a whole

Domain-Driven Design, as a whole, is a very interesting topic. As developers, we’ve all struggled with complexity in our projects - not only for greenfield endeavors with complex problem spaces, but also the kind that creeps over time in our famed monoliths.

Code as you mean it

Instead of diving head-first into which tables, controllers, and views you'll need for a project, consider which internal boundaries your project's solution reveals as necessary for a harmonious separation of concerns and data within your system. Often you'll realize that truly including the business in the project, not just with occasional meetings but as a living counterpart in knowledge of the system's concepts and relationships, is the key element in preventing your assumptions from drifting out of sync with what the project actually aims to accomplish.

This business orientation reveals itself in the code in a number of ways. For example, consider codifying something like

ScheduleService.create(%Appointment{user: X}, %Date{}, is_weekend: true)

as

ScheduleService.add_weekend_appointment(%User{}, %Date{})

If you see your code as something that merely writes attributes on a database, you’ll most likely find it harder to test and also to reason about when you read it a year from now.

Our approach to error handling becomes a bit more enlightened through the DDD lens as well. What do we usually do when we hit a network timeout in our projects? We retry, or we log it. But with our DDD hats on, we can see timeouts for what they really are: business concerns, not merely technical ones. As such, they become part of our flows, and timeouts should generate events that affect our system's state. After all, our customers don't want to see 500 error pages; they want to know what happened to their requests in a meaningful way, so this kind of event should be accounted for in our domain. This rule of thumb may not apply to all of your timeouts or other traditionally technical errors because, as always, it depends. But try to bring the whole of the process into your domain.
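As a hedged illustration of the point (every name here is hypothetical, none of this is a Commanded API), a gateway timeout can be captured as a first-class domain event instead of being swallowed as a technical failure:

```elixir
defmodule MyApp.Payments do
  # Illustrative domain event: a timeout is part of the flow,
  # not just something to log and forget.
  defmodule PaymentTimedOut do
    defstruct [:payment_id, :attempted_at]
  end

  def charge(gateway, payment_id, amount) do
    case gateway.charge(payment_id, amount) do
      {:ok, receipt} ->
        {:ok, receipt}

      {:error, :timeout} ->
        # surface the timeout as a domain event; read models and
        # customer notifications can react to it meaningfully
        {:event, %PaymentTimedOut{payment_id: payment_id, attempted_at: DateTime.utc_now()}}
    end
  end
end
```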

As developers, we loathe inconsistency, but the fact is that most complex problem spaces have to deal with it. Inconsistency usually goes hand in hand with asynchronous processes, so it needs to be accounted for and strategies to manage it drawn up. Process managers that emit compensating events to roll back partially successful processes are a great aid in this respect. Finite state machines that track consistency can also help.
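In Commanded, such a compensating flow can be sketched with a process manager. The event and command names below are hypothetical; the callbacks (interested?/1, handle/2) follow Commanded's ProcessManager behaviour, so double-check the docs for the exact options your version expects:

```elixir
defmodule TransferProcessManager do
  use Commanded.ProcessManagers.ProcessManager,
    application: BankApp,
    name: "TransferProcessManager"

  defstruct [:transfer_uuid]

  # start an instance per transfer, stop when the transfer completes
  def interested?(%MoneyWithdrawn{transfer_uuid: id}), do: {:start, id}
  def interested?(%DepositFailed{transfer_uuid: id}), do: {:continue, id}
  def interested?(%MoneyDeposited{transfer_uuid: id}), do: {:stop, id}
  def interested?(_event), do: false

  # the deposit leg failed: dispatch a compensating command that
  # rolls back the withdrawal
  def handle(%TransferProcessManager{}, %DepositFailed{transfer_uuid: id, amount: amount}) do
    %RefundWithdrawal{transfer_uuid: id, amount: amount}
  end
end
```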

Layering

DDD also teaches us to layer our code appropriately. Monoliths usually have a mesh of responsibilities and indiscriminate data access, which makes refactoring them hard; as a coping mechanism, more gets piled on. Correct layer boundaries are difficult to discern, though, so don't feel bad if at first you don't succeed.

To source events or not

Event Sourcing is not mandatory with Domain-Driven Design or CQRS, but it's a very powerful pattern that lends itself to both quite well; some even argue that you can't have CQRS without it. Regardless, it is complex. Designing proper events is not trivial, and versioning them is hard enough that a whole book has been written about it by the "father" of CQRS, Greg Young. I want to believe you get better at it with time, though!

Event-driven code is also a great enabler of monolith deconstruction. Emitting events on your old code and handling them on a new system running in parallel is a powerful strategy to tackle refactoring-resistant code.

More info

At the end of the day, we all want to become better at our jobs and deliver solutions that work better and adapt to changing requirements. I hope you consider that DDD can help you with that in complex problem spaces (and maybe not-so-complex ones too). There are a lot of quality books on the matter. I suggest Domain-Driven Design Distilled as a quick read, and then something more substantial like the Blue or Red books to really dig into it. There’s also DDD Quickly and Exploring CQRS and Event Sourcing, both of which are free downloads.

Unfortunately, most of the literature covers either C# or Java, which motivated me to write about it using Elixir, a language I believe is uniquely suited to it. If you’re looking for example Commanded codebases or info, there’s an open-source project by Commanded’s author online here and a blog post about it. There’s also an introductory talk by Bernardo Amorim. And finally there’s a work-in-progress book about creating a Medium clone with it, also by Commanded’s author.

Wrapping Up

Phew, this was a long series but I feel like I’ve covered a bunch of essentials. Let me know if you have any feedback on Twitter or shoot me an email. I might return to Commanded in the future, with more advanced topics. Until then, I have a pile of unfinished side projects involving XML APIs, BEAM bytecode and telnet servers, so time to write about something else 😉.

See you next time!

Cover image credit: PIXNIO