Case Study: Reusing Double Dispatch for serialization

In my previous blog post, I gave a tutorial on the double dispatch pattern. I mentioned that you could reuse the pattern for a variety of things, one of those being I/O. In this blog post, we’ll walk through how we can add serialization support for our class hierarchy without touching the classes themselves.

Situation: You want to serialize a collection of base class pointers

And the double dispatch machinery is already in place.

The main classes we care about are an Animal base class and an AnimalCollection, which is a lightweight wrapper around a vector<unique_ptr<Animal>>:

struct Animal
{
  virtual ~Animal() = default;
  virtual void Visit(AnimalVisitor*) const = 0;
};

struct AnimalCollection
{
  // ...

  private:
  std::vector<std::unique_ptr<Animal>> animals_;
};

A brief review of our double dispatch machinery

AnimalVisitor is roughly the same as before. It is an abstract base class with knowledge of the Animal hierarchy:

struct Cat;
struct Dog;
// ... other forward declared classes

struct AnimalVisitor
{
  virtual void Visit(const Cat*) = 0;
  virtual void Visit(const Dog*) = 0;
  // ... other overloads for Visit
  virtual ~AnimalVisitor() = default;
};

And the derived Animal classes all use the mixin pattern I showed in the last post to make them visitable:

template<class T>
struct VisitableAnimal : Animal
{
  void Visit(AnimalVisitor* _visitor) const override{
    _visitor->Visit(static_cast<const T*>(this));
  }
};

struct Cat : VisitableAnimal<Cat>
{};

struct Dog : VisitableAnimal<Dog>
{};

//... other Animals

(I’ve slightly modified the Visit methods and AnimalVisitor here to operate on pointers to const. In reality the code will have both versions, as you’ll see in the live demo later.)
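
To see the pieces above working together, here is a minimal, self-contained sketch. SoundVisitor and CollectSounds are hypothetical names invented for this illustration (they are not part of the original code); everything else mirrors the declarations shown so far, limited to Cat and Dog.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Cat;
struct Dog;

struct AnimalVisitor
{
  virtual void Visit(const Cat*) = 0;
  virtual void Visit(const Dog*) = 0;
  virtual ~AnimalVisitor() = default;
};

struct Animal
{
  virtual ~Animal() = default;
  virtual void Visit(AnimalVisitor*) const = 0;
};

// Mixin: the first dispatch (virtual call) lands here, the second
// dispatch (overload resolution on const T*) picks the visitor method.
template<class T>
struct VisitableAnimal : Animal
{
  void Visit(AnimalVisitor* _visitor) const override{
    _visitor->Visit(static_cast<const T*>(this));
  }
};

struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};

// Hypothetical visitor for this sketch: appends one word per concrete type.
struct SoundVisitor : AnimalVisitor
{
  void Visit(const Cat*) override{ sounds += "meow "; }
  void Visit(const Dog*) override{ sounds += "woof "; }
  std::string sounds;
};

// Hypothetical helper: visits every animal through its base class pointer.
std::string CollectSounds(const std::vector<std::unique_ptr<Animal>>& _animals){
  SoundVisitor visitor;
  for(const auto& animal : _animals)
  {
    assert(animal != nullptr);
    animal->Visit(&visitor);
  }
  return visitor.sounds;
}
```

Calling CollectSounds on a vector holding a Cat, a Dog and another Cat returns "meow woof meow ", even though the caller only ever sees Animal pointers.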

Serialization requirements

Our serialization goals are modest:

  • Write an AnimalCollection to a std::ostream
  • Read an AnimalCollection from a std::istream
  • Each Animal is written to its own line as a magic string
    • e.g. "cat" for Cat and "dog" for Dog, etc.

Example output:

Given an AnimalCollection like [Cat, Dog, Cat, Cat, Llama], we’d expect output like so:

cat
dog
cat
cat
llama

Why this format specifically?

I intentionally simplified the serialization/deserialization logic here to only operate on magic strings representing a type, because this dovetails nicely with factory-pattern serialization schemes.

Key-value serialization formats like .xml and .json lend themselves nicely to this kind of factory pattern. Here’s a sample .json file we might eventually end up with for our animal collection as the Animal classes evolve:

# animals.json
{ "animal_collection":
  {
    "count" : 2,
    "data" : [
      {
        "type" : "cat",
        "declawed" : "true",
        "mice_killed" : 0
      },
      {
        "type" : "dog",
        "belly_rubs_received" : 42,
        "sticks_gathered" : 8
      }
    ]
  }
}

Note that each object in the "data" array of our JSON has a "type" string naming which concrete Animal class to create. A more fully-featured serialization scheme would read that string first and then delegate the rest of the work to a type-specific Cat::Load(std::istream&) or Dog::Load(std::istream&) etc.
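
A rough sketch of that "read the type first, then delegate" idea might look like the following. To keep it short it does not parse JSON; it assumes a made-up whitespace-separated format such as "cat 3", where the number is a single type-specific field. The member names, LoadTagged and LoadAllTagged are all assumptions for illustration only.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Animal { virtual ~Animal() = default; };

struct Cat : Animal
{
  int miceKilled = 0;
  // Hypothetical type-specific loader: reads the fields that follow the tag.
  static std::unique_ptr<Cat> Load(std::istream& _instream){
    auto cat = std::make_unique<Cat>();
    _instream >> cat->miceKilled;
    return cat;
  }
};

struct Dog : Animal
{
  int sticksGathered = 0;
  static std::unique_ptr<Dog> Load(std::istream& _instream){
    auto dog = std::make_unique<Dog>();
    _instream >> dog->sticksGathered;
    return dog;
  }
};

// Reads the type tag, then hands the stream to the matching loader.
// Returns nullptr once there is nothing left to read.
std::unique_ptr<Animal> LoadTagged(std::istream& _instream){
  std::string type;
  if (!(_instream >> type))
    return nullptr;
  if (type == "cat")
    return Cat::Load(_instream);
  if (type == "dog")
    return Dog::Load(_instream);
  throw std::invalid_argument("unknown animal type: " + type);
}

// Hypothetical usage: parse every record from an in-memory string.
std::vector<std::unique_ptr<Animal>> LoadAllTagged(const std::string& _text){
  std::istringstream in{_text};
  std::vector<std::unique_ptr<Animal>> animals;
  while(auto animal = LoadTagged(in))
    animals.push_back(std::move(animal));
  return animals;
}
```

The rest of this post sticks to the simpler one-string-per-line format, where the tag is the whole record.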

The naive approach

Your knee-jerk reaction to the above requirements is likely along the lines of "let’s give the Animal class a virtual Save() method!"

struct Animal
{
  virtual void Save(std::ostream& _outstream) const = 0;
  // ...
};

And then derived classes can implement Save like so:

struct Cat : VisitableAnimal<Cat>
{
  void Save(std::ostream& _outstream) const override{
    _outstream << "cat\n";
  }
};

This makes things quite easy for AnimalCollection:

struct AnimalCollection
{
  // ...
  public:
  void Save(std::ostream& _outstream) const {
    for(const auto& animal : animals_)
    {
      animal->Save(_outstream);
    }
  }
};

Problem solved, right?

Well, we sort of forgot about deserialization…

Deserialization

How do we implement AnimalCollection::Load now?

void AnimalCollection::Load(std::istream& _instream){
  std::string nextLine;
  while(std::getline(_instream, nextLine))
  {
    auto nextAnimal = /*which Animal to create???*/;
    animals_.push_back(std::move(nextAnimal));
  }
}

The problem is that the magic strings representing Cat, Dog, etc. are all hard-coded into *::Save methods. From here, you might be tempted to take one of the following approaches:

  1. AnimalCollection should "just know" about all the magic strings for each derived type
if (nextLine == "cat")
{
  auto nextAnimal = std::make_unique<Cat>();
  animals_.push_back(std::move(nextAnimal));
}else if (nextLine == "dog")
{
   // ...
}
// etc

Code like that should be a big red flag. What happens when we want to capitalize the first character of each string, "Cat" instead of "cat"? Now we have to change it in two places: once in AnimalCollection.cpp and again in Cat.cpp.

Data that is duplicated should be assumed to already be out of sync.

  2. Perhaps each type should have a GetType() method that returns a string (or an enum that can be converted into a string)
struct Cat : VisitableAnimal<Cat>
{
  static std::string GetType(){
     return "cat";
  }
};

Then our deserialization code looks like this:

if (nextLine == Cat::GetType())
{
  auto nextAnimal = std::make_unique<Cat>();
  animals_.push_back(std::move(nextAnimal));
} else if (nextLine == Dog::GetType())
{
  // ...
}
// etc.

Code like this might be useful if you are exposing an AnimalFactory directly to the client (via a Factory Pattern). Our use case, though, raises the question:

"Do you really need to add another method to every class’ interface?"

I think the answer is a resounding "No", and I’m going to appeal to the authority of Bob Martin to support me here:

Good developers learn to limit what they expose at the interfaces of their classes and modules. The fewer methods a class has, the better. – Robert Martin, Clean Code

What a great motivation for reusing our double dispatch machinery to solve this problem non-invasively.

Adding Serialization via double dispatch

We can reuse our existing double dispatch machinery that we’ve already gone to the trouble of adding to Animal to save and load an AnimalCollection.

Our goals are thus:

  • Only write our magic strings in one location
  • Avoid using run time type information
  • Don’t touch any of our existing interfaces

Saving via double dispatch

We need to save a single Animal to a stream, and we don’t need state, so let’s prefer a non-member, non-friend function to do this in order to maximize encapsulation.

// AnimalSerialization.h
namespace animal_serialization
{
  // preconditions: _animal is not null, _outstream is open and ready
  void Save(const Animal* _animal, std::ostream& _outstream);
}

In AnimalSerialization.cpp, we need a way to translate Animal instances into strings. Our double dispatch visitor comes in handy here.

The implementation is fairly straightforward. Deriving a SaveAnimalVisitor from AnimalVisitor allows us to immediately know about all types in the hierarchy. From there it’s a matter of printing the magic strings to a stream.

// AnimalSerialization.cpp
// ...
namespace 
{
std::string CatString(){return "cat";}
std::string DogString(){return "dog";}

struct SaveAnimalVisitor : AnimalVisitor
{
  public:
  // precondition: _outstream will outlive the SaveAnimalVisitor instance
  explicit SaveAnimalVisitor(std::ostream& _outstream) :
  outstream_{&_outstream}
  {}

  void Visit(const Cat*) override{
    *outstream_ << CatString() << "\n";
  }
  void Visit(const Dog*) override{
    *outstream_ << DogString() << "\n";
  }
  // ... other overridden Visit methods

  private:
  std::ostream* outstream_ = nullptr;
};
} // anonymous namespace

From there it’s a simple matter of hooking up this new visitor to the Animal we wish to serialize:

// AnimalSerialization.cpp
// ... (our visitor code)
namespace animal_serialization
{
  void Save(const Animal* _animal, std::ostream& _outstream){
    ::SaveAnimalVisitor saveVisitor{_outstream};
    _animal->Visit(&saveVisitor);
  }
}

AnimalCollection::Save uses it like so:

// AnimalCollection.cpp
void AnimalCollection::Save(std::ostream& _outstream) const{
  for(const auto& animal : animals_)
  {
    animal_serialization::Save(animal.get(), _outstream);
  }
}

That was disturbingly easy, right?

We could grow our Animal hierarchy to 100 types and AnimalSerialization.cpp would still only be ~400 LOC.
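
As a sanity check, the snippets above can be stitched into one translation unit and pointed at a std::ostringstream. The sketch below does that for the example collection from the requirements section. Note the assumptions: a Llama class and a LlamaString() function are added here only to reproduce that example (the string "llama" is taken from the sample output), and SaveExampleCollection is a hypothetical helper.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Cat;
struct Dog;
struct Llama;

struct AnimalVisitor
{
  virtual void Visit(const Cat*) = 0;
  virtual void Visit(const Dog*) = 0;
  virtual void Visit(const Llama*) = 0;
  virtual ~AnimalVisitor() = default;
};

struct Animal
{
  virtual ~Animal() = default;
  virtual void Visit(AnimalVisitor*) const = 0;
};

template<class T>
struct VisitableAnimal : Animal
{
  void Visit(AnimalVisitor* _visitor) const override{
    _visitor->Visit(static_cast<const T*>(this));
  }
};

struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};
struct Llama : VisitableAnimal<Llama> {};  // assumed, to match the example output

namespace
{
std::string CatString(){return "cat";}
std::string DogString(){return "dog";}
std::string LlamaString(){return "llama";}  // assumed spelling

struct SaveAnimalVisitor : AnimalVisitor
{
  explicit SaveAnimalVisitor(std::ostream& _outstream) : outstream_{&_outstream} {}

  void Visit(const Cat*) override{ *outstream_ << CatString() << "\n"; }
  void Visit(const Dog*) override{ *outstream_ << DogString() << "\n"; }
  void Visit(const Llama*) override{ *outstream_ << LlamaString() << "\n"; }

  private:
  std::ostream* outstream_ = nullptr;
};
} // anonymous namespace

namespace animal_serialization
{
  void Save(const Animal* _animal, std::ostream& _outstream){
    assert(_animal != nullptr);  // precondition from the header comment
    ::SaveAnimalVisitor saveVisitor{_outstream};
    _animal->Visit(&saveVisitor);
  }
}

// Hypothetical helper: saves [Cat, Dog, Cat, Cat, Llama] to a string.
std::string SaveExampleCollection(){
  std::vector<std::unique_ptr<Animal>> animals;
  animals.push_back(std::make_unique<Cat>());
  animals.push_back(std::make_unique<Dog>());
  animals.push_back(std::make_unique<Cat>());
  animals.push_back(std::make_unique<Cat>());
  animals.push_back(std::make_unique<Llama>());

  std::ostringstream out;
  for(const auto& animal : animals)
    animal_serialization::Save(animal.get(), out);
  return out.str();
}
```

SaveExampleCollection() returns "cat\ndog\ncat\ncat\nllama\n", which matches the expected output listed earlier, and none of the Animal classes gained a Save method to get there.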

Loading

Now the challenge is to read strings from a stream and construct Animal instances. How can double dispatch help us here?

While we cannot use double dispatch directly during deserialization, it did let us put all our magic strings into a single source file, AnimalSerialization.cpp. With that in place, we can implement animal_serialization::Load() as a basic Factory:

// AnimalSerialization.h
namespace animal_serialization
{
  // ...
  // precondition: _instream is open and ready
  std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream);
}

// AnimalSerialization.cpp
// ...
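std::unique_ptr<Animal> ParseAnimal(const std::string& _line){
  if (_line == CatString())
    return std::make_unique<Cat>();
  else if (_line == DogString())
    return std::make_unique<Dog>();
  // etc.

  // Unknown string: falling off the end of a non-void function is
  // undefined behavior, so fail loudly instead. Throwing is one
  // option among several; it needs <stdexcept>.
  throw std::invalid_argument("unknown animal type: " + _line);
}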

namespace animal_serialization
{
  // ...
  std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream){
    std::vector<std::unique_ptr<Animal>> toReturn;
    std::string line;
    while(std::getline(_instream, line))
      toReturn.push_back(ParseAnimal(line));
    return toReturn;
  }
}

Wow, that was easy, too.

AnimalCollection::Load has an easy task in front of it:

// AnimalCollection.cpp
// ...
void AnimalCollection::Load(std::istream& _instream)
{
    std::vector<std::unique_ptr<Animal>> loadedAnimals = animal_serialization::Load(_instream);
    animals_ = std::move(loadedAnimals);
}

From this point, we could take refactoring a number of steps further, ultimately going so far as to have a map of strings to functions returning Animal instances:

// AnimalSerialization.cpp
// ...
template<class T>
std::unique_ptr<Animal> CreateAnimal()
{
  return std::make_unique<T>();
}

using AnimalCreatorFunction = std::function<std::unique_ptr<Animal>()>;
std::unordered_map<std::string, AnimalCreatorFunction> animalFactory =
{
  {CatString(), AnimalCreatorFunction{&CreateAnimal<Cat>}},
  {DogString(), AnimalCreatorFunction{&CreateAnimal<Dog>}},
  // ...
};

std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream){
    std::vector<std::unique_ptr<Animal>> toReturn;
    std::string line;
    while(std::getline(_instream, line))
        toReturn.push_back(animalFactory.at(line)());
    return toReturn;
}
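
To check that saving and loading really agree on the magic strings, the following self-contained sketch combines both halves and runs a round trip through string streams. RoundTrip is a hypothetical helper written for this illustration, and the map is declared const here as a small liberty. One behavior worth knowing: unordered_map::at throws std::out_of_range for a key that is not present, so a line with no registered creator aborts the load with an exception.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct Cat;
struct Dog;

struct AnimalVisitor
{
  virtual void Visit(const Cat*) = 0;
  virtual void Visit(const Dog*) = 0;
  virtual ~AnimalVisitor() = default;
};

struct Animal
{
  virtual ~Animal() = default;
  virtual void Visit(AnimalVisitor*) const = 0;
};

template<class T>
struct VisitableAnimal : Animal
{
  void Visit(AnimalVisitor* _visitor) const override{
    _visitor->Visit(static_cast<const T*>(this));
  }
};

struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};

namespace
{
// The only place the magic strings are spelled out.
std::string CatString(){return "cat";}
std::string DogString(){return "dog";}

struct SaveAnimalVisitor : AnimalVisitor
{
  explicit SaveAnimalVisitor(std::ostream& _outstream) : outstream_{&_outstream} {}
  void Visit(const Cat*) override{ *outstream_ << CatString() << "\n"; }
  void Visit(const Dog*) override{ *outstream_ << DogString() << "\n"; }

  private:
  std::ostream* outstream_ = nullptr;
};

template<class T>
std::unique_ptr<Animal> CreateAnimal(){ return std::make_unique<T>(); }

using AnimalCreatorFunction = std::function<std::unique_ptr<Animal>()>;
const std::unordered_map<std::string, AnimalCreatorFunction> animalFactory =
{
  {CatString(), AnimalCreatorFunction{&CreateAnimal<Cat>}},
  {DogString(), AnimalCreatorFunction{&CreateAnimal<Dog>}},
};
} // anonymous namespace

namespace animal_serialization
{
  void Save(const Animal* _animal, std::ostream& _outstream){
    assert(_animal != nullptr);
    ::SaveAnimalVisitor saveVisitor{_outstream};
    _animal->Visit(&saveVisitor);
  }

  std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream){
    std::vector<std::unique_ptr<Animal>> toReturn;
    std::string line;
    while(std::getline(_instream, line))
      toReturn.push_back(animalFactory.at(line)());  // throws on an unknown line
    return toReturn;
  }
}

// Hypothetical helper: loads _text, saves the result, returns what was written.
std::string RoundTrip(const std::string& _text){
  std::istringstream in{_text};
  const auto animals = animal_serialization::Load(in);
  std::ostringstream out;
  for(const auto& animal : animals)
    animal_serialization::Save(animal.get(), out);
  return out.str();
}
```

RoundTrip("cat\ndog\ncat\n") gives back the same text, while RoundTrip("llama\n") throws std::out_of_range because no creator is registered for "llama" in this sketch. If a friendlier error is wanted, looking the key up with find and reporting the offending line is one alternative.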

Full-fledged demo here

At this point, we’ve moved the implementation details into the narrowest possible scope and avoided duplication at all costs. This is a pretty good stopping point; we could still grow to 100 derived Animal types without over-complicating or overcrowding AnimalSerialization.cpp (perhaps ~500 LOC).

(If you find yourself in a situation where you DO need to split things up further, feel free to contact me (see my About Me page); there are other techniques we could use that are outside the scope of this article.)

Conclusion

The double dispatch pattern lends itself nicely to stable interfaces thanks to its reusability. In this post, I walked through how we might reuse it to implement basic serialization without needing to touch Animal itself, or any derived class. I hope you’re already thinking of places in your codebase that could benefit from refactoring to use this pattern!
