In my previous blog post, I gave a tutorial on the double dispatch pattern. I mentioned that you could reuse the pattern for a variety of things, one of those being I/O. In this blog post, we’ll walk through how we can add serialization support for our class hierarchy without touching the classes themselves.
Situation: You want to serialize a collection of base class pointers
And the double dispatch machinery is already in place.
The main classes we care about are an Animal base class and AnimalCollection, a lightweight wrapper around a std::vector<std::unique_ptr<Animal>>:
struct Animal
{
    virtual ~Animal() = default;
    virtual void Visit(AnimalVisitor*) const = 0;
};
struct AnimalCollection
{
    // ...
private:
    std::vector<std::unique_ptr<Animal>> animals_;
};
A brief review of our double dispatch machinery
AnimalVisitor is roughly the same as before, an abstract base class with knowledge of the Animal hierarchy:
struct Cat;
struct Dog;
// ... other forward declared classes
struct AnimalVisitor
{
    virtual void Visit(const Cat*) = 0;
    virtual void Visit(const Dog*) = 0;
    // ... other overloads for Visit
    virtual ~AnimalVisitor() = default;
};
And the derived Animal classes all use the mixin pattern I showed in the last post to make them visitable:
template<class T>
struct VisitableAnimal : Animal
{
    void Visit(AnimalVisitor* _visitor) const override{
        _visitor->Visit(static_cast<const T*>(this));
    }
};
struct Cat : VisitableAnimal<Cat>
{};
struct Dog : VisitableAnimal<Dog>
{};
//... other Animals
(I’ve slightly modified the Visit methods and AnimalVisitor here to operate on pointers to const. In reality the code will have both versions, as you’ll see in the live demo later.)
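To see the machinery above in motion, here is a minimal, self-contained sketch. NameCollector and CollectNames are hypothetical names introduced only for this illustration; the rest mirrors the declarations above, trimmed to two animals.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Cat;
struct Dog;

struct AnimalVisitor
{
    virtual void Visit(const Cat*) = 0;
    virtual void Visit(const Dog*) = 0;
    virtual ~AnimalVisitor() = default;
};

struct Animal
{
    virtual ~Animal() = default;
    virtual void Visit(AnimalVisitor*) const = 0;
};

// Mixin: forwards to the Visit overload that matches the concrete type T.
template<class T>
struct VisitableAnimal : Animal
{
    void Visit(AnimalVisitor* _visitor) const override {
        _visitor->Visit(static_cast<const T*>(this));
    }
};

struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};

// Hypothetical visitor (illustration only): records a name per visited type.
struct NameCollector : AnimalVisitor
{
    void Visit(const Cat*) override { names.push_back("Cat"); }
    void Visit(const Dog*) override { names.push_back("Dog"); }
    std::vector<std::string> names;
};

// Hypothetical helper: visits every animal and returns the names in order.
std::vector<std::string> CollectNames(const std::vector<std::unique_ptr<Animal>>& _animals)
{
    NameCollector collector;
    for (const auto& animal : _animals)
    {
        assert(animal != nullptr); // precondition: no null entries
        animal->Visit(&collector);
    }
    return collector.names;
}
```

Calling CollectNames on a vector holding a Cat, a Dog and another Cat should give back the names in that same order, without a single cast or type check at the call site.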
Serialization requirements
Our serialization goals are modest:
- Write an AnimalCollection to a std::ostream
- Read an AnimalCollection from a std::istream
- Each Animal is written to its own line as a magic string
  - e.g. "cat" for Cat and "dog" for Dog, etc.
Example output:
Given an AnimalCollection like [Cat, Dog, Cat, Cat, Llama], we’d expect output like so:
cat
dog
cat
cat
llama
Why this format specifically?
I intentionally simplified the serialization/deserialization logic here to only operate on magic strings representing a type, because this dovetails nicely into factory-pattern serialization schemes.
Key-value serialization formats like .xml and .json lend themselves nicely to this kind of factory pattern. Here’s a sample .json file we might eventually end up with for our animal collection as the Animal classes evolve:
# animals.json
{ "animal_collection":
    {
        "count" : 2,
        "data" : [
            {
                "type" : "cat",
                "declawed" : "true",
                "mice_killed" : 0
            },
            {
                "type" : "dog",
                "belly_rubs_received" : 42,
                "sticks_gathered" : 8
            }
        ]
    }
}
Note that each entry in the "data" section of our JSON has a "type" string representing which concrete Animal class to create. A more fully-featured serialization scheme would read that first and then delegate the rest of the work to a type-specific Cat::Load(std::istream&) or Dog::Load(std::istream&), etc.
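Here is a rough sketch of that "read the type first, then delegate" idea. To keep it standard-library only, I’m swapping JSON for a whitespace-separated record such as "dog 42 8"; that format, the field names (borrowed from the sample file), and the helpers LoadRecord and LoadRecordFromString are all assumptions made for illustration, not part of the real code.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

struct Animal
{
    virtual ~Animal() = default;
};

// Hypothetical field, borrowed from the sample JSON.
struct Cat : Animal
{
    int mice_killed = 0;

    // Reads the Cat-specific fields that follow the type tag.
    static std::unique_ptr<Cat> Load(std::istream& _instream) {
        auto cat = std::make_unique<Cat>();
        if (!(_instream >> cat->mice_killed))
            throw std::runtime_error("cat: expected mice_killed");
        return cat;
    }
};

// Hypothetical fields, borrowed from the sample JSON.
struct Dog : Animal
{
    int belly_rubs_received = 0;
    int sticks_gathered = 0;

    // Reads the Dog-specific fields that follow the type tag.
    static std::unique_ptr<Dog> Load(std::istream& _instream) {
        auto dog = std::make_unique<Dog>();
        if (!(_instream >> dog->belly_rubs_received >> dog->sticks_gathered))
            throw std::runtime_error("dog: expected two integer fields");
        return dog;
    }
};

// Reads the type tag first, then hands the rest of the record
// to the loader for that type.
std::unique_ptr<Animal> LoadRecord(std::istream& _instream)
{
    assert(_instream.good()); // precondition: stream is open and ready
    std::string type;
    if (!(_instream >> type))
        throw std::runtime_error("missing type tag");
    if (type == "cat")
        return Cat::Load(_instream);
    if (type == "dog")
        return Dog::Load(_instream);
    throw std::runtime_error("unknown animal type: " + type);
}

// Convenience wrapper (hypothetical) for trying the sketch on a string.
std::unique_ptr<Animal> LoadRecordFromString(const std::string& _record)
{
    std::istringstream in{_record};
    return LoadRecord(in);
}
```

The point is only the shape: the dispatcher knows the type tags, and each type knows its own fields. The rest of this post sticks to the tag-only format.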
The naive approach
Your knee-jerk reaction to the above requirements is likely along the lines of "let’s give the Animal class a virtual Save() method!"
struct Animal
{
    virtual void Save(std::ostream& _outstream) const = 0;
    // ...
};
And then derived classes can implement Save like so:
struct Cat : VisitableAnimal<Cat>
{
    void Save(std::ostream& _outstream) const override{
        _outstream << "cat\n";
    }
};
This makes things quite easy for AnimalCollection:
struct AnimalCollection
{
    // ...
public:
    void Save(std::ostream& _outstream) const {
        for(const auto& animal : animals_)
        {
            animal->Save(_outstream);
        }
    }
};
Problem solved, right?
Well, we sort of forgot about deserialization…
Deserialization
How do we implement AnimalCollection::Load now?
void AnimalCollection::Load(std::istream& _instream){
    std::string nextLine;
    while(std::getline(_instream, nextLine))
    {
        auto nextAnimal = /*which Animal to create???*/;
        animals_.push_back(std::move(nextAnimal));
    }
}
The problem is that the magic strings representing Cat, Dog, etc. are all hard-coded into *::Save methods. From here, you might be tempted to take one of the following approaches:
- AnimalCollection should "just know" about all the magic strings for each derived type
if (nextLine == "cat")
{
    auto nextAnimal = std::make_unique<Cat>();
    animals_.push_back(std::move(nextAnimal));
}else if (nextLine == "dog")
{
    // ...
}
// etc
Code like that should be a big red flag. What happens when we want to change the strings such that the first character is capitalized ("Cat" instead of "cat")? Now we have to do it in two places: once in AnimalCollection.cpp and again in Cat.cpp.
Data that is duplicated should be assumed to already be out of sync.
- Perhaps each type should have a GetType() method that returns a string (or an enum that can be converted into a string)
struct Cat : VisitableAnimal<Cat>
{
    static std::string GetType(){
        return "cat";
    }
};
Then our deserialization code looks like this:
if (nextLine == Cat::GetType())
{
    auto nextAnimal = std::make_unique<Cat>();
    animals_.push_back(std::move(nextAnimal));
} else if (nextLine == Dog::GetType())
{
    // ...
}
// etc.
Code like this might be useful if you are exposing an AnimalFactory directly to the client (via a Factory Pattern). Our use case, though, raises the question:
"Do you really need to add another method to every class’ interface?"
I think the answer is a resounding "No", and I’m going to appeal to the authority of Bob Martin to support me here:
Good developers learn to limit what they expose at the interfaces of their classes and modules. The fewer methods a class has, the better. – Robert Martin, Clean Code
What a great motivation for reusing our double dispatch machinery to solve this problem non-invasively.
Adding Serialization via double dispatch
We can reuse our existing double dispatch machinery that we’ve already gone to the trouble of adding to Animal to save and load an AnimalCollection.
Our goals are thus:
- Only write our magic strings in one location
- Avoid using run time type information
- Don’t touch any of our existing interfaces
Saving via double dispatch
We need to save a single Animal to a stream, and we don’t need state, so let’s prefer a non-member, non-friend function to do this in order to maximize encapsulation.
// AnimalSerialization.h
namespace animal_serialization
{
    // preconditions: _animal is not null, _outstream is open and ready
    void Save(const Animal* _animal, std::ostream& _outstream);
}
In AnimalSerialization.cpp, we need a way to translate Animal instances into strings. Our double dispatch visitor comes in handy here.
The implementation is fairly straightforward. Deriving a SaveAnimalVisitor from AnimalVisitor allows us to immediately know about all types in the hierarchy. From there it’s a matter of printing the magic strings to a stream.
// AnimalSerialization.cpp
// ...
namespace
{
    std::string CatString(){return "cat";}
    std::string DogString(){return "dog";}
    struct SaveAnimalVisitor : AnimalVisitor
    {
    public:
        // precondition: _outstream will outlive the SaveAnimalVisitor instance
        explicit SaveAnimalVisitor(std::ostream& _outstream) :
            outstream_{&_outstream}
        {}
        void Visit(const Cat*) override{
            *outstream_ << CatString() << "\n";
        }
        void Visit(const Dog*) override{
            *outstream_ << DogString() << "\n";
        }
        // ... other overridden Visit methods
    private:
        std::ostream* outstream_ = nullptr;
    };
} // anonymous namespace
From there it’s a simple matter of hooking up this new visitor to the Animal we wish to serialize:
// AnimalSerialization.cpp
// ... (our visitor code)
namespace animal_serialization
{
    void Save(const Animal* _animal, std::ostream& _outstream){
        ::SaveAnimalVisitor saveVisitor{_outstream};
        _animal->Visit(&saveVisitor);
    }
}
AnimalCollection::Save uses it like so:
// AnimalCollection.cpp
void AnimalCollection::Save(std::ostream& _outstream) const{
    for(const auto& animal : animals_)
    {
        animal_serialization::Save(animal.get(), _outstream);
    }
}
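If you’d like to try the save path on its own, the pieces above can be stitched into one file roughly like this. It’s a sketch trimmed to two animals; SaveToString is a hypothetical stand-in for AnimalCollection::Save that writes to a std::ostringstream so the output is easy to inspect.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Cat;
struct Dog;

struct AnimalVisitor
{
    virtual void Visit(const Cat*) = 0;
    virtual void Visit(const Dog*) = 0;
    virtual ~AnimalVisitor() = default;
};

struct Animal
{
    virtual ~Animal() = default;
    virtual void Visit(AnimalVisitor*) const = 0;
};

template<class T>
struct VisitableAnimal : Animal
{
    void Visit(AnimalVisitor* _visitor) const override {
        _visitor->Visit(static_cast<const T*>(this));
    }
};

struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};

namespace
{
    // The only place the magic strings are written down.
    std::string CatString() { return "cat"; }
    std::string DogString() { return "dog"; }

    struct SaveAnimalVisitor : AnimalVisitor
    {
        // precondition: _outstream outlives the visitor
        explicit SaveAnimalVisitor(std::ostream& _outstream) : outstream_{&_outstream} {}
        void Visit(const Cat*) override { *outstream_ << CatString() << "\n"; }
        void Visit(const Dog*) override { *outstream_ << DogString() << "\n"; }
    private:
        std::ostream* outstream_ = nullptr;
    };
} // anonymous namespace

namespace animal_serialization
{
    void Save(const Animal* _animal, std::ostream& _outstream) {
        assert(_animal != nullptr); // precondition from the header
        SaveAnimalVisitor saveVisitor{_outstream};
        _animal->Visit(&saveVisitor);
    }
}

// Hypothetical stand-in for AnimalCollection::Save: one line per animal.
std::string SaveToString(const std::vector<std::unique_ptr<Animal>>& _animals)
{
    std::ostringstream out;
    for (const auto& animal : _animals)
        animal_serialization::Save(animal.get(), out);
    return out.str();
}
```

Passing a vector holding a Cat, a Dog and a Cat yields three lines: cat, dog, cat. Note that neither Animal nor its derived classes mention streams or strings anywhere.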
That was disturbingly easy, right?
We could grow our Animal hierarchy to 100 types and AnimalSerialization.cpp would still only be ~400 LOC.
Loading
Now the challenge is to read strings from a stream and construct Animal instances. How can double dispatch help us here?
While we cannot use double dispatch directly during deserialization, it did let us keep all of our magic strings in a single source file, AnimalSerialization.cpp. With that in place, we can implement animal_serialization::Load() as a basic Factory:
// AnimalSerialization.h
namespace animal_serialization
{
    // ...
    // precondition: _instream is open and ready
    std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream);
}
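// AnimalSerialization.cpp
// ... (needs <stdexcept> for the throw below)
// Translates one line of input into the matching Animal.
std::unique_ptr<Animal> ParseAnimal(const std::string& _line){
    if (_line == CatString())
        return std::make_unique<Cat>();
    else if (_line == DogString())
        return std::make_unique<Dog>();
    // etc.
    // Unrecognized string: falling off the end of the function would be
    // undefined behavior, so report it (throwing is one option among several).
    throw std::invalid_argument("Unrecognized animal: " + _line);
}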
namespace animal_serialization
{
    // ...
    std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream){
        std::vector<std::unique_ptr<Animal>> toReturn;
        std::string line;
        // std::getline takes the stream first, then the string to fill
        while(std::getline(_instream, line))
            toReturn.push_back(ParseAnimal(line));
        return toReturn;
    }
}
Wow that was easy, too.
AnimalCollection::Load has an easy task in front of it:
// AnimalCollection.cpp
// ...
void AnimalCollection::Load(std::istream& _instream)
{
    std::vector<std::unique_ptr<Animal>> loadedAnimals = animal_serialization::Load(_instream);
    animals_ = std::move(loadedAnimals);
}
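One nice side effect of this shape: AnimalCollection::Load builds the whole new vector before assigning it, so a failed load leaves the collection as it was. The sketch below illustrates that under two assumptions of mine: ParseAnimal reports an unrecognized line by throwing std::invalid_argument, and Size() is a hypothetical accessor added only so the effect can be observed. The Animal types are stripped down to the bare minimum here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>   // std::istringstream, handy for trying this out
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Animal { virtual ~Animal() = default; };
struct Cat : Animal {};
struct Dog : Animal {};

namespace
{
    std::string CatString() { return "cat"; }
    std::string DogString() { return "dog"; }

    // Assumption: an unrecognized line is reported by throwing.
    std::unique_ptr<Animal> ParseAnimal(const std::string& _line) {
        if (_line == CatString())
            return std::make_unique<Cat>();
        if (_line == DogString())
            return std::make_unique<Dog>();
        throw std::invalid_argument("Unrecognized animal: " + _line);
    }
} // anonymous namespace

namespace animal_serialization
{
    std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream) {
        assert(_instream.good()); // precondition: stream is open and ready
        std::vector<std::unique_ptr<Animal>> toReturn;
        std::string line;
        while (std::getline(_instream, line))
            toReturn.push_back(ParseAnimal(line));
        return toReturn;
    }
}

struct AnimalCollection
{
    void Load(std::istream& _instream) {
        // Build the new contents first; if parsing throws, animals_ is untouched.
        auto loadedAnimals = animal_serialization::Load(_instream);
        animals_ = std::move(loadedAnimals);
    }

    // Hypothetical accessor, here only to make the behavior observable.
    std::size_t Size() const { return animals_.size(); }

private:
    std::vector<std::unique_ptr<Animal>> animals_;
};
```

Loading "cat\ndog\n" from a std::istringstream gives a collection of two; a later attempt to load "cat\nunicorn\n" throws, and the collection still holds the original two animals.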
From this point, we could take the refactoring a number of steps further, ultimately going so far as to have a map of strings to functions returning Animal instances:
// AnimalSerialization.cpp
// ...
template<class T>
std::unique_ptr<Animal> CreateAnimal()
{
    return std::make_unique<T>();
}
using AnimalCreatorFunction = std::function<std::unique_ptr<Animal>()>;
std::unordered_map<std::string, AnimalCreatorFunction> animalFactory =
{
    {CatString(), AnimalCreatorFunction{&CreateAnimal<Cat>}},
    {DogString(), AnimalCreatorFunction{&CreateAnimal<Dog>}},
    // ...
};
std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream){
    std::vector<std::unique_ptr<Animal>> toReturn;
    std::string line;
    while(std::getline(_instream, line))
        toReturn.push_back(animalFactory.at(line)());
    return toReturn;
}
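A quick way to convince yourself that saving and loading agree is a round trip: load some text, save what was loaded, and compare. The sketch below combines the save visitor with the map-based factory in a single file. RoundTrip and the AnimalList alias are hypothetical helpers, and I’ve used find() with an explicit std::invalid_argument for unknown lines instead of at(); that is a choice made for the sketch, not a requirement.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct Cat;
struct Dog;
struct AnimalVisitor
{
    virtual void Visit(const Cat*) = 0;
    virtual void Visit(const Dog*) = 0;
    virtual ~AnimalVisitor() = default;
};
struct Animal
{
    virtual ~Animal() = default;
    virtual void Visit(AnimalVisitor*) const = 0;
};
template<class T>
struct VisitableAnimal : Animal
{
    void Visit(AnimalVisitor* _visitor) const override {
        _visitor->Visit(static_cast<const T*>(this));
    }
};
struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};

using AnimalList = std::vector<std::unique_ptr<Animal>>; // hypothetical alias

namespace
{
    std::string CatString() { return "cat"; }
    std::string DogString() { return "dog"; }

    struct SaveAnimalVisitor : AnimalVisitor
    {
        explicit SaveAnimalVisitor(std::ostream& _outstream) : outstream_{&_outstream} {}
        void Visit(const Cat*) override { *outstream_ << CatString() << "\n"; }
        void Visit(const Dog*) override { *outstream_ << DogString() << "\n"; }
    private:
        std::ostream* outstream_ = nullptr;
    };

    template<class T>
    std::unique_ptr<Animal> CreateAnimal() { return std::make_unique<T>(); }

    using AnimalCreatorFunction = std::function<std::unique_ptr<Animal>()>;
    const std::unordered_map<std::string, AnimalCreatorFunction> animalFactory =
    {
        {CatString(), AnimalCreatorFunction{&CreateAnimal<Cat>}},
        {DogString(), AnimalCreatorFunction{&CreateAnimal<Dog>}},
    };
} // anonymous namespace

namespace animal_serialization
{
    void Save(const Animal* _animal, std::ostream& _outstream) {
        assert(_animal != nullptr);
        SaveAnimalVisitor saveVisitor{_outstream};
        _animal->Visit(&saveVisitor);
    }

    // Assumption: unknown lines are reported with std::invalid_argument.
    AnimalList Load(std::istream& _instream) {
        AnimalList toReturn;
        std::string line;
        while (std::getline(_instream, line))
        {
            const auto found = animalFactory.find(line);
            if (found == animalFactory.end())
                throw std::invalid_argument("Unrecognized animal: " + line);
            toReturn.push_back(found->second());
        }
        return toReturn;
    }
}

// Hypothetical helper: loads from text, saves again, returns what was written.
std::string RoundTrip(const std::string& _text)
{
    std::istringstream in{_text};
    std::ostringstream out;
    for (const auto& animal : animal_serialization::Load(in))
        animal_serialization::Save(animal.get(), out);
    return out.str();
}
```

For well-formed input such as "cat\ndog\ncat\n", RoundTrip should hand back the same text, since both directions draw their strings from CatString() and DogString().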
Full-fledged demo here
At this point, we’ve moved the implementation details into the narrowest possible scope and avoided duplication at all cost. This is a pretty good stopping point; we could still grow to 100 derived Animal types without over-complicating or overcrowding AnimalSerialization.cpp (perhaps ~500 LOC).
(If you find yourself in a situation where you DO need to split things up further, feel free to contact me (see my About Me page); there are other techniques we could use that are outside the scope of this article.)
Conclusion
The double dispatch pattern lends itself nicely to stable interfaces thanks to its reusability. In this post, I walked through how we might reuse it to implement basic serialization without needing to touch Animal itself, or any derived class. I hope you’re already thinking of places in your codebase that could benefit from refactoring to use this pattern!