Simple data structures

Keep simple data structures simple! There’s no need for artificial pseudo-encapsulation when all you have is a bunch of data.

Recently I came across a class that looked similar to this:

class Unit {
public:

  Unit(std::string name_, unsigned points_, int x_, int y_)
    : name{name_}, points{points_}, x{x_}, y{y_}
  {}

  Unit(std::string name_)
    : name{name_}, points{0}, x{0}, y{0}
  {}

  Unit()
    : name{""}, points{0}, x{0}, y{0}
  {}

  void setName(std::string const& n) {
    name = n;
  }

  std::string const& getName() const {
    return name;
  }

  void setPoints(unsigned p) {
    points = p;
  }

  unsigned getPoints() const {
    return points;
  }

  void setX(int x_) {
    x = x_;
  }

  int getX() const {
    return x;
  }

  void setY(int y_) {
    y = y_;
  }

  int getY() const {
    return y;
  }

private:
  std::string name;
  unsigned points;
  int x;
  int y;
};

Let’s have a closer look because this structure could be made much simpler.

Free access to everything

If we look at the getters and setters, we see that they are just a bunch of boilerplate. Books about object-oriented programming often talk at length about encapsulation. They encourage us to use getters and setters for every data member.

However, encapsulation means that there is some data that should be protected against free access. Usually, that’s because there is some logic that ties some of the data together. In such a case, access functions do checks and some data might be changed only together.
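For contrast, here is a minimal sketch of a class where encapsulation does earn its keep. The `Interval` type and its member names are hypothetical and not taken from the code above; the only point is that the two bounds are tied together by an invariant (`low <= high`), so they are checked and changed together.

```cpp
#include <stdexcept>

// Hypothetical example: the two bounds are tied together by an invariant.
class Interval {
public:
  Interval(int low, int high) : low_{low}, high_{high} {
    if (low > high) {
      throw std::invalid_argument{"Interval: low must not exceed high"};
    }
  }

  // Both bounds change together, so the invariant is checked in one place.
  void reset(int low, int high) {
    *this = Interval{low, high};  // the temporary throws before *this is modified
  }

  int low() const { return low_; }
  int high() const { return high_; }
  int length() const { return high_ - low_; }

private:
  int low_;
  int high_;
};
```

With public members, nothing would stop a caller from setting `low_` above `high_`. That is the kind of logic the `Unit` class above does not have.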

But C++ is not a purely object-oriented language. In some cases, we have structures that are just a simple bunch of data and nothing more. It’s best to not hide that fact behind a pseudo-class but make it obvious by using a struct with public data members. The effect is the same: everyone has unlimited access to everything.

What if the logic is elsewhere?

Sometimes, classes like this one just seem to be plain data containers, and the logic is hidden elsewhere. In the case of domain objects, this is called an Anemic Domain Model and is usually considered an antipattern. The usual solution is to refactor the code to move the logic into the class so that it is colocated with the data.

Whether we do so or leave the logic separated from the data, it should be a conscious decision. If we decide to leave data and logic separated, we should probably write that decision down. In that case, we’re back to the earlier conclusion: instead of the class, use a struct with public data.

Even if we decide to move the logic into the class, there are rare cases where the actual encapsulation is provided outside the class. One example is the detail classes in the “pimpl idiom”: nobody but the containing class and the pimpl itself will ever have access, so there’s no point in adding all those getters and setters.

Constructors

Constructors are usually needed to create an object in a consistent state and establish invariants. In the case of plain data structures, there are no invariants and no consistency that could be maintained. The constructors in the example above exist only so that we don’t have to default-construct an object and then immediately set each member via its setter.

If you look closely, there’s even a potential for bugs in there: any std::string is implicitly convertible to Unit, because the single-argument constructor is not explicit. Things like that can lead to a lot of debugging fun and head-scratching.
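A minimal sketch of that pitfall, using a reduced version of the class. The `describe` and `surprise` functions and the `ExplicitUnit` variant are made up for illustration:

```cpp
#include <string>
#include <type_traits>

// Reduced version of the class above: only the single-argument constructor matters.
class Unit {
public:
  Unit(std::string name_) : name{name_} {}  // not explicit
  std::string const& getName() const { return name; }
private:
  std::string name;
};

// Hypothetical variant with the constructor marked explicit.
class ExplicitUnit {
public:
  explicit ExplicitUnit(std::string name_) : name{name_} {}
private:
  std::string name;
};

std::string describe(Unit const& unit) { return "unit: " + unit.getName(); }

std::string surprise() {
  std::string fileName = "savegame.dat";
  return describe(fileName);  // compiles: fileName is silently converted to a Unit
}

static_assert(std::is_convertible<std::string, Unit>::value,
              "implicit conversion exists");
static_assert(!std::is_convertible<std::string, ExplicitUnit>::value,
              "explicit removes the implicit conversion");
```

Marking the constructor `explicit` would fix this particular problem, but as the next paragraphs show, the constructors are not needed in the first place.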

Since C++11, we have in-class initializers (default member initializers). In cases like this one, they can be used instead of constructors. All the constructors above are covered by that approach. With that, the 53 lines of code in the example can be boiled down to 6 lines:

struct Unit {
  std::string name{ "" };
  unsigned points{ 0 };
  int x{ 0 };
  int y{ 0 };
};

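If you used uniform initialization, initialization looks just as it did before. Note that aggregate initialization of a struct with in-class initializers requires C++14 or later; in C++11, such a struct is not an aggregate: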

Unit a{"Alice"};
Unit b{"Bob", 43, 1, 2};
Unit c;

What if there is logic for one of the members?

A name probably shouldn’t be an empty string or contain special characters. Does that mean we have to throw it all overboard and make a proper class out of the Unit again? Probably not. Often we have logic in one place to validate and sanitize strings and similar things. Data that enters our program or library has to pass that point, and later we just assume that the data is valid.

If that is too close to the Anemic Domain Model, we still don’t have to encapsulate everything in our Unit class again. Instead, we can use a custom type that contains the logic instead of std::string. After all, a std::string is an arbitrary bunch of characters. If we need something different, a std::string may be convenient, but it’s the wrong choice. Our custom type might well have a proper constructor, so it can’t be default constructed as an empty string.
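Such a custom type could be sketched as follows. The validation rules (non-empty, alphanumeric only) and the `str()` accessor are assumptions made for illustration; the real rules depend on the application.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical name type: the validation rules are assumptions for illustration.
class PlayerName {
public:
  explicit PlayerName(std::string value) : value_{std::move(value)} {
    if (value_.empty()) {
      throw std::invalid_argument{"PlayerName must not be empty"};
    }
    bool const onlyAlnum = std::all_of(
        value_.begin(), value_.end(),
        [](unsigned char c) { return std::isalnum(c) != 0; });
    if (!onlyAlnum) {
      throw std::invalid_argument{"PlayerName must be alphanumeric"};
    }
  }

  std::string const& str() const { return value_; }

private:
  std::string value_;
};

// There is no default constructor, so an empty name cannot sneak in.
static_assert(!std::is_default_constructible<PlayerName>::value,
              "PlayerName always needs a value");

// Unit stays a plain struct; only the name carries logic.
struct Unit {
  PlayerName name;
  unsigned points{ 0 };
  int x{ 0 };
  int y{ 0 };
};
```

A unit is then created with `Unit u{PlayerName{"Alice"}};`, and an invalid name fails at the point where the `PlayerName` is constructed rather than somewhere inside `Unit`.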

What if some of the data belongs together?

If we look at the class yet again, we can pretty much assume that x and y are some sort of coordinates. They probably belong together, so shouldn’t we have a method that sets both together? And maybe the constructors made sense, as they allowed setting either both or none?

No, that’s not a solution. It may remedy a few of the symptoms, but we would still have the “Data Clump” code smell. Those two variables belong together, so they deserve their own structure or class.
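A sketch of such a structure might look like this. The name `Point` matches the final `Unit` in the conclusion, while the `translated` helper is a made-up example of logic that now has a natural home next to the coordinates:

```cpp
// Hypothetical Point type: the two coordinates travel together.
struct Point {
  int x{ 0 };
  int y{ 0 };
};

// Made-up helper: logic that concerns both coordinates lives next to Point,
// not in every structure that happens to have a position.
Point translated(Point p, int dx, int dy) {
  return Point{ p.x + dx, p.y + dy };
}
```

Setting "either both or none" then falls out naturally: a `Point` is either defaulted to `{0, 0}` or assigned as a whole.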

Conclusion

In the end, our Unit looks like this:

struct Unit {
  PlayerName name;
  unsigned points{ 0 };
  Point location{ {0,0} };
};

It is small, it is simple. And the fact that it’s a struct with a few public members clearly sends the right message: it’s just a bundle of data.

11 Comments


  1. Hi,

    I’m a CS student and recently we had a homework in which we were asked to implement a singly linked list with modern C++ features. Most of the class flunked the homework because almost nobody was able to implement the list using smart pointers.

    I was wondering if implementing a linked list with smart pointers is really a good idea? My gut feeling is that this is not the best approach for performance reasons (e.g. using shared pointers will be really slow and using unique pointers could trigger the recursive destruction of the list if you delete the list header).

    Thanks

    1. Hi Jean, you’re right about the shared_ptr, that’s not needed for a singly linked list and would be overhead.
      unique_ptr can definitely be used, but of course you’d have to pay attention to not destroy list elements that hold ownership of other elements which you still need.
      Also, be careful when destroying very long lists, as just deleting the head would trigger a chain of recursive destructor calls that can lead to a stack overflow.
      The actual thing I’d use to have a singly linked list with modern C++ features is std::forward_list – why manually implement something if smart people have put it in the standard library already 😉
      If you’d like to discuss this more, please feel free to drop me an email or a DM on Twitter!
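A minimal sketch of the unique_ptr approach, assuming a hypothetical `IntList` class that is not part of the original discussion. The destructor unlinks the nodes one at a time, which avoids the chain of recursive destructor calls mentioned above:

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical minimal list; std::forward_list remains the practical choice.
class IntList {
  struct Node {
    int value;
    std::unique_ptr<Node> next;  // each node owns its successor
  };

public:
  IntList() = default;
  IntList(IntList const&) = delete;
  IntList& operator=(IntList const&) = delete;

  ~IntList() {
    // Detach the successor before the old head is destroyed, so each
    // destroyed node has an empty next pointer and nothing recurses.
    while (head_) {
      head_ = std::move(head_->next);
    }
  }

  void push_front(int value) {
    std::unique_ptr<Node> node{new Node{value, std::move(head_)}};
    head_ = std::move(node);
  }

  int front() const { return head_->value; }  // precondition: list not empty

  std::size_t size() const {
    std::size_t count = 0;
    for (Node const* n = head_.get(); n != nullptr; n = n->next.get()) {
      ++count;
    }
    return count;
  }

private:
  std::unique_ptr<Node> head_;
};
```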

  2. Really nice article, I totally agree that it is cleaner to avoid writing things that are already done by compilers.
    I still wonder how you can handle STL containers with simple data structures.

    In your examples, you give

    Unit a{"Alice"};

    and it is a nice syntax, but you can’t call foo(Unit{"alice"}), making it unusable in practice with the emplace and emplace_back methods of STL containers.

    Looking at https://en.cppreference.com/w/cpp/language/direct_initialization, it looks like we are in case (2) and we can’t do it without an object name.

    1. You’re right. There is currently a flaw in the language and/or library: aggregates cannot be properly used with emplace-style functions. That includes the methods you mention, but also make_unique and others. There is LWG 2089, a rather old defect report (http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2089), for that issue, though I am not sure what its current status is.

      By default, I would not recommend adding a constructor, i.e. leave simple data structs as aggregates. However, if you have to work around that issue, a set of constructors is the right way to go.
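A minimal sketch of the situation, using the aggregate `Unit` from the article and a hypothetical `makeUnits` helper. Before C++20, the emplace-style call with arguments does not compile, whereas passing a braced temporary to `push_back` does:

```cpp
#include <string>
#include <vector>

struct Unit {
  std::string name{ "" };
  unsigned points{ 0 };
  int x{ 0 };
  int y{ 0 };
};

std::vector<Unit> makeUnits() {
  std::vector<Unit> units;

  // Ill-formed before C++20: emplace_back constructs with Unit(...), and
  // parentheses cannot initialize an aggregate in C++14/17.
  // units.emplace_back("Alice", 43u, 1, 2);

  // Works: the braces build the aggregate, which is then moved into the vector.
  units.push_back(Unit{"Alice", 43, 1, 2});

  // Also works: without arguments the element is value-initialized and the
  // default member initializers apply.
  units.emplace_back();

  return units;
}
```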

  3. I really like your point of view. I am often trying to tell people that adding setters is almost like having public members, and I prefer the reduced cognitive load of simpler PODs.

    If I may, I will plug a related article I wrote on Fluent C++’s blog: https://www.fluentcpp.com/2018/04/06/strong-types-by-struct/

    There are a lot of libraries out there to help create strong types, but often, a simple struct is enough.

    1. Hi Rainer, yes, that guideline is definitely related to this post!

  4. What if we want our POD to be immutable or just some of the properties read-only?

    1. Hi Roger, if we want instances of the POD to be immutable, those should be marked const. If single members should be immutable, mark them const (and probably without a default initializer). That will make the structure non-assignable and has other ramifications though.
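A minimal sketch of both options, with a reduced `Unit` and a hypothetical `pointsAfterBonus` function. The `static_assert`s show the main ramification: the struct can still be copied, but no longer assigned.

```cpp
#include <string>
#include <type_traits>

struct Unit {
  std::string const name;  // read-only after initialization, no default initializer
  unsigned points{ 0 };
};

static_assert(std::is_copy_constructible<Unit>::value,
              "copying still works");
static_assert(!std::is_copy_assignable<Unit>::value,
              "a const member disables assignment");

unsigned pointsAfterBonus() {
  Unit const frozen{"Alice", 10};  // the whole instance is immutable
  Unit u{"Bob", 5};
  u.points += 3;       // fine: points is not const
  // u.name = "Eve";   // does not compile: name is const
  return frozen.points + u.points;
}
```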

  5. Nice article as usual.

    Anyway, I really don’t like the uniform initialization for anything more complex than a POD.
    Maybe I missed something, but I think it sticks you to your initial code shape: if you need to add a new member to your struct or class, or simply change the order because it now seems more logical, you cannot without changing all initializations of this struct (or even worse, you might not notice it if you switch two members with the same type).

    Even for structs, I prefer creating a convenience constructor (no other access), so that the initialization is still clear but without the drawbacks of the initialization list.

    1. You will have designated initializers in C++20, which will help in these cases.

      IIRC, in C++20 ‘()’ will also be usable in place of ‘{}’ to initialize aggregates, so that emplace works for them.
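A minimal sketch of designated initializers, assuming a C++20 compiler and using a hypothetical `makeBob` function. The members are named at the point of initialization, so swapping two members of the same type in the struct no longer silently changes the meaning of existing code:

```cpp
#include <string>

struct Unit {
  std::string name{ "" };
  unsigned points{ 0 };
  int x{ 0 };
  int y{ 0 };
};

Unit makeBob() {
  // Requires C++20. Designators must follow declaration order; skipped
  // members (here: points) fall back to their default member initializers.
  return Unit{ .name = "Bob", .x = 1, .y = 2 };
}
```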
