An example of how non-deterministic code can cause test cases to wither.

It seems to me that when people discuss functional programming, they spend much time discussing side effects and how to avoid them. Sometimes, they almost forget that non-deterministic behaviour is also something to avoid.

On the other hand, we've known for a long time that we should eradicate non-determinism in tests. Martin Fowler's article, however, mostly discusses false positives: Tests that fail even when no defect is manifest.

Unit tests may also exhibit false negatives. This can happen for a variety of reason. In my article Tautological assertion I describe one example.

The passing of time, however, has a tendency to introduce decay. This may apply to test suites as well. If a test or the System Under Test depends on the current time, a test may over time transition from a proper test to one that still passes, but for the wrong reason.

A test example #

I was recently looking at this test:

[Fact]
public async Task ChangeDateToSoldOutDate()
{
    var r1 = Some.Reservation;
    var r2 = Some.Reservation
        .WithId(Guid.NewGuid())
        .TheDayAfter()
        .WithQuantity(10);
    var db = new FakeDatabase();
    db.Grandfather.Add(r1);
    db.Grandfather.Add(r2);
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
 
    var dto = r1.WithDate(r2.At).ToDto();
    var actual = await sut.Put(r1.Id.ToString("N"), dto);
 
    var oRes = Assert.IsAssignableFrom<ObjectResult>(actual);
    Assert.Equal(
        StatusCodes.Status500InternalServerError,
        oRes.StatusCode);
}

It's from the code base that accompanies my book Code That Fits in Your Head. It's a false negative waiting to happen. What's the problem?

Before I start describing the problem, I think that I owe you a short explanation of what you're looking at. The book has a single example code base that it uses to showcase various heuristics and techniques. The code base implements an online restaurant reservation REST API. I've previously discussed that code base on this blog.

This particular test exercises the following test case: A client can use the REST API to change an existing reservation. When it does that, the API should check that all business rules apply. Particularly, in this case, if you have an existing reservation (r1), but try to change it to another date, and that other date is already sold out, the update should be refused.

The test uses backdoor manipulation to establish the initial state. It's a state-based integration test that uses an in-memory Fake database. It adds two reservations (r1 and r2) to the database, on two different days.

What's not at all clear from the above code is that that 10 is the Grandfather.Restaurant's maximum capacity. (This is a fair criticism against this test, but not what this article is about.) It effectively means that the restaurant is sold out that day.

The test then, in the act phase, tries to change r1 to the date that's sold out and Put that change.

The assertion checks that this isn't possible.

A false negative waiting to happen #

What's the problem with the above test?

The problem is that when I wrote it, it exercised the above test case, but now it no longer does.

At this point, you're probably wondering. The test case is supposed to change the date of r1 to the date of r2, but no dates are visible in the test. Which dates does the test use?

The answer may be found in the Some class:

public readonly static Reservation Reservation =
    new Reservation(
        new Guid(0x81416928, 0xC236, 0x4EBF, 0xA4, 0x50, 0x24, 0x95, 0xA4, 0xDA, 0x92, 0x30),
        Now,
        new Email("x@example.net"),
        new Name(""),
        1);

Some.Reservation is an immutable test datum. It's intended to be used as a representative example of a Reservation value. The reservation date and time (the At property) is another immutable test datum:

public readonly static DateTime Now = new DateTime(2022, 4, 1, 20, 15, 0);

I wrote most of that code base in 2020. April 1st 2022 seemed far in the future, and I needed a value that could represent 'approximately' the current time. It didn't have to be the day the code would run, but (as it turns out) it's important that this date isn't in the past. Which it now is.

What happens now that Some.Reservation is in the past?

Nothing, and that's exactly what is the problem.

A new path #

Once the clock rolled over from 2022-04-02 20:14:59 to 2022-04-02 20:15:00, the pathway through the code changed. Why April 2 and not April 1? Because the date of importance in the test is TheDayAfter, and TheDayAfter is defined like this:

public static Reservation TheDayAfter(this Reservation reservation)
{
    return reservation.AddDate(TimeSpan.FromDays(1));
}

So r2.At is 20:15 April 2, 2022.

Before 2022-04-02 20:15:00 the test would exercise the test case it was supposed to. After all, I did write the test before I wrote the implementation, and I did see it fail before I saw it pass.

Once, however, dto is in the past, another pathway takes over. The MaƮtre d's core decision logic (WillAccept) contains this Guard Clause:

if (candidate.At < now)
    return false;

Long before the domain model investigates whether it can accommodate a reservation, it rejects all reservations in the past. The calling code, however, only sees that the return value is false. In other words, the externally observable behaviour is the same. This also means that test's assertion still passes. From the outside, you can't tell the difference.

Coverage decay #

If the behaviour is stable, then why is this a problem?

This is a problem because this decreases test coverage. While code coverage is a useless target measure, it's not a completely useless metric. The absolute percentage is less relevant than the trend. If code coverage decreases, one should at least investigate.

Here, specifically, it means that I thought I had a particular test case covered, and in reality it turns out that that's no longer the case. Now the test just verifies that you can't update past reservations, regardless of application state. Other tests already verify that, however, so this test now became redundant.

As I've described before, each test case exercises a particular path through the System Under Test:

Diagram that shows a unit test exercising one path through a unit.

If you have high code coverage, an aggregate diagram might look like this:

Diagram that shows several unit tests exercising most paths through a unit.

Notice that the top box now says Unit tests in plural. Together, the tests exercise most pathways through the code. There's one pathway not exercised: The path from the light blue node to the orange node. The point is that this diagram illustrates high code coverage, not complete code coverage. The following argument applies to that general situation.

What happened with the above test is that coverage decayed.

Diagram that shows several unit tests exercising one fewer paths through a unit that before.

In this figure, the green pathway is no longer exercised.

Not only did coverage decrease: Unless you're monitoring code coverage, you probably didn't notice. The code didn't change. Time just passed.

Threat to refactoring #

Why is this a problem? After all, code coverage isn't a goal; it's just a measure.

As Martin Fowler writes:

"to refactor, the essential precondition is [...] solid tests"

A good test suite verifies that the behaviour of your software doesn't change as you reorganise the code. The underlying assumption is that the test suite exercises important use cases and protects against regressions.

The above test is an example of a test case that silently disappears. If you think that your tests cover all important use cases, you may feel confident refactoring. After all, if all tests pass, you didn't break any existing behaviour.

Unbeknownst to you, however, the test suite may become less trustworthy. As time passes, test cases may disappear, as exemplified by the above test. While the test suite may be passing, a refactoring could introduce a regression - of behaviour you believed to be already covered.

Under the radar #

Why didn't I catch this earlier? After all, I'd already identified the problem in the code base. Once I realised the problem, I took steps to remedy it. I loaded the code base on a spare computer and changed the machine's year to a future year (2038, if I recall correctly). Then I ran the tests to see which ones failed.

I examined each failing test to verify that it actually failed because of the time and date. Then I corrected the test to make it relative to the system clock.

That took care of all the tests that would eventually break. On the other hand, it didn't unearth any of the tests that over time silently would become false negatives.

Make it relative #

Fixing the above test was easy:

[Fact]
public async Task ChangeDateToSoldOutDate()
{
    var r1 =
        Some.Reservation.WithDate(DateTime.Now.AddDays(8).At(20, 15));
    var r2 = r1
        .WithId(Guid.NewGuid())
        .TheDayAfter()
        .WithQuantity(10);
    var db = new FakeDatabase();
    db.Grandfather.Add(r1);
    db.Grandfather.Add(r2);
    var sut = new ReservationsController(
        new SystemClock(),
        new InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
 
    var dto = r1.WithDate(r2.At).ToDto();
    var actual = await sut.Put(r1.Id.ToString("N"), dto);
 
    var oRes = Assert.IsAssignableFrom<ObjectResult>(actual);
    Assert.Equal(
        StatusCodes.Status500InternalServerError,
        oRes.StatusCode);
}

The only change is to the creation of r1 and r2. The test now explicitly sets the r1 date eight days in the future. It also derives r2 from r1, which makes the relationship between the two values more explicit. This in itself is a small improvement, I think.

Deterministic behaviour #

More than one person has told me that I obsess over details to a degree that can't be expected of most developers. If that's true, problems like this are likely to go unnoticed. After all, I discovered the problem because I was carefully reviewing each test more than a year after I originally wrote it. While I had a reason to do that, we can't expect developers to do that.

Are we doomed to suffer this kind of error, then? I don't have a silver bullet for you, but I don't think that all is lost. A little discipline may help. As I write in Code That Fits in Your Head, pair programming or code reviews can help raise awareness of issues.

On the tactical level, you might want to consider a policy that discourages absolute dates and times in unit tests.

On the strategic level, you may want to adopt the Functional Core, Imperative Shell architecture. After all, pure functions are intrinsically testable.

As I wrote in the beginning of this article, most people tend to focus on how to avoid side effects when discussing functional programming. A pure function, however, must be both side-effect-free and deterministic.

This means that instead of learning many specific rules (such as not using absolute time in unit tests), you may focus on one general rule: Write and test pure functions.

To be honest, though, this rule doesn't solve all problems. Even in a functional-core-imperative-shell architecture, the imperative shell isn't pure. The above integration test is, unfortunately, a test of the imperative shell. It's already using the SystemClock, and that's no oversight on my part. While I do favour the test pyramid, I also write integration tests - just as the test pyramid encourages me to do. I find it valuable that integration tests integrate as many components as possible. The more real components an integration test uses, the better the chance that it uncovers problems.

With integration tests, then, diligence is required to keep tests deterministic. In that context you can't rely on a universal rule like just write pure functions, because at the boundaries, programs aren't functional. Instead, some diligence and discipline is required. At least, I don't see other alternatives, but perhaps you do?

Conclusion #

If you use absolute dates in unit tests, what was once a future date may some day become a past date. This can change which pathways of the System Under Test an automated test exercises.

Is this important? It's probably not the most pressing problem you may run into. Even if test coverage silently decays, it may not be a problem if you never refactor the code in question. It's only if, during a refactoring, you introduce a defect that goes unnoticed that this may become a problem. It doesn't seem like the most likely of errors, but on the other hand, these kinds of unexpected problems have a tendency to occur at the most inopportune moments.


Comments

Why have you decided to make the date of the reservation relative to the SystemClock, and not the other way around? Would it be more deterministic to use a faked system clock instead?

2022-05-30 17:12 UTC

Laszlo, thank you for asking. I wrote a new article to answer your question.

2022-06-27 5:48 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 23 May 2022 05:55:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 23 May 2022 05:55:00 UTC