Validation, a solved problem?

A validation problem with a twist.

Until recently, I thought that data validation was a solved problem: Use an applicative functor. I then encountered a forum question that for a few minutes shook my faith.

After brief consideration, though, I realised that all is good. Validation, even with a twist, is successfully modelled with an applicative functor. Faith in computer science restored.

The twist #

Usually, when you see a demo of applicative validation, the result of validating is one of two: either a parsed result, or a collection of error messages.

λ> validateReservation $ ReservationJson "2017-06-30 19:00:00+02:00" 4 "Jane Doe" "j@example.com"
Validation (Right (Reservation {
    reservationDate = 2017-06-30 19:00:00 +0200,
    reservationQuantity = 4,
    reservationName = "Jane Doe",
    reservationEmail = "j@example.com"}))

λ> validateReservation $ ReservationJson "2017/14/12 6pm" 4.1 "Jane Doe" "jane.example.com"
Validation (Left ["Not a date.","Not a positive integer.","Not an email address."])

λ> validateReservation $ ReservationJson "2017-06-30 19:00:00+02:00" (-3) "Jane Doe" "j@example.com"
Validation (Left ["Not a positive integer."])

(Example from Applicative validation.)

What if, instead, you're displaying an input form? When users enter data, you want to validate it. Imagine, for the rest of this short series of articles, that the input form has three fields: name, date of birth, and address. Each piece of data has associated validation rules.

If you enter a valid name, but an invalid date of birth, you want to clear the input form's date of birth, but not the name. It's such a bother for a user having to retype valid data just because a single field turned out to be invalid.

Imagine, for example, that you want to bind the form to a data model like this F# record type:

type Input = { Name : string option; DoB : DateTime option; Address : string option}

Each of these three fields is optional. We'd like validation to work in the following way: If validation fails, the function should return both a list of error messages, and also the Input object, with valid data retained, but invalid data cleared.

One of the rules implied in the forum question is that names must be more than three characters long. Thus, input like this is invalid:

{ Name = Some "Tom"; DoB = Some eightYearsAgo; Address = Some "x" }

Both the DoB and Address fields, however, are valid, so, along with error messages, we'd like our validation function to return a partially wiped Input value:

{ Name = None; DoB = Some eightYearsAgo; Address = Some "x" }

Notice that both DoB and Address field values are retained, while Name has been reset.

A final requirement: If validation succeeds, the return value should be a parsed value that captures that validation took place:

type ValidInput = { Name : string; DoB : DateTime; Address : string }

That requirement is straightforward. That's how you'd usually implement applicative validation. It's the partial data round-trip that seems to throw a spanner in the works.

How should we model such validation?

Theory, applied #

There's a subculture of functional programming that draws heavily on category theory. This is most prevalent in Haskell. I've been studying category theory in an attempt to understand what it's all about. I even wrote a substantial article series about some design patterns and how they relate to theory.

One thing I learned after I'd named that article series is that most of the useful theoretical concepts come from abstract algebra, with the possible exception of monads.

People often ask me: does all that theory have any practical use?

Yes, it does, as it turns out. It did, for example, enable me to identify a solution to the above twist in five to ten minutes.

It's a discussion that I often have, particularly with the always friendly F# community. Do you have to understand functors, monads, etcetera to be a productive F# developer?

To anyone who wants to learn F# I'd respond: Don't worry about that at the gate. Find a good learning resource and dive right in. It's a friendly language that you can learn gradually.

Sooner or later, though, you'll run into knotty problems that you may struggle to address. I've seen this enough times that it looks like a pattern. The present forum question is just one example. A beginner or intermediate F# programmer will typically attempt to solve the problem in an ad-hoc manner that may or may not be easy to maintain. (The solution proposed by the author of that forum question doesn't, by the way, look half bad.)

To be clear: there's nothing wrong with being a beginner. I was once a beginner programmer, and I'm still a beginner in multiple ways. What I'm trying to argue here is that there is value in knowing theory. With my knowledge of abstract algebra and how it applies to functional programming, it didn't take me long to identify a solution. I'll get to that later.

Before I outline a solution, I'd like to round off the discussion of applied theory. That question about monads comes up a lot. Do I have to understand functors, monads, etcetera to be a good F# developer?

I think it's like asking: Do I have to understand polymorphism, design patterns, the SOLID principles, etcetera to be a good object-oriented programmer?

Those are typically not the first topics people are taught about OOD. I would assert, however, that understanding such topics does help. They may not be required to get started with OOP, but knowing them makes you a better programmer.

I think the same is true for functional programming. It's just a different skill set that makes you better in that paradigm.

Solution outline #

When you know a bit of theory, you may know that validation can be implemented with an applicative sum type like Either (AKA Result), with one extra requirement.

Either has two dimensions, left or right (success or failure, ok or error, etcetera). The applicative nature of it already supplies a way to compose the successes, but what if there's more than one validation error?
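To make that question concrete, here's a minimal Haskell sketch. The function names and error messages are hypothetical, and the address rule (must not be empty) is an assumption made purely for illustration; only the name rule comes from the forum question. The sketch relies on the standard Applicative instance for Either, which stops at the first Left it meets.

```haskell
-- Hypothetical check: names must be more than three characters long.
checkName :: String -> Either [String] String
checkName s
  | length s > 3 = Right s
  | otherwise = Left ["Name too short."]

-- Hypothetical check (assumed rule): the address must not be empty.
checkAddress :: String -> Either [String] String
checkAddress s
  | null s = Left ["Address is empty."]
  | otherwise = Right s

-- Combine both checks with Either's standard Applicative instance.
checkBoth :: String -> String -> Either [String] (String, String)
checkBoth n a = (,) <$> checkName n <*> checkAddress a
```

With these definitions, checkBoth "Tom" "" evaluates to Left ["Name too short."]. The address error is lost, because the built-in instance never looks at the second Left.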

In my article about applicative validation I showed how to collect multiple error messages in a list. Lists, however, form a monoid, so I typed the validation API to be that flexible.

In fact, all you need is a semigroup. When I wrote the article on applicative validation, Haskell's Semigroup type class wasn't yet a superclass of Monoid, and I (perhaps without sufficient contemplation) just went with Monoid.

What remains is that applicative validation can collect errors for any semigroup of errors. All we need to solve the above validation problem with a twist, then, is to identify a suitable semigroup.
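As a rough illustration of that claim, here's a Haskell sketch of a Validation wrapper whose Applicative instance only asks for a Semigroup constraint on the error type. It assumes a GHC version where Semigroup is exported from the Prelude. The Person type, the function names, the messages, and the address rule are all hypothetical. The error type used here, a non-empty list of messages, is only a stand-in: it forms a semigroup but not a monoid, and it doesn't address the partial round-trip requirement.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- A thin wrapper over Either, so that it can have its own Applicative instance.
newtype Validation e a = Validation (Either e a) deriving (Eq, Show)

instance Functor (Validation e) where
  fmap f (Validation x) = Validation (fmap f x)

-- Only Semigroup is required of the error type: two errors combine with (<>).
instance Semigroup e => Applicative (Validation e) where
  pure = Validation . Right
  Validation (Left e1) <*> Validation (Left e2) = Validation (Left (e1 <> e2))
  Validation (Left e1) <*> Validation (Right _) = Validation (Left e1)
  Validation (Right _) <*> Validation (Left e2) = Validation (Left e2)
  Validation (Right f) <*> Validation (Right x) = Validation (Right (f x))

-- Hypothetical parsed result with two of the three form fields.
data Person = Person { personName :: String, personAddress :: String }
  deriving (Eq, Show)

-- Stand-in error semigroup: a non-empty list of messages.
type Errors = NonEmpty String

-- Names must be more than three characters long.
validateName :: String -> Validation Errors String
validateName s
  | length s > 3 = Validation (Right s)
  | otherwise = Validation (Left ("Name too short." :| []))

-- Assumed rule for illustration: the address must not be empty.
validateAddress :: String -> Validation Errors String
validateAddress s
  | null s = Validation (Left ("Address is empty." :| []))
  | otherwise = Validation (Right s)

validatePerson :: String -> String -> Validation Errors Person
validatePerson n a = Person <$> validateName n <*> validateAddress a
```

Here, validatePerson "Tom" "" collects both messages instead of stopping at the first one, while validatePerson "Jane Doe" "x" produces a Person. Swapping Errors for another semigroup requires no change to the Applicative instance.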

I don't want to give away everything in this article, so I'm going to leave you with this cliffhanger. Which semigroup solves the problem? Read on.

As is often my modus operandi, I first did a proof of concept in Haskell. With its type classes and higher-kinded polymorphism, it's much faster to prototype solutions than even in F#. In the next article, I'll describe how that turned out.

After the Haskell article, I'll show how it translates to F#. You can skip the Haskell article if you like.

Conclusion #

I still think that validation is a solved problem. It's always interesting when such a belief is challenged for a moment, and satisfying to discover that it still holds.

This is, after all, not proof of anything. Perhaps tomorrow, someone will throw another curve ball that I can't catch. If that happens, I'll have to update my beliefs. Until then, I'll consider validation a solved problem.

Next: A Haskell proof of concept of validation with partial data round trip.

Published: Monday, 14 December 2020 08:28:00 UTC

Validation, a solved problem? by Mark Seemann
