In the past two years I’ve become reasonably comfortable with both PureScript and Haskell. I’ve learned so many new things while diving into the pure functional programming ecosystem and many of these techniques can be applied to other paradigms. Unfortunately, the pure FP world can feel a bit like another dimension – where many programming problems have elegant solutions but the world of “regular” programming isn’t aware of these patterns.

One such pattern is called “applicative-style validation”, but I’ll simply call it “declarative validation”. In this post I’ll provide some motivation for using this technique and then build a small library in Python implementing these ideas.

Motivation

Many of our programs accept input from the user. Often we need to validate this input before continuing processing and, in the case of errors, inform the user of any problems. There are several techniques for performing this kind of validation, but the most common is to write some imperative code that walks over the input and builds up a list of errors. If there are no errors, the provided input is valid; otherwise it isn’t. We can wrap the result of our validation in an object that indicates whether the validation was successful.

from dataclasses import dataclass
from typing import Any


@dataclass
class Valid:
    value: Any

    def is_valid(self):
        return True


@dataclass
class Invalid:
    value: Any

    def is_valid(self):
        return False


def validate_name(name, errors):
    if not isinstance(name, str) or name == "":
        errors.append("name must be a non-empty string")


def validate_age(age, errors):
    if not isinstance(age, int):
        errors.append("age must be an int")
    elif age < 10:
        errors.append("age must be at least 10")


def validate(data):
    errors = []
    validate_name(data.get("name"), errors)
    validate_age(data.get("age"), errors)

    if not errors:
        return Valid(data)
    else:
        return Invalid(errors)

While this approach works, things get complicated when we have validations that are dependent on previous results. Say, for example, that we want to add a new, more complicated rule stating that if the name is Drew, the age must be at least 40. In order to do this, both name and age need to be present and have the appropriate type. But we don’t have a convenient way to “reuse” this logic from the existing validate_name and validate_age functions. One approach is to simply re-check locally and assume errors have already been added if the types are incorrect.

def validate_drew(data, errors):
    if (not isinstance(data.get("name"), str) or
        not isinstance(data.get("age"), int)):
        return
    elif data.get("name") == "Drew" and data.get("age") < 40:
        errors.append("Drew must be old")

This isn’t great, because now we’ve duplicated the instance checks in two places. We could also check that specific errors are not already present in the errors list, but that would couple this validation to the errors produced by previous validations.

The downsides of the stateful validation approach can be overcome by using a “parsing” approach. That is, we declaratively describe the shape and type of the input that we expect and return an error if our data does not meet those expectations. This approach is extremely well documented in the post Parse, don’t validate. Parsing is a fantastic alternative to stateful validation, but this style of parsing (often called monadic parsing) does have one disadvantage – it halts processing as soon as the first error is reached. We’d like to collect as much information as possible on the invalid input for our user.
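To see the difference concretely, here is a rough sketch of what a short-circuiting, parse-style version of our validation might look like (the parse_person name and the returned dict are just for illustration):

def parse_person(data):
    name = data.get("name")
    if not isinstance(name, str) or name == "":
        # We stop at the first problem we find...
        return Invalid(["name must be a non-empty string"])
    age = data.get("age")
    if not isinstance(age, int):
        # ...so a bad age is never reported if the name was already bad.
        return Invalid(["age must be an int"])
    return Valid({"name": name, "age": age})

Each check can safely rely on everything above it having succeeded, which is the composability we want, but the user only ever sees one error at a time.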

We can take another approach that gives us the composability of the parsing approach as well as the error accumulation of the stateful approach. This approach is traditionally called “applicative-style validation”.

Building Blocks

We’ll be providing two primary functions along with our existing Valid and Invalid types.

  1. validate_into allows us to call a provided function with a list of arguments, assuming all the arguments are Valid. Otherwise, it accumulates the errors in any Invalid arguments.
  2. and_then allows us to perform another “stage” of validations assuming the subject of the function is Valid. If the subject of the function is Invalid, we do nothing.

You can think of validate_into as building one “stage” of our validation pipeline and and_then as linking two stages together. Any validations within a stage will have their errors accumulated, but if a stage fails, we won’t run validations for any later stages. This means we should only break our validations into stages when a given stage depends on valid values from a previous stage.
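To get a feel for the intended behavior before we build anything, here is roughly how these two functions should behave on some throwaway values (these snippets assume the implementation we’ll write later in the post):

validate_into(lambda a, b: a + b, Valid(1), Valid(2))
# => Valid(value=3)

validate_into(lambda a, b: a + b, Valid(1), Invalid(["bad b"]))
# => Invalid(value=['bad b'])

validate_into(lambda a, b: a + b, Invalid(["bad a"]), Invalid(["bad b"]))
# => Invalid(value=['bad a', 'bad b'])

Valid(1).and_then(lambda n: Valid(n + 1))
# => Valid(value=2)

Invalid(["bad a"]).and_then(lambda n: Valid(n + 1))
# => Invalid(value=['bad a'])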

Let’s use these two functions to reimplement our validations from above. First, we’ll define a Person class into which we’ll be placing the valid data.

@dataclass
class Person:
    name: str
    age: int

Now, we’ll re-define our validate function and its helpers.

def validate_name(name):
    if not isinstance(name, str) or name == "":
        return Invalid(["name must be a non-empty string"])
    else:
        return Valid(name)


def validate_age(age):
    if not isinstance(age, int):
        return Invalid(["age must be an integer"])
    elif age < 10:
        return Invalid(["age must be at least 10"])
    else:
        return Valid(age)


def validate_drew(person):
    if person.name == "Drew" and person.age < 40:
        return Invalid(["Drew is old"])
    else:
        return Valid(person)


def validate(data):
    return validate_into(
        Person,
        validate_name(data.get("name")),
        validate_age(data.get("age")),
    ).and_then(validate_drew)

There are a few things to notice here. First, each validation function stands alone. Second, there is no mutation of the input data happening. Each function performs its validations and then returns a Valid or Invalid value. Last, note that each Invalid wraps a list of errors. This is what allows our accumulation to happen.

Let’s call our validate function a few times and see what happens:

validate({
    "name": None,
    "age": "hello",
})
# => Invalid(value=[
#             'name must be a non-empty string',
#             'age must be an integer'])


validate({
    "name": "Drew",
    "age": 38,
})
# => Invalid(value=['Drew is old'])


validate({
    "name": "Jane",
    "age": 38,
})
# => Valid(value=Person(name='Jane', age=38))

Notice that the second stage of our validations, namely validate_drew, can assume all of its input is Valid after the first stage. Therefore, we don’t need to re-check anything regarding the types of name or age before performing our specific validation (Drew needs to be old). Also notice how easy it would be to add new validations if we added a new argument to the Person constructor.
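For example, if we wanted to extend Person with a hypothetical email field, the change is mechanical: one new standalone validator and one new argument to validate_into.

@dataclass
class Person:
    name: str
    age: int
    email: str


def validate_email(email):
    # One possible rule, purely for illustration.
    if not isinstance(email, str) or "@" not in email:
        return Invalid(["email must be a string containing an @"])
    else:
        return Valid(email)


def validate(data):
    return validate_into(
        Person,
        validate_name(data.get("name")),
        validate_age(data.get("age")),
        validate_email(data.get("email")),
    ).and_then(validate_drew)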

Implementing the Library

We might imagine that the library to support this code would be quite complicated. In practice, it is very simple. The only function we use from outside the standard library is curry from the toolz library, but if we wanted to drop this dependency we could re-implement curry ourselves.
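For the curious, a hand-rolled curry could look something like the sketch below. It only handles positional arguments, unlike the real toolz.curry, and the library code that follows simply imports toolz instead.

from inspect import signature


def curry(f, *collected):
    # Keep collecting positional arguments until we have enough to call f.
    arity = len(signature(f).parameters)

    def partial(*args):
        all_args = collected + args
        if len(all_args) >= arity:
            return f(*all_args)
        return curry(f, *all_args)

    return partial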

from dataclasses import dataclass
from functools import reduce
from toolz import curry
from typing import Any


@dataclass
class Valid:
    value: Any

    def is_valid(self):
        return True

    def apply(self, other):
        if other.is_valid():
            return Valid(self.value(other.value))
        else:
            return other

    def and_then(self, f):
        return f(self.value)


@dataclass
class Invalid:
    value: Any

    def is_valid(self):
        return False

    def apply(self, other):
        if other.is_valid():
            return self
        else:
            return Invalid(self.value + other.value)

    def and_then(self, f):
        return self


def validate_into(f, *args):
    return reduce(lambda a, b: a.apply(b), args, Valid(curry(f)))

The above code is the entirety of our library. The and_then function is relatively straightforward. If we attempt to chain a new stage of validations on a Valid value, we simply invoke the provided function with the value inside our Valid. If we attempt to chain a new stage of validations on an Invalid value, we just ignore the provided function and return self.

The validate_into function feels more complicated, so let’s describe what it is doing step by step. First, we curry the provided function. This is important because we’re going to apply the function one argument at a time as we determine whether each argument is Valid. We also place this curried function into a Valid wrapper, since it starts in a Valid state before seeing any arguments. Then, one by one, we apply the next argument to our “function so far”. If both the “function so far” and the argument are Valid, we invoke the function with the argument and re-wrap the result in Valid. If the “function so far” is Valid but the new argument is Invalid, the Invalid argument becomes the new “function so far”. If the “function so far” is already Invalid and the new argument is Valid, we keep the existing errors. Finally, and importantly, if the “function so far” is already Invalid and we’re provided a new Invalid argument, we concatenate the errors and re-wrap the result in Invalid.
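To make that reduction concrete, here is roughly how validate_into unfolds for our Person example:

# The accumulator starts as a curried Person constructor wrapped in Valid.
acc = Valid(curry(Person))

# Happy path: each Valid argument is fed to the curried function.
acc = acc.apply(Valid("Jane"))  # Valid wrapping a partially applied Person
acc = acc.apply(Valid(38))      # Valid(value=Person(name='Jane', age=38))

# Unhappy path: the errors accumulate instead.
acc = Valid(curry(Person))
acc = acc.apply(Invalid(["name must be a non-empty string"]))
acc = acc.apply(Invalid(["age must be an integer"]))
# => Invalid(value=['name must be a non-empty string',
#                   'age must be an integer'])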

Using these simple tools, we can write complicated, deeply-nested validators. Reusing our validators is simple because they are nothing more than functions. We can place them in a package and easily share commonly used validators (think validate_presence) across our codebase.
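For example, a shared validator along the lines of the validate_presence mentioned above might look like this (the name and message are only illustrative):

def validate_presence(field_name, value):
    # Usable for any field: fails when the value is missing entirely.
    if value is None:
        return Invalid([f"{field_name} must be present"])
    else:
        return Valid(value)


# e.g. validate_presence("name", data.get("name"))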

Prior Art

Nothing in this post is new. I am reimplementing ideas from many other ecosystems in Python to make them more approachable. Applicative-style validation is just one of a huge number of ideas from the pure functional programming world that deserves wider recognition and adoption in more mainstream languages.