Normal Forms
Haskell uses call-by-need (i.e. “laziness”) as its evaluation strategy. There’s no shortage of voices bellowing out “use strict data types, use lazy functions (or lazy ‘spines’);” “laziness by default was a mistake;” “laziness makes it hard to reason about performance.” Meanwhile, some Haskellers have shown concretely that strictness isn’t always the right default and that laziness is necessary to achieve good performance in some cases.
Laziness frees the programmer from ugly operational semantics hacks like “iterators,” allows for user-defined short-circuiting operations (versus strict languages, where boolean operators must use built-in magic1), and enables higher-order functions to be used ergonomically without performance sacrifices. However, there is one pitfall that is the source of all complaints about lazy evaluation:
Lazy programs will have live heap objects that are not in normal form.
What is “normal form?”
Normalization is Evaluation
GHCi has a command called :sprint
(that’s read like “s-print” and not “sprint”2) that allows one to observe values as they are, without performing any evaluation. We can define a list that has all of the natural numbers:
>>> nats = [0..] :: [Integer]
But no evaluation has happened yet. :sprint
will show a not-yet-evaluated-thing (a “thunk”) with _
:
>>> :sprint nats
nats = _
We can cause part of the nats
value to be evaluated by asking GHCi to print some:
>>> take 3 nats
[0,1,2]
And :sprint
shows that just enough of the structure of nats
was evaluated for GHCi to show what we asked for:
>>> :sprint nats
nats = 0 : 1 : 2 : _
Here we see that a constructor with multiple arguments can be in a state on the heap where some of the constructor’s arguments have been fully evaluated, and some have not. In the case of 2 : _
, we have a list cons cell whose contained element is fully evaluated, but whose next part of the list is not. It’s possible to end up with :
values in the opposite situation, where the next part of the list is known but the contained element is not. Suppose we have a list of all of the Fibonacci numbers:
>>> fibs = 0 : 1 : zipWith (+) fibs (tail fibs) :: [Integer]
Along with a predicate that checks whether or not a list of any type of element has a length of at least three; note that we don’t care about what the list elements are at all:
>>> :{
... atLeastThree :: [a] -> Bool
... atLeastThree (_:_:_:_) = True
... atLeastThree _ = False
... :}
>>> atLeastThree [1, 2]
False
>>> atLeastThree [1, 2, 3]
True
Now let’s ask GHCi if \(\infty \ge 3\):
>>> atLeastThree fibs
True
Let’s see what evaluation had to take place to evaluate atLeastThree fibs
:
>>> :sprint fibs
fibs = 0 : 1 : _ : _
Here we have a value with a mix of “fully evaluated” constructors (e.g. the literals 0
and 1
we provided), constructors in which some of the arguments are known (1 : _
), constructors in which no arguments are known (_ : _
), and values about which nothing at all is known, even the “top-level” constructor (_
). Note that GHCi does not parenthesize this value very clearly; 0 : (1 : (_ : _))
is more clear.3
“Normalization” refers to the process of applying an abstract rewriting system’s rules (e.g. the evaluation rules of the lambda calculus) to transform terms. In this setting, a term in “normal form” is a term which cannot be transformed by the rules any further; it is, intuitively speaking, “fully evaluated.” When it comes to Haskell, we can define “normalization” by analogy as “turning thunks (e.g. _
) into concrete values by doing evaluation”. By extension, we can say that a Haskell value is in “normal form.4” if there are no thunks left in its structure, meaning that there’s no more evaluation that can be done.
What precisely is a “concrete value?” Rather than simply saying that it’s a value on which no more evaluation can be done, we can use the following set of rules for determining whether or not a value is in normal form:
- The value’s constructor is determined, e.g.
:sprint
shows us something other than a single_
, like it did in our very first:sprint nats
invocation above. - The constructor’s arguments, if any, are not thunks. For example,
_ : _
doesn’t obey this rule, since both its arguments are thunks. - The constructor’s arguments are in normal form. In other words, these rules must apply to all constructor arguments recursively.
In the case of the infinite values nats
and fibs
used in the definition above, we can actually never reach normal form, because we’d have to evaluate forever to do so. In this example, we’ll use a value that we can evaluate to normal form:
>>> x = (1 + 1, 2 * 2) :: (Integer, Integer)
>>> :sprint x
x = (_,_)
>>> -- Not in normal form yet.
>>> fst x
2
>>> :sprint x
x = (2,_)
>>> -- Still not in normal form yet.
>>> snd x
4
>>> :sprint x
x = (2,4)
>>> -- How it's in normal form.
Weak Head Normal Form
Rather than causing evaluation to occur by having GHCi print values, base
provides a function called seq :: a -> b -> b
that can be used “force” evaluation. This is a useful intuition, but “forcing” evaluation isn’t exactly what seq
does, as the documentation explains:
The value of
seq a b
is bottom ifa
is bottom, and otherwise equal tob
. In other words, it evaluates the first argumenta
to weak head normal form (WHNF).seq
is usually introduced to improve performance by avoiding unneeded laziness.
Here, “bottom” just means some “divergent” computation, e.g. something that would take infinite time to evaluate (f x = f x
) or would throw an exception (error "slain!"
). This definition is a bit hand-wavy, but is sufficient to explain what seq
does. In order for seq
to evaluate to “bottom” if its first argument is bottom, it must evaluate its first argument. However, even though seq
is introduced to control laziness, it is itself lazy, in that it does not evaluate the first argument to “normal form” as we defined above. Instead, it stops once the first of the normal form rules given above is satisfied, i.e. when it has determined the first argument’s constructor. If that constructor takes arguments and those arguments happen to be thunks, seq
doesn’t care, and the whole expression evaluates to the second argument.
In the nats
example from before, seq
stops evaluating as soon as it uncovers the first list cons cell:
>>> nats = [0..] :: [Integer]
>>> :sprint nats
>>> nats = _
>>> seq nats ()
()
>>> :sprint nats
nats = 0 : _
While we’re playing around in GHCi, the second argument to seq
doesn’t matter, since GHCi will evaluate whatever we ask so that it can print it. When using seq
in a real program, typically the first argument will be a subexpression in the second argument, so that the first argument will be evaluated before evaluation of the containing expression proceeds. A classic example is the strict left fold, foldl'
foldl' :: (b -> a -> b) -> b -> [a] -> b
= z
foldl' _ z [] :xs) = let z' = f z x
foldl' f z (xin seq z' (foldl' f z' xs)
Plenty of good writing already exists on how to use seq
, namely Simon Marlow’s Parallel and Concurrent Programming in Haskell (this chapter in particular) and HaskellWiki’s article on folds. Another note: in practice it tends to be quite tedious to use seq
directly, so instead the handy bang patterns language extension is typically used.
The seq
documentation uses the term “weak head normal form” to describe a value in a state where its constructor is known, but the constructors’ arguments may not be known. In The Implementation of Functional Programming Languages, Simon Peyton Jones gives this more formal definition of weak head normal form:
A lambda expression is in weak head normal form (WHNF) if and only if it is of the form:
\(F \, E_1 \, E_2 \, \ldots \, E_n\)
where \(n \ge 0\)
and either \(F\) is a variable or data object, or \(F\) is a lambda abstraction or built-in function and \((F \, E_1 \, E_2 \, \ldots \, E_n)\) is not a redex for any \(m \le n\).
An expression has no top-level redex if and only if it is weak head normal form.
This extends the definition of weak head normal form to any sort of function application in general, not simply application of data constructors. You can think of a “top-level redex” as roughly equivalent to the lone _
that GHCi would :sprint
when examining totally unevaluated values. The following bit is quite important:
\((F \, E_1 \, E_2 \, \ldots \, E_n)\) is not a redex for any \(m \le n\)
This means that if \(F \, E_1\), or \(F \, E_1 \, E_2\), or so on, could be reduced, then the whole expression is not in weak head normal form. This is a formal statement of the intuitive notion from before that one must “evaluate until a constructor is determined” to reach WHNF.
For the remainder of this discussion, we will focus on the application of data constructors, setting aside the application of arbitrary functions. Although we can use seq
to control when function applications are reduced, we lack any control over whether or not arbitrary lambda abstractions are in normal form. The (->)
type is built-in magic in GHC, and it doesn’t make sense to talk about what its “constructors” are.
deepseq
and NFData
Equipped with workable definitions of normal form and weak head normal form we can examine (and indeed critique) a popular Haskell package called deepseq
. This package provides a class called NFData
and a few associated utility functions. NFData
has a single method, rnf :: a -> ()
that evaluates its argument to normal form, then returns ()
. The documentation for the Control.DeepSeq
module notes three useful applications of the class:
- Avoiding resource leaks in lazy IO programs.5
- Ensuring that lazy values containing divergent values do not “leak” outside the scope of their exception handler.6
- Forcing evaluation to occur on one thread that uses a synchronization primitive to send data to another thread.7
In practice, solving these sorts of issues is exactly what deepseq
is used for. However, in the package documentation there is no discussion of what laws NFData
instances should obey, nor is there any concrete definition of what “normal form” means. The closest the documentation gets is mentioning that “normal form data” is “data structures with no unevaluated components.”
Despite the lack of a solid definition of normal form, we find that almost all of the NFData
instances provided in the package satisfy the three normal form rules given earlier. A majority of the instances are for data types with no type arguments, and with constructors that accept monomorphic arguments. In the case of types like Bool
, Int
, Double
, etc., weak head normal form implies normal form, since the data constructors take no arguments. WHNF implies that the constructor is known, and only the first rule for normal forms is relevant in the case of constructors with no arguments, so for such types a value in weak head normal form is in normal form. In the case of monomorphic newtypes like CInt
, the procedure for evaluating to normal form follows trivially from the wrapped type.
A handful of the provided instances are structurally recursive. What should the NFData
instance for lists look like? We can derive the implementation by examining the normal form rules. By the first rule, we must determine if the list value’s constructor is :
or []
. If it’s []
we’re done, since that constructor takes no arguments. By the second rule, if the value’s constructor is determined to be :
, we must evaluate its arguments. By the third rule, we must evaluate those arguments to normal form; this means that the type of the list elements must have an NFData
instance as well. As a corollary to the third rule, in order to evaluate the list value to normal form, we must walk down the entire length of the list. We can follow this same derivation procedure to determine the correct instances for Maybe
, Either
, tuples, Ratio
, Complex
, and so on.
Where normal forms can become quite confusing and where NFData
instances can become quite controversial, is the case of types with phantom type variables. By “phantom type variables,” I mean type constructors that take arguments that do not appear in any of the data constructors’ arguments. This can lead to some confusing or counterintuitive NFData
instances, but with the normal form rules to guide us we can determine what the correct instances should be. Let’s look at an uncontroversial instance first, the Proxy
type.
Proxy
values are meant for passing type information around at the value level, and the type definition is quite simple:
data Proxy a = Proxy
Here we have a single constructor with no arguments, and a type constructor with a single phantom argument. Since the type constructor takes no arguments, by the first normal form rule, all we need to do is determine the constructor. Here’s the NFData
instance:
instance NFData (Proxy a) where
Proxy = () rnf
Although I have never required this instance myself, it obeys the normal form rules, and I’ve never encountered an argument that this instance should be removed from the deepseq
package. The key difference between this instance and the others we’ve discussed so far is that the type variable a
does not appear in the constructor. A critical corollary: Whether or not a value of type Proxy a
is in normal form has nothing to do with any values of type a
. This statement follows from the normal form rules, and it is absolutely critical to bear this principle in mind when reasoning about NFData
instances for types with phantom type variables.
Here’s another uncontroversial instance, the Const
functor. Here’s its definition:
newtype Const a b = Const { getConst :: a }
In this case, we have one type variable that appears in a data constructor, and one phantom type variable. From the normal form rules, this means that a
must have an NFData
instance if we are to write an NFData
instance for Const
, but b
does not require one, because whether or not a value of type Const a b
is in normal form has nothing to do with any values of type b
. Here is the NFData
instance:
instance NFData a => NFData (Const a b) where
= rnf . getConst rnf
Let’s consider a trickier case, the Ptr
type. base
provides a class called Storable
for types that can be read from and written to plain memory addresses. A value of type Ptr a
is an address that points to some data that may be read to produce a value of type a
, or that a value of type a
may be written to. However, the data constructor for Ptr
only contains an Addr#
, which is GHC’s primitive type for memory addresses, and a
is a phantom type variable. Indeed, a function castPtr :: Ptr a -> Ptr b
is provided, which allows us to change the pointer’s type to whatever we wish, just like a C style cast. Here’s the type definition:
data Ptr a = Ptr Addr#
Here we have one constructor with a single argument. Addr#
is a unique in that it is an “unboxed” type, for our purposes this just means that there are never any thunks that can be evaluated to a lone Addr#
. If we have an Addr#
value at all, then we know it’s evaluated. This means that Ptr a
is a type for which WHNF implies normal form, and deepseq
provides a utility function for writing NFData
instances for such types. Its definition is provided here, along with the NFData
instance:
rwhnf :: a -> ()
= seq x ()
rwhnf x
instance NFData (Ptr a) where
= rwhnf rnf
I’ve noticed an extremely common misunderstanding surrounding NFData
instances like the one for Ptr
. Because pointers can reference values, it is claimed that some programmers might assume that using rnf
on a pointer will evaluate the referred-to data. In the case of Ptr
, we cannot provide an rnf
that has this functionality without using unsafePerformIO
, since the only way to read or write the referred-to value is by doing IO. This same argument applies to FunPtr
and ForeignPtr
.
A yet more controversial case is IORef
. IORef
is a bit like Ptr
, however, the value referred to by an IORef
is a typical lifted heap object; it works with any type and there is no need for a Storable
-like class for reading and writing values. Furthermore, IORef
supports useful synchronization operations like atomicModifyIORef
. Here’s its definition:
data IORef a = IORef (STRef RealWorld a)
The IO
type is built upon the ST
type, and an IORef
is simply a wrapper around an STRef
with the right kind of state token. I won’t discuss the guts of IO
and ST
further here, but this Haskell WikiBook chapter on mutable objects provides a good primer. Here’s the definition of STRef
:
data STRef a = STRef (MutVar# s a)
Like the Addr#
in Ptr
, a MutVar#
is always in normal form. MutVar#
is interesting in that its first argument, the state token, behaves like a phantom type variable (values of type RealWorld
in the implementation of IO
behave similarly). With this information we can determine the correct NFData
instances for STRef
and IORef
, according to the normal form rules:
instance NFData (STRef s a) where
= rwhnf
rnf
instance NFData (IORef a) where
= rwhnf rnf
The same argument for these STRef
and IORef
instances applies to the instances for MVar
(deepseq
currently has such an instance) and TVar
(deepseq
is, as of this writing, missing an instance for TVar
).
So far we’ve discussed arguments for the presence of certain (perhaps unintuitive at first) NFData
instances. However, it should be noted that deepseq
contains an instance that, according to the normal form rules, cannot have an NFData
instance. This instance is provided for values of type a -> b
:
instance NFData (a -> b) where
= rwhnf rnf
It is not meaningful to apply the normal form rules as they’re presented here to values of function type. The first rule falls down: The ->
type has no constructors available to the programmer, so there is no way to satisfy it.
A Proposal for NFData
Based on the definitions and arguments presented here, I offer the following proposal to amend the deepseq
package as it currently exists:
- Add the normal form rules to the
deepseq
documentation as laws for theNFData
class, restating them more formally if necessary. - Add missing law-abiding instances for types in
base
, likeTVar
,ForeignPtr
, and others. - Remove law-breaking
NFData
instances, likea -> b
. - Reword package documentation to avoid any suggestion that
rnf
is a shotgun that can remove any and all thunks in its path; it’s inability to perform IO prohibits such functionality, and I believe that the perception that it provides such functionality is the root of widespread misunderstanding about what it means for value references to be in normal form.
If you ever find that you need missing NFData
instances for types in base
or other boot packages, please add them to my deepseq-instances
package. If you agree with the definitions I’ve formulated here, please consider participating in discussions surrounding deepseq
on its GitHub page and the Haskell libraries mailing list.
Consider the C expression
a ? b : c
.b
will only be evaluated ifa
is not zero, andc
will only be evaluated ifa
is zero. It is not possible to define your own control structure in C that works with arbitrary values and has this lazy short-circuiting behavor. Go ahead and try to define your own function in C that has the exact same semantics as the ternary operator; without tricks like promoting the arguments to function types, you won’t be able to do it.↩︎There is also a
:print
GHCi command that will assign fresh names to all of the found thunks. We don’t need that feature to demonstrate laziness, though.↩︎Thanks to Lalaithion42 on Reddit for pointing this out!↩︎
In the book The Implementation of Functional Programming Languages, Simon Peyton Jones gives this definition of normal form: “If an expression contains no redexes then evaluation is complete, and the expression is said to be in normal form.” Thunks are how the RTS represents not-yet-reduced redexes at program runtume.↩︎
I wasn’t able to find a good example of this issue in the wild, but here’s a short example:
returnException :: IO (Int, Int) = do returnException let e = error "x" <- try (pure (e, e)) :: IO (Either SomeException (Int, Int)) r case r of Left e -> throwIO e Right r -> pure r
You might expect
returnException
to always throw, but instead it never throws!>>> r <- returnException >>> r (*** Exception: x CallStack (from HasCallStack): error, called at <interactive>:47:13 in interactive:Ghci11
This is because
try
is lazy. See theevaluate
function for a similar use case.↩︎See the section MVar as a Container for Shared State in Simon Marlow’s Parallel and Concurrent Programming in Haskell for a good example of this problem.↩︎