Tuesday, May 3, 2022

Why does Haskell's take function accept insufficient elements?

Why does Haskell's take function accept insufficient elements?

This post is a long-form response to a question on Reddit, which asked:

I just started learning haskell, and the take function confuses me.

e.g take 10 [1,2,3,4,5] will yield [1,2,3,4,5]

How does it not generate an error ?

… and I have enough to say on this subject that I thought it would warrant a blog post rather than just a comment reply.

The easiest way to answer this question is to walk through all the possible alternative implementations that can fail when not given enough elements.

Solution 0: Output a Maybe

The first thing we could try would be to wrap the result in a Maybe, like this:

safeTake :: Int -> [a] -> Maybe [a]
safeTake 0      _   = Just []
safeTake n      []  = Nothing
safeTake n (x : xs) = fmap (x :) (safeTake (n - 1) xs)
>>> safeTake 3 [0..]
Just [0,1,2]

>>> safeTake 3 []
Nothing

The main deficiency with this approach is that it is insufficiently lazy. The result will not produce a single element of the output list until safeTake finishes consuming the required number of elements from the input list.

We can see the difference with the following examples:

>>> oops = 1 : 2 : error "Too short"

>>> take 1 (take 3 oops)
[1]

>>> safeTake 1 =<< safeTake 3 oops
*** Exception: Too short

Solution 1: Fail with error

Another approach would be to create a partial function that fails with an error if we run out of elements, like this:

partialTake :: Int -> [a] -> [a]
partialTake 0      _   = []
partialTake n (x : xs) = x : partialTake (n - 1) xs
>>> partialTake 3 [0..]
[0,1,2]

>>> partialTake 3 []
*Main> partialTake 3 []
*** Exception: Test.hs:(7,1)-(8,51): Non-exhaustive patterns in function partialTake

>>> partialTake 1 (partialTake 3 oops)
[1]

Partial functions like these are undesirable, though, so we won’t go with that solution.

Solution 2: Use a custom list-like type

Okay, but what if we could store a value at the end of the list indicating whether or not the take succeeded. One way we could do that would be to define an auxiliary type similar to a list, like this:

{-# LANGUAGE DeriveFoldable #-}

data ListAnd r a = Cons a (ListAnd r a) | Nil r deriving (Foldable, Show)

… where now the empty (Nil) constructor can store an auxiliary value. We can then use this auxiliary value to indicate to the user whether or not the function succeeded or not:

data Result = Sufficient | Insufficient deriving (Show)

takeAnd :: Int -> [a] -> ListAnd Result a
takeAnd 0      _   = Nil Sufficient
takeAnd n      []  = Nil Insufficient
takeAnd n (x : xs) = Cons x (takeAnd (n - 1) xs)
>>> takeAnd 3 [0..]
Cons 0 (Cons 1 (Cons 2 (Nil Sufficient)))

>>> takeAnd 3 []
Nil Insufficient

Also, the ListAnd type derives Foldable, so we can recover the old behavior by converting the ListAnd Result a type into [a] using toList:

>>> import Data.Foldable (toList)

>>> toList (takeAnd 3 [0..])
[0,1,2]

>>> toList (takeAnd 3 [])
[]

>>> toList (takeAnd 1 (toList (takeAnd 3 oops)))
[1]

This is the first total function that has the desired laziness characteristics, but the downside is that the take function now has a much weirder type. Can we solve this only using existing types from base?

Solution 3: Return a pair

Well, what if we were to change the type of take to return an pair containing an ordinary list alongside a Result, like this:

takeWithResult :: Int -> [a] -> ([a], Result)
takeWithResult 0      _   = (    [], Sufficient  )
takeWithResult n      []  = (    [], Insufficient)
takeWithResult n (x : xs) = (x : ys, result      )
  where
    (ys, result) = takeWithResult (n - 1) xs
>>> takeWithResult 3 [0..]
([0,1,2],Sufficient)
>>> takeWithResult 3 []
([],Insufficient)

Now we don’t need to add this weird ListAnd type to base, and we can recover the old behavior by post-processing the output using fst:

>>> fst (takeWithResult 3 [0..])
[0,1,2]
fst (takeWithResult 3 [])
[]

… and this also has the right laziness characteristics:

>>> fst (takeWithResult 1 (fst (takeWithResult 3 oops)))
[1]

… and we can replace Result with a Bool if want a solution that depends solely on types from base:

takeWithResult :: Int -> [a] -> ([a], Bool)
takeWithResult 0      _   = ([], True)
takeWithResult n      []  = ([], False)
takeWithResult n (x : xs) = (x : ys, result)
  where
    (ys, result) = takeWithResult (n - 1) xs

However, even this solution is not completely satisfactory. There’s nothing that forces the user to check the Bool value before accessing the list, so this is not as safe as, say, the safeTake function which returns a Maybe. The Bool included in the result is more of an informational value rather than a safeguard.

Conclusion

So the long-winded answer to the original question is that there are several alternative ways we could implement take that can fail if the input list is too small, but in my view each of them has their own limitations.

This is why I think Haskell’s current take function is probably the least worst of the alternatives, even if it’s not the safest possible implementation.

2 comments:

  1. This is a good explanation. The laziness problem with Maybe never occurred to me. In general, are there ways to "thread" the laziness of list-like types through another type like Maybe? Or is this a fundamental language design limitation?

    ReplyDelete
    Replies
    1. I think you may be getting at the idea of an "effectful stream". That idea has been expressed in many ways, but the easiest to read is probably the one in monad-coroutine:

      newtype Coroutine s m r = Coroutine
      { resume :: m (Either (s (Coroutine s m r)) r) }

      If you plug in Bool for r, ((,) a) for s, and Identity for m, then you get something isomorphic to Gabriella's ListAnd. This same concept is probably best known as FreeT in the free package (which also defines several other versions of it), and Stream in the streaming package. The pipes and conduits packages essentially specialize the functor (s) to ((,) a). The ListT, LogicT, and logict-sequence packages take off from FreeT in a different direction, fixing the final return type to () and also fixing the functor to ((,) a).

      Delete