Haskell/Laziness

Hard work pays off later. Laziness pays off now! – Steven Wright

Introduction
By now, you are aware that Haskell uses lazy evaluation in that nothing is evaluated until necessary. But what exactly does "until necessary" mean? In this chapter, we will see how lazy evaluation works (how little magic there is), what exactly it means for functional programming, and how to make the best use of it.

First, let's consider the reasons and implications of lazy evaluation. At first glance, we might think that lazy evaluation makes programs more efficient. After all, what can be more efficient than not doing anything? In practice, however, laziness often introduces an overhead that leads programmers to hunt for places where they can make their code more strict. The real benefit of laziness is in making the right things efficient enough. Lazy evaluation allows us to write simpler, more elegant code than we could in a strict environment.

Nonstrictness versus Laziness
There is a slight difference between laziness and nonstrictness. Nonstrict semantics refers to a given property of Haskell programs that you can rely on: nothing will be evaluated until it is needed. Lazy evaluation is how you implement nonstrictness, using a device called thunks, which we explain in the next section. However, these two concepts are so closely linked that it helps to explain them both together. Knowledge of thunks helps in understanding nonstrictness, and the semantics of nonstrictness explains why we use lazy evaluation in the first place. So, we'll introduce the concepts simultaneously and make no particular effort to keep them from intertwining (with the exception of getting the terminology right).

Thunks and Weak head normal form
You need to understand two principles to see how programs execute in Haskell. First, we have the property of nonstrictness: we evaluate as little as possible and delay evaluation as long as possible. Second, Haskell values are highly layered; 'evaluating' a Haskell value could mean evaluating down to any one of these layers. Let's walk through a few examples using a pair.
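Concretely, the binding under discussion looks like this (the name `example` and the `in` part, which simply returns both variables, are placeholder scaffolding so the fragment compiles):

```haskell
-- x and y are bound to unevaluated thunks for the two calls;
-- neither length nor reverse runs until something demands a result.
example :: (Int, String)
example = let (x, y) = (length [1..5], reverse "olleh")
          in (x, y)  -- placeholder 'in' part that uses x and y
```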

Assume that in the 'in' part, we use x and y somewhere; otherwise, we wouldn't need to evaluate the let-binding at all! The right-hand side could have been undefined, and it would still work if the 'in' part doesn't mention x or y. This assumption will remain for all the examples in this section.

What do we know about x? We can calculate that x must be 5 and y "hello", but remember the first principle: we don't evaluate the calls to length and reverse until we're forced to. So, we say that x and y are both thunks: that is, they are unevaluated values with a recipe that explains how to evaluate them. For example, for x this recipe says 'Evaluate length [1..5]'. However, we are actually doing some pattern matching on the left hand side. What would happen if we removed that?
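The same right-hand side without a pattern match on the left (again with placeholder scaffolding around it):

```haskell
-- z is bound as one single thunk; the compiler doesn't need to know
-- it's a pair until z itself is used.
noPattern :: (Int, String)
noPattern = let z = (length [1..5], reverse "olleh")
            in z  -- placeholder 'in' part
```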

Although it's still pretty obvious to us that z is a pair, the compiler sees that we're not trying to deconstruct the value on the right-hand side of the '=' sign at all, so it doesn't really care what's there. It lets z be a thunk on its own. Later on, when we try to use z, we'll probably need one or both of the components, so we'll have to evaluate z, but for now, it can be a thunk.

Above, we said Haskell values were layered. We can see that at work if we pattern match on z:
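In code (same placeholder scaffolding as before):

```haskell
-- First z is bound as a bare thunk; the pair pattern in the second
-- binding is what later forces z to a pair of component thunks.
layered :: (Int, String)
layered =
  let z      = (length [1..5], reverse "olleh")
      (n, s) = z
  in (n, s)  -- placeholder 'in' part
```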

After the first line has been executed, z is simply a thunk. We know nothing about the sort of value it is because we haven't been asked to find out yet. In the second line, however, we pattern match on z using a pair pattern. The compiler thinks 'I better make sure that pattern does indeed match z, and in order to do that, I need to make sure z is a pair.' Be careful, though: we're not yet doing anything with the component parts (the calls to length and reverse), so they can remain unevaluated. In other words, z, which was just a thunk, gets evaluated to something like (*thunk*, *thunk*), and n and s become thunks which, when evaluated, will be the component parts of the original z.

Let's try a slightly more complicated pattern match:
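The binding 'h':ss = s is the new part (scaffolding as before):

```haskell
-- Matching 'h':ss against s forces the outermost cons cell of s and
-- its first element, but nothing beyond that.
deeper :: (Int, String)
deeper =
  let z      = (length [1..5], reverse "olleh")
      (n, s) = z
      'h':ss = s
  in (n, ss)  -- placeholder 'in' part
```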

The pattern match on the second component of z causes some evaluation. The compiler wishes to check that the 'h':ss pattern matches the second component of the pair. So, it:


 * 1) Evaluates the top level of s to ensure it's a cons cell: s = *thunk* : *thunk*. (If s had been an empty list we would encounter a pattern match failure error at this point.)
 * 2) Evaluates the first thunk it just revealed to make sure it's 'h': s = 'h' : *thunk*.
 * 3) The rest of the list stays unevaluated, and ss becomes a thunk which, when evaluated, will be the rest of this list.



We can 'partially evaluate' (most) Haskell values. Also, there is some sense of the minimum amount of evaluation we can do. For example, if we have a pair thunk, then the minimum amount of evaluation takes us to the pair constructor with two unevaluated components: (*thunk*, *thunk*). If we have a list, the minimum amount of evaluation takes us either to a cons cell *thunk* : *thunk* or an empty list []. Note that in the empty list case, no more evaluation can be performed on the value; it is said to be in normal form. If we are at any of the intermediate steps so that we've performed at least some evaluation on a value, it is in weak head normal form (WHNF). (There is also a 'head normal form', but it's not used in Haskell.) Fully evaluating something in WHNF reduces it to something in normal form; if at some point we needed to, say, print z out to the user, we'd need to fully evaluate it, including those calls to length and reverse, to (5, "hello"), where it is in normal form. Performing any degree of evaluation on a value is sometimes called forcing that value.

Note that for some values there is only one level of evaluation. For example, you can't partially evaluate an integer: it's either a thunk or it's in normal form. Furthermore, if we have a constructor with strict components (annotated with an exclamation mark, as with data MaybeS a = NothingS | JustS !a), these components become evaluated as soon as we evaluate the level above. I.e. we can never have JustS *thunk*; as soon as we get to this level, the strictness annotation on the component of JustS forces us to evaluate the component part.
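A sketch of the difference, using the MaybeS type just mentioned (the demo names are mine):

```haskell
-- JustS has a strict component: evaluating a JustS value to WHNF
-- also evaluates whatever is inside it.
data MaybeS a = NothingS | JustS !a

strictDemo :: MaybeS Int
strictDemo = JustS undefined   -- forcing this to WHNF raises an error

lazyDemo :: Maybe Int
lazyDemo = Just undefined      -- forcing this to WHNF is harmless
```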

In this section we've explored the basics of laziness. We've seen that nothing gets evaluated until it is needed (in fact, the only place that Haskell values get evaluated is in pattern matches and inside certain primitive IO functions). This principle even applies to evaluating values — we do the minimum amount of work on a value that we need to compute our result.

Lazy and strict functions
Functions can be lazy or strict 'in an argument'. Most functions need to do something with their arguments, and this will involve evaluating these arguments to different levels. For example, length needs to evaluate only the cons cells in the argument you give it, not the contents of those cons cells. length *thunk* might evaluate to something like length (*thunk* : *thunk* : *thunk* : []), which in turn evaluates to 3. Others need to evaluate their arguments fully, as in (length . show). If you had length $ show *thunk*, there's no way you can do anything other than evaluate that thunk to normal form.

So, some functions evaluate their arguments more fully than others. Given two functions of one parameter, f and g, we say f is stricter than g if f x evaluates x to a deeper level than g x does. Often, we only care about WHNF, so a function that evaluates its argument to at least WHNF is called strict and one that performs no evaluation is lazy. What about functions of more than one parameter? Well, we can talk about functions being strict in one parameter, but lazy in another. For example, given a function like the following:
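A definition with exactly that behavior (f is the name the next paragraph analyzes):

```haskell
-- f is strict in x (show must traverse all of x) but lazy in y
-- (y is never used at all).
f :: Show a => a -> b -> Int
f x y = length $ show x
```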

Clearly we need to perform no evaluation on y, but we need to evaluate x fully to normal form, so f is strict in its first parameter but lazy in its second.

In the original discussion about Folds, we discussed memory problems with foldl that are solved by the strictly-evaluated foldl'. Essentially, foldr (:) [] and foldl (flip (:)) [] both evaluate their arguments to the same level of strictness, but foldr can start producing values straight away, whereas foldl needs to evaluate cons cells all the way to the end before it starts producing any output. So, there are times when strictness can be valuable.
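A small illustration (the name lazySpine is mine): the identity fold written with foldr can feed an infinite list to a consumer, while the foldl version would have to reach the end of the input first, and so would never return.

```haskell
-- foldr (:) [] rebuilds the list incrementally, so take can stop it
-- after three elements. foldl (flip (:)) [] [1 ..] would diverge.
lazySpine :: [Int]
lazySpine = take 3 (foldr (:) [] [1 ..])
```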

Black-box strictness analysis


Imagine we're given some function f which takes a single parameter. We're not allowed to look at its source code, but we want to know whether f is strict or not. How might we do this? Probably the easiest way is to use the standard Prelude value undefined. Forcing undefined to any level of evaluation will halt our program and print an error, so all of these print errors:
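For example (the blowsUpN names are mine; MaybeS is the type from the previous section, repeated here so the snippet stands alone):

```haskell
data MaybeS a = NothingS | JustS !a

blowsUp1, blowsUp2, blowsUp3 :: Int
blowsUp1 = let (x, y) = undefined :: (Int, Int) in x  -- pattern match forces undefined
blowsUp2 = length (undefined :: [Int])                -- length forces the spine
blowsUp3 = head (undefined :: [Int])                  -- head forces the first cons cell

blowsUp4 :: MaybeS Int
blowsUp4 = JustS undefined                            -- strict field forces undefined
```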

So if a function is strict, passing it undefined will result in an error. Were the function lazy, passing it undefined would print no error and we can carry on as normal. For example, none of the following produce errors:
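For example (the fineN names are mine):

```haskell
-- None of these force the embedded undefined values.
fine1 :: Int
fine1 = let (x, y) = (4, undefined) :: (Int, Int) in x  -- the pair is already in WHNF

fine2 :: Int
fine2 = length ([undefined, undefined, undefined] :: [Int])  -- only the spine is forced

fine3 :: Int
fine3 = head (4 : undefined)                            -- only the first cons cell is forced

fine4 :: Maybe Int
fine4 = Just undefined                                  -- Just's field is lazy
```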

So we can say that f is a strict function if, and only if, f undefined results in an error being printed and the halting of our program.

In the context of nonstrict semantics
What we've presented so far makes sense until you start to think about functions like id. Is id strict? Our gut reaction is probably to say "No! It doesn't evaluate its argument, therefore it's lazy". However, let's apply our black-box strictness analysis from the last section to id. Clearly, id undefined is going to print an error and halt our program, so shouldn't we say that id is strict? The reason for this mixup is that Haskell's nonstrict semantics makes the whole issue a bit murkier.

Nothing gets evaluated if it doesn't need to be, according to nonstrictness. In the following code, will length undefined be evaluated?
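The list in question (bound to a name, trickList, so we can poke at it):

```haskell
-- Nothing here is evaluated just by defining the list; the error
-- only appears if something demands the third element.
trickList :: [Int]
trickList = [4, 10, length (undefined :: [Int]), 12]
```

Note that length trickList succeeds (it is 4), because length forces only the spine; printing the list, as GHCi would, forces the elements too and hits the error.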

If you type this into GHCi, it seems to be: you'll get an error. However, our question was something of a trick. It doesn't make sense to state whether a value gets evaluated unless we're doing something with that value. Think about it: if we type head [1, 2, 3] into GHCi, the only reason we have to do any evaluation whatsoever is because GHCi has to print us out the result. Typing [4, 10, length undefined, 12] again requires GHCi to print that list back to us, so it must evaluate it to normal form. In your average Haskell program, nothing at all will be evaluated until we come to perform the IO in main. So it makes no sense to say whether something is evaluated or not unless we know what it's being passed to, one level up. One lesson here is: don't blindly trust GHCi, because everything in GHCi is filtered through IO!

So when we say "Does f x force x?" what we really mean is "Given that we're forcing f x, does x get forced as a result?". Now we can turn our attention back to id. If we force id x to normal form, then x will be forced to normal form, so we conclude that id is strict. id itself doesn't evaluate its argument, it just hands it on to the caller who will. One way to see this is in the following code:
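For example (the forcedN names are mine):

```haskell
-- We evaluate the right-hand side of each let-binding to WHNF by
-- pattern matching against it.
forced1 :: Int
forced1 = let (x, y) = undefined :: (Int, Int) in x       -- error: we force undefined
forced2 :: Int
forced2 = let (x, y) = id (undefined :: (Int, Int)) in x  -- error: id passes the forcing on
```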

id doesn't "stop" the forcing, so it is strict. Contrast this to a clearly lazy function, const (3, 4):
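For example (the name lazyConst is mine):

```haskell
-- const (3, 4) ignores its second argument, so the undefined passed
-- to it is never forced.
lazyConst :: Int
lazyConst = let (x, y) = const ((3, 4) :: (Int, Int)) undefined in x  -- no error
```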

The denotational view on things
If you're familiar with denotational semantics (perhaps you've read the wikibook chapter?), then the strictness of a function can be summed up very succinctly:

f ⊥ = ⊥ ⇔ f is strict

Assuming that ⊥ (bottom) denotes the value of any failing or nonterminating computation, we can say that everything with type forall a . a, including undefined, error "any string" and so on, has denotation ⊥.

Lazy pattern matching
You might have seen pattern matches like the following in Haskell sources.
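The pattern match in question comes from Control.Arrow; for functions, (***) is equivalent to the following definition, written out here so the tilde is visible:

```haskell
-- Equivalent to (***) from Control.Arrow, specialized to functions.
-- The lazy pattern means the pair argument is not inspected until
-- f x or g y is actually demanded.
(***) :: (a -> a') -> (b -> b') -> (a, b) -> (a', b')
(f *** g) ~(x, y) = (f x, g y)
```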

The question is: what does the tilde sign (~) mean in the above pattern match? ~ makes a lazy pattern or irrefutable pattern. Normally, if you pattern match using a constructor as part of the pattern, you have to evaluate any argument passed into that function to make sure it matches the pattern. For example, if you had a function like the above, the third argument would be evaluated when you call the function to make sure the value matches the pattern. (Note that the first and second arguments won't be evaluated, because the patterns f and g match anything. Also it's worth noting that the components of the tuple won't be evaluated: just the 'top level'. Try let f (Just x) = 1 in f (Just undefined) to see this.)

However, prepending a pattern with a tilde sign delays the evaluation of the value until the component parts are actually used. But you run the risk that the value might not match the pattern — you're telling the compiler 'Trust me, I know it'll work out'. (If it turns out it doesn't match the pattern, you get a runtime error.) To illustrate the difference:

Prelude> let f (x,y) = 1
Prelude> f undefined
*** Exception: Prelude.undefined

Prelude> let f ~(x,y) = 1
Prelude> f undefined
1

In the first example, the value is evaluated because it has to match the tuple pattern. You evaluate undefined and get undefined, which stops the proceedings. In the latter example, you don't bother evaluating the parameter until it's needed, which turns out to be never, so it doesn't matter that you passed it undefined. To bring the discussion around in a circle back to (***):

Prelude> (const 1 *** const 2) undefined
(1,2)

If the pattern weren't irrefutable, the example would have failed.

When does it make sense to use lazy patterns?
Essentially, use lazy patterns when you only have the single constructor for the type, e.g. tuples. Multiple equations won't work nicely with irrefutable patterns. To see this, let's examine what would happen were we to make head have an irrefutable pattern:

head' :: [a] -> a
head' ~[]     = undefined
head' ~(x:xs) = x

The fact that we're using one of these patterns tells the compiler not to evaluate even the top level of the argument until absolutely necessary, so we never find out whether it's an empty list or a cons cell. As the irrefutable pattern in the first equation always matches, the function will always return undefined.

Benefits of nonstrict semantics
Why make Haskell a nonstrict language in the first place? What advantages are there?

Separation of concerns without time penalty
Lazy evaluation encourages a kind of "what you see is what you get" mentality when it comes to coding. For example, let's say you wanted to find the lowest three numbers in a long list. In Haskell this is achieved extremely naturally: take 3 . sort. However, a naïve translation of that code in a strict language would be a very bad idea! It would involve computing the entire sorted list, then chopping off all but the first three elements. However, with lazy evaluation we stop once we've sorted the list up to the third element, so this natural definition turns out to be efficient (depending on the implementation of sort).
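As a definition (the name lowestThree is mine):

```haskell
import Data.List (sort)

-- With GHC's lazy merge sort, taking the first three elements does
-- not pay for fully sorting the whole list.
lowestThree :: Ord a => [a] -> [a]
lowestThree = take 3 . sort
```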

To give a second example, the function isInfixOf from Data.List allows you to see if one string is a substring of another string. It's easily definable as:
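Namely (defining it by hand here; Data.List exports the real one):

```haskell
import Data.List (isPrefixOf, tails)

-- x is a substring of y exactly when x is a prefix of some tail of y.
isInfixOf :: Eq a => [a] -> [a] -> Bool
isInfixOf x y = any (x `isPrefixOf`) (tails y)
```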

Again, this would be suicide in a strict language, as computing all the tails of a list would be very time-consuming. However, here we only evaluate enough elements of each tail to determine whether it has prefix x or not, and stop and return True once we find one with prefix x.

There are many more examples along these lines. So we can write code that looks and feels natural and doesn't incur any time penalty.

However, as always in Computer Science (and in life), a tradeoff exists (in particular, a space-time tradeoff). Using a thunk instead of a plain Int for a trivial calculation (like 2 + 2) can be a waste. For more examples, see the page on ../Strictness.

Improved code reuse
Often, code reuse is desirable.

For example, we'll take again (but in more detail) isInfixOf from Data.List. Let's look at the full definition:
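The definition is essentially:

```haskell
import Data.List (isPrefixOf, tails)

-- any stops at the first tail that has x as a prefix, and tails is
-- produced lazily, so only as much of either is ever built as needed.
isInfixOf :: Eq a => [a] -> [a] -> Bool
isInfixOf x y = any (x `isPrefixOf`) (tails y)
```

Laziness even lets the second argument be infinite, as long as a match exists: "ell" `isInfixOf` cycle "hello" returns True.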

Here any, isPrefixOf and tails are functions taken from the Data.List library. This function determines if its first parameter, x, occurs as a subsequence of its second, y; when applied on Strings (i.e. [Char]), it checks if x is a substring of y. Read in a strict way, it forms the list of all the tails of y, then checks them all to see if any of them have x as a prefix. In a strict language, writing this function this way (relying on the already-written programs any, isPrefixOf, and tails) would be silly, because it would be far slower than it needed to be. You'd end up doing direct recursion again, or in an imperative language, a couple of nested loops. You might be able to get some use out of isPrefixOf, but you certainly wouldn't use tails. You might be able to write a usable shortcutting any, but it would be more work, since you wouldn't want to use tails to do it.

Now, in a lazy language, all the shortcutting is done for you. You don't end up rewriting foldr to shortcut when you find a solution or rewriting the recursion done in tails so that it will stop early. You can reuse standard library code better. Laziness isn't just a constant-factor speed thing, it makes a qualitative impact on the code which is reasonable to write. In fact, it's commonplace to define infinite structures, and then only use as much as is needed, rather than having to mix up the logic of constructing the data structure with code that determines whether any part is needed. Code modularity is increased, as laziness gives you more ways to chop up your code into small pieces, each of which does a simple task of generating, filtering, or otherwise manipulating data.

Why Functional Programming Matters largely focuses on examples where laziness is crucial and provides a strong argument for lazy evaluation being the default.

Infinite data structures
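A classic example is the list of all Fibonacci numbers, defined in terms of itself; laziness means only the demanded prefix is ever computed:

```haskell
-- An infinite list: every element beyond the first two is a thunk
-- until something demands it.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
```

take 10 fibs computes exactly ten elements; the rest of the structure stays unevaluated.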

''Infinite data structures usually tie a knot, too, but the explanation of that is better left to the next section.''

Tying the knot

''Sci-Fi-Explanation: "You can borrow things from the future as long as you don't try to change them".''

At first a pure functional language seems to have a problem with circular data structures. Suppose I have a data type like this:
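Matching the fields used below (value and next):

```haskell
-- Each Foo holds an Int and a reference to another Foo.
data Foo = Foo { value :: Int, next :: Foo }
```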

If I want to create two objects "x" and "y" where "x" contains a reference to "y" and "y" contains a reference to "x" then in a conventional language this is straightforward: create the objects and then set the relevant fields to point to each other:

-- Not Haskell code
x := new Foo;
y := new Foo;
x.value := 1;
x.next := y;
y.value := 2;
y.next := x;

In Haskell this kind of modification is not allowed. So instead we depend on lazy evaluation:
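A definition along these lines (circularFoo, the name used below; the Foo declaration is repeated so the snippet stands alone):

```haskell
data Foo = Foo { value :: Int, next :: Foo }

-- x and y refer to each other; neither structure is built until
-- something demands it.
circularFoo :: Foo
circularFoo = x
  where
    x = Foo 1 y
    y = Foo 2 x
```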

This depends on the fact that the "Foo" constructor is a function, and like most functions it gets evaluated lazily. Only when one of the fields is required does it get evaluated.

It may help to understand what happens behind the scenes here. When a lazy value is created (for example, by a call to "Foo"), the compiler generates an internal data structure called a "thunk" containing the function call and arguments. When the value of the function is demanded, the function is called (as you would expect). But then the thunk data structure is replaced with the return value. Thus, anything else that refers to that value gets it straight away without the need to call the function.

(Note that the Haskell language standard makes no mention of thunks: they are an implementation mechanism. From the mathematical point of view this is a straightforward example of mutual recursion.)

So, when we call "circularFoo" the result "x" is actually a thunk. One of the arguments is a reference to a second thunk representing "y". This in turn has a reference back to the thunk representing "x". If we then use the value "next x" this forces the "x" thunk to be evaluated and returns a reference to the "y" thunk. If I use the value "next $ next x" then I force the evaluation of both thunks. So now both thunks have been replaced with the actual "Foo" structures, referring to each other. Which is what we wanted.

This is most often applied with constructor functions, but it isn't limited just to constructors. You can just as readily write:
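For instance (f and g stand for any functions lazy enough not to force their argument immediately; here (1 :) and (2 :) play those roles):

```haskell
-- x = f y and y = g x, with f = (1 :) and g = (2 :).
-- (:) never forces its second argument, so the knot ties fine.
xs, ys :: [Int]
xs = 1 : ys
ys = 2 : xs
```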

The same logic applies.

Memoization, Sharing and Dynamic Programming
''Dynamic programming with immutable arrays. DP with other finite maps. Hinze's paper "Trouble Shared is Trouble Halved". Let-floating.''
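A minimal sketch of the idea these notes gesture at (using a lazy list rather than an immutable array, to keep it short): naive Fibonacci memoized by sharing one lazily built list of results.

```haskell
-- Each position of memoFibs is computed at most once; the recursive
-- calls reuse the shared thunks instead of recomputing them.
memoFibs :: [Integer]
memoFibs = map fib [0 ..]
  where
    fib :: Int -> Integer
    fib 0 = 0
    fib 1 = 1
    fib n = memoFibs !! (n - 1) + memoFibs !! (n - 2)
```

memoFibs !! 30 returns promptly, where the unmemoized recursion would take exponential time.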

Conclusions about laziness
Laziness:


 * Can make qualitative improvements to performance!
 * Can hurt performance in some other cases.
 * Makes code simpler.
 * Makes hard problems conceivable.
 * Allows for separation of concerns with regard to generating and processing data.