Haskell/Denotational semantics

Introduction
This chapter explains how to formalize the meaning of Haskell programs, the denotational semantics. It may seem to be nit-picking to formally specify that the program square x = x*x means the same as the mathematical square function that maps each number to its square, but what about the meaning of a program like f x = f x that loops forever? In the following, we will exemplify the approach first taken by Scott and Strachey to this question and obtain a foundation to reason about the correctness of functional programs in general and recursive definitions in particular. Of course, we will concentrate on those topics needed to understand Haskell programs.

Another aim of this chapter is to illustrate the notions strict and lazy that capture the idea that a function needs or need not evaluate its argument. This is a basic ingredient for predicting the course of evaluation of Haskell programs and hence of primary interest to the programmer. Interestingly, these notions can be formulated concisely with denotational semantics alone; no reference to an execution model is necessary. They will be put to good use in Graph Reduction, but it is this chapter that will familiarize the reader with the denotational definition and involved notions such as ⊥ ("Bottom"). The reader only interested in strictness may wish to poke around in section Bottom and Partial Functions and quickly head over to Strict and Non-Strict Semantics.

What is Denotational Semantics and what is it for?
What does a Haskell program mean? This question is answered by the denotational semantics of Haskell. In general, the denotational semantics of a programming language maps each of its programs to a mathematical object (denotation), that represents the meaning of the program in question. As an example, the mathematical object for the Haskell programs 10, 9+1 and 2*5 can be represented by the integer 10. We say that all those programs denote the integer 10. The collection of such mathematical objects is called the semantic domain.

The mapping from program code to a semantic domain is commonly written down with double square brackets ("Oxford brackets") around program code. For example, $$[\![\texttt{2*5}]\!] = 10.$$ Denotations are compositional, i.e. the meaning of a program like a+b only depends on the meaning of its constituents: $$[\![\texttt{a+b}]\!] = [\![\texttt{a}]\!]+[\![\texttt{b}]\!].$$ The same notation is used for types, i.e. $$[\![\texttt{Integer}]\!]=\mathbb{Z}.$$ For simplicity however, we will silently identify expressions with their semantic objects in subsequent chapters and use this notation only when clarification is needed.
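Compositionality can be illustrated in Haskell itself. The following toy interpreter is a sketch of our own (the names Expr, denote, ex1 and ex2 are invented for this illustration, not standard); the meaning of a compound expression depends only on the meanings of its parts:

```haskell
-- A toy expression language and its compositional denotation.
data Expr = Lit Integer | Add Expr Expr | Mul Expr Expr

-- The denotation of a compound expression depends only on the
-- denotations of its constituents, mirroring [[a+b]] = [[a]] + [[b]].
denote :: Expr -> Integer
denote (Lit n)   = n
denote (Add a b) = denote a + denote b
denote (Mul a b) = denote a * denote b

-- Different programs, same denotation: [[2*5]] = [[9+1]] = 10.
ex1, ex2 :: Expr
ex1 = Mul (Lit 2) (Lit 5)
ex2 = Add (Lit 9) (Lit 1)
```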

It is one of the key properties of purely functional languages like Haskell that a direct mathematical interpretation like "2*5 denotes 10" carries over to functions, too: in essence, the denotation of a program of type Integer -> Integer is a mathematical function $$\mathbb{Z}\to\mathbb{Z}$$ between integers. While we will see that this needs refinement in general, to include non-termination, the situation for imperative languages is clearly worse: a procedure with that type denotes something that changes the state of a machine in possibly unintended ways. Imperative languages are tightly tied to operational semantics which describes their way of execution on a machine. It is possible to define a denotational semantics for imperative programs and to use it to reason about such programs, but the semantics often has operational nature and sometimes must be extended in comparison to the denotational semantics for functional languages. In contrast, the meaning of purely functional languages is by default completely independent from their way of execution. The Haskell 98 standard even goes as far as to specify only Haskell's non-strict denotational semantics, leaving open how to implement them.

In the end, denotational semantics enables us to develop formal proofs that programs indeed do what we want them to do mathematically. Ironically, for proving program properties in day-to-day Haskell, one can use Equational reasoning, which transforms programs into equivalent ones without seeing much of the underlying mathematical objects we are concentrating on in this chapter. But the denotational semantics actually show up whenever we have to reason about non-terminating programs, for instance in Infinite Lists.

Of course, because they only state what a program is, denotational semantics cannot answer questions about how long a program takes or how much memory it eats; this is governed by the evaluation strategy which dictates how the computer calculates the normal form of an expression. On the other hand, the implementation has to respect the semantics, and to a certain extent, it is the semantics that determines how Haskell programs must be evaluated on a machine. We will elaborate on this in Strict and Non-Strict Semantics.

What to choose as Semantic Domain?
We are now looking for suitable mathematical objects that we can attribute to every Haskell program. In case of the examples 10, 9+1 and 2*5, it is clear that all expressions should denote the integer 10. Generalizing, every value x of type Integer is likely to denote an element of the set $$\mathbb{Z}$$. The same can be done with values of type Bool. For functions like f :: Integer -> Integer, we can appeal to the mathematical definition of "function" as a set of (argument, value)-pairs, its graph.

But interpreting functions as their graph was too quick, because it does not work well with recursive definitions. Consider the definition

shaves :: Integer -> Integer -> Bool
1 `shaves` 1 = True
2 `shaves` 2 = False
0 `shaves` x = not (x `shaves` x)
_ `shaves` _ = False

We can think of 1, 2 and 0 as being male persons with long beards and the question is who shaves whom. Person 1 shaves himself, but 2 gets shaved by the barber 0 because evaluating the third equation yields 0 `shaves` 2 == True. In general, the third line says that the barber 0 shaves all persons that do not shave themselves.

What about the barber himself, is 0 `shaves` 0 true or not? If it is, then the third equation says that it is not. If it is not, then the third equation says that it is. Puzzled, we see that we just cannot attribute True or False to 0 `shaves` 0; the graph we use as interpretation for the function shaves must have an empty spot. We realize that our semantic objects must be able to incorporate partial functions, functions that are undefined for some values of their arguments (arguments that are otherwise permitted by their types).

It is well known that this famous example gave rise to serious foundational problems in set theory. It's an example of an impredicative definition, a definition which uses itself, a logical circle. Unfortunately for recursive definitions, the circle is not the problem but the feature.
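As a self-contained sketch (repeating the definition so it can be run as is), the defined parts of shaves can be checked directly; only the barber's own case has no value:

```haskell
shaves :: Integer -> Integer -> Bool
1 `shaves` 1 = True
2 `shaves` 2 = False
0 `shaves` x = not (x `shaves` x)
_ `shaves` _ = False
-- 1 `shaves` 1 is True and 0 `shaves` 2 is True (the barber shaves 2),
-- but evaluating 0 `shaves` 0 recurses forever: the "empty spot".
```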

Bottom
To define partial functions, we introduce a special value ⊥, named bottom and commonly written _|_ in typewriter font. We say that ⊥ is the completely "undefined" value or function. Every basic data type like Integer or () contains one ⊥ besides their usual elements. So the possible values of type Integer are

$$\bot, 0, +1, -1, +2, -2, +3, -3, \dots$$

Adding ⊥ to the set of values is also called lifting. This is often depicted by a subscript like in $$\mathbb{Z}_\bot$$. While this is the correct notation for the mathematical set "lifted integers", we prefer to talk about "values of type Integer". We do this because $$\mathbb{Z}_\bot$$ suggests that there are "real" integers $$\mathbb{Z}$$, but inside Haskell, the "integers" are $$\mathbb{Z}_\bot$$.

As another example, the type () with only one element actually has two inhabitants:

$$\bot,\ \texttt{()}$$

For now, we will stick to programming with Integers. Arbitrary algebraic data types will be treated in section Algebraic Data Types since strict and non-strict languages diverge on how these include ⊥.

In Haskell, the expression undefined denotes ⊥. With its help, one can indeed verify some semantic properties of actual Haskell programs. undefined has the polymorphic type forall a . a which of course can be specialized to undefined :: Integer, undefined :: (), undefined :: Integer -> Integer and so on. In the Haskell Prelude, it is defined as

undefined = error "Prelude.undefined"

As a side note, it follows from the Curry-Howard isomorphism that any value of the polymorphic type forall a . a must denote ⊥.
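Operationally, GHC reports an attempt to evaluate undefined with a runtime exception, which gives a purely operational way to poke at ⊥ from IO; no pure function could make this distinction. A sketch assuming GHC's Control.Exception (observe is our own name):

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Forcing undefined raises an exception; catching it in IO lets us
-- observe bottom operationally, although no pure function could.
observe :: Integer -> IO (Either SomeException Integer)
observe x = try (evaluate x)
```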

Partial Functions and the Semantic Approximation Order
Now, $$\bot$$ gives us the possibility to have partial functions:

$$f(n) = \begin{cases} 1 & \mbox{ if } n \mbox{ is } 0 \\ -2 & \mbox{ if } n \mbox{ is } 1 \\ \bot & \mbox{ else } \end{cases} $$

Here, $$f(n)$$ yields well defined values for $$n=0$$ and $$n=1$$ but gives $$\bot$$ for all other $$n$$.
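Rendered in Haskell, this partial function reads as follows (using undefined for ⊥):

```haskell
-- f is defined at 0 and 1 and bottom everywhere else,
-- matching the mathematical definition above.
f :: Integer -> Integer
f 0 = 1
f 1 = -2
f _ = undefined
```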

The type Integer -> Integer has its $$\bot$$ too, and it is defined with the help of the $$\bot$$ from Integer this way:

$$\bot(n) = \bot$$ for all $$n$$

where the $$\bot$$ on the left hand side is of type Integer -> Integer, and the one on the right hand side is of type Integer.

To formalize, partial functions, say of type Integer -> Integer, are at least mathematical mappings from the lifted integers $$\mathbb{Z}_\bot=\{\bot, 0, \pm 1, \pm 2, \pm 3, \dots\}$$ to the lifted integers. But this is not enough, since it does not acknowledge the special role of $$\bot$$. For example, the definition

$$g(n) = \begin{cases} 1 & \mbox{ if } n \mbox{ is } \bot \\ \bot & \mbox{ else } \end{cases} $$

is a function that turns infinite loops into terminating programs and vice versa, which would solve the halting problem. How can $$g(\bot)$$ yield a defined value when $$g(1)$$ is undefined? The intuition is that every partial function $$g$$ should yield more defined answers for more defined arguments. To formalize, we can say that every concrete number is more defined than $$\bot$$:

$$\bot\sqsubset 1\ ,\ \bot\sqsubset 2\ ,\ \dots$$

Here, $$a\sqsubset b$$ denotes that $$b$$ is more defined than $$a$$. Likewise, $$a\sqsubseteq b$$ will denote that either $$b$$ is more defined than $$a$$ or both are equal (and so have the same definedness). $$\sqsubset$$ is also called the semantic approximation order because we can approximate defined values by less defined ones thus interpreting "more defined" as "approximating better". Of course, $$\bot$$ is designed to be the least element of a data type; we always have that $$\bot\sqsubset x$$ for all $$x$$, except the case when $$x$$ happens to denote $$\bot$$ itself:

$$\forall x\neq\bot\ \ \ \bot\sqsubset x $$

As no number is more defined than another, the mathematical relation $$\sqsubset$$ is false for any pair of numbers:

$$1 \sqsubset 1$$ does not hold, and neither $$1 \sqsubset 2$$ nor $$2 \sqsubset 1$$ holds.

This is contrasted to the ordinary order predicate $$\le$$, which can compare any two numbers. A quick way to remember this is the sentence: "$$1$$ and $$2$$ are different in terms of information content but are equal in terms of information quantity". That's another reason why we use a different symbol: $$\sqsubseteq$$. Neither $$1 \sqsubseteq 2$$ nor $$2 \sqsubseteq 1$$ holds, but $$1 \sqsubseteq 1$$ does.

One says that $$\sqsubseteq$$ specifies a partial order and that the values of type Integer form a partially ordered set (poset for short). A partial order is characterized by the following three laws:
 * Reflexivity, everything is just as defined as itself: $$x \sqsubseteq x$$ for all $$x$$
 * Transitivity: if $$x \sqsubseteq y$$ and $$y \sqsubseteq z$$, then $$x \sqsubseteq z$$
 * Antisymmetry: if both $$x \sqsubseteq y$$ and $$y \sqsubseteq x$$ hold, then $$x$$ and $$y$$ must be equal: $$x=y$$.

We can depict the order $$\sqsubseteq$$ on the values of type Integer by the following graph



where every link between two nodes specifies that the one above is more defined than the one below. Because there is only one level (excluding $$\bot$$), one says that Integer is a flat domain. The picture also explains the name of $$\bot$$: it's called bottom because it always sits at the bottom.

Monotonicity
Our intuition about partial functions can now be formulated as follows: every partial function $$f$$ is a monotone mapping between partially ordered sets. More defined arguments will yield more defined values:

$$ x\sqsubseteq y \Rightarrow f(x)\sqsubseteq f(y) $$

In particular, a monotone function $$h$$ with $$h(\bot)=1$$ is constant: $$h(n)=1$$ for all $$n$$. Note that here it is crucial that $$1 \sqsubseteq 2$$ etc. don't hold.

Translated to Haskell, monotonicity means that we cannot use $$\bot$$ as a condition, i.e. we cannot pattern match on $$\bot$$, or its equivalent undefined. Otherwise, the example $$g$$ from above could be expressed as a Haskell program. As we shall see later, $$\bot$$ also denotes non-terminating programs, so that the inability to observe $$\bot$$ inside Haskell is related to the halting problem.

Of course, the notion of more defined than can be extended to partial functions by saying that a function is more defined than another if it is so at every possible argument:

$$f \sqsubseteq g \mbox{ if } \forall x. f(x) \sqsubseteq g(x)$$

Thus, the partial functions also form a poset, with the undefined function $$\bot(x)=\bot$$ being the least element.

Approximations of the Factorial Function
Now that we have the means to describe partial functions, we can give an interpretation to recursive definitions. Let's take the prominent example of the factorial function $$f(n)=n!$$ whose recursive definition is

$$f(n) = \mbox{ if } n == 0 \mbox{ then } 1 \mbox{ else } n \cdot f(n-1)$$

Although we saw that interpreting this recursive function directly as a set description may lead to problems, we intuitively know that in order to calculate $$f(n)$$ for every given $$n$$ we have to iterate the right hand side. This iteration can be formalized as follows: we calculate a sequence of functions $$f_k$$ with the property that each function consists of the right hand side applied to the previous function, that is

$$f_{k+1}(n) = \mbox{ if } n == 0 \mbox{ then } 1 \mbox{ else } n \cdot f_k(n-1)$$

We start with the undefined function $$f_0(n) = \bot$$, and the resulting sequence of partial functions reads:

$$f_1(n) = \begin{cases} 1 & \mbox{ if } n \mbox{ is } 0 \\ \bot & \mbox{ else } \end{cases} \ ,\ f_2(n) = \begin{cases} 1 & \mbox{ if } n \mbox{ is } 0 \\ 1 & \mbox{ if } n \mbox{ is } 1 \\ \bot & \mbox{ else } \end{cases} \ ,\ f_3(n) = \begin{cases} 1 & \mbox{ if } n \mbox{ is } 0 \\ 1 & \mbox{ if } n \mbox{ is } 1 \\ 2 & \mbox{ if } n \mbox{ is } 2 \\ \bot & \mbox{ else } \end{cases} $$

and so on. Clearly,

$$\bot=f_0 \sqsubseteq f_1 \sqsubseteq f_2 \sqsubseteq \dots $$

and we expect that the sequence converges to the factorial function.

The iteration follows the well known scheme of a fixed point iteration

$$ x_0, g(x_0), g(g(x_0)), g(g(g(x_0))), \dots $$

In our case, $$x_0$$ is a function and $$g$$ is a functional, a mapping between functions. We have

$$ x_0 = \bot$$ and $$ g(x) = n\mapsto\mbox{ if } n == 0 \mbox{ then } 1 \mbox{ else } n*x(n-1) \,$$

If we start with $$x_0 = \bot$$, the iteration will yield increasingly defined approximations to the factorial function

$$ \bot\sqsubseteq g(\bot)\sqsubseteq g(g(\bot))\sqsubseteq g(g(g(\bot)))\sqsubseteq \dots $$

(Proof that the sequence increases: The first inequality $$\bot\sqsubseteq g(\bot)$$ follows from the fact that $$\bot$$ is less defined than anything else. The second inequality follows from the first one by applying $$g$$ to both sides and noting that $$g$$ is monotone. The third follows from the second in the same fashion and so on.)

It is very illustrative to formulate this iteration scheme in Haskell. As functionals are just ordinary higher order functions, we have

g :: (Integer -> Integer) -> (Integer -> Integer)
g x = \n -> if n == 0 then 1 else n * x (n-1)

x0 :: Integer -> Integer
x0 = undefined

(f0:f1:f2:f3:f4:fs) = iterate g x0

We can now evaluate the functions f0, f1, f2, f3, f4 at sample arguments and see whether they yield ⊥ or not:

> f3 0
1
> f3 1
1
> f3 2
2
> f3 5
*** Exception: Prelude.undefined
> map f3 [0..]
[1,1,2,*** Exception: Prelude.undefined
> map f4 [0..]
[1,1,2,6,*** Exception: Prelude.undefined
> map f1 [0..]
[1,*** Exception: Prelude.undefined

Of course, we cannot use this to check whether a function is really undefined for all arguments.
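For the arguments where they are defined, the approximations agree with the factorial function. A small self-contained check (g as above; approx and fact are our own names, with fact an ordinary reference implementation):

```haskell
g :: (Integer -> Integer) -> (Integer -> Integer)
g x = \n -> if n == 0 then 1 else n * x (n - 1)

-- the k-th element of the approximating sequence:
-- g applied k times to the undefined function
approx :: Int -> Integer -> Integer
approx k = iterate g undefined !! k

-- an ordinary factorial, for comparison
fact :: Integer -> Integer
fact n = product [1 .. n]
-- approx k agrees with fact on 0 .. k-1 and is undefined from k on.
```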

Convergence
To the mathematician, the question whether this sequence of approximations converges is still to be answered. For that, we say that a poset is a directed complete partial order (dcpo) iff every monotone sequence $$x_0\sqsubseteq x_1\sqsubseteq \dots$$ (also called chain) has a least upper bound (supremum)

$$\sup_{\sqsubseteq} \{x_0\sqsubseteq x_1\sqsubseteq \dots\} = x$$.

If that's the case for the semantic approximation order, we can be sure that the monotone sequence of functions approximating the factorial function indeed has a limit. For our denotational semantics, we will only meet dcpos which have a least element $$\bot$$; these are called complete partial orders (cpo).

The Integers clearly form a (d)cpo, because the monotone sequences consisting of more than one element must be of the form

$$\bot\sqsubseteq\dots\sqsubseteq\ \bot\sqsubseteq n\sqsubseteq n\sqsubseteq \dots\sqsubseteq n$$

where $$n$$ is an ordinary number. Thus, $$n$$ is already the least upper bound.

For functions, this argument fails because monotone sequences may be of infinite length. But because Integer is a (d)cpo, we know that for every point $$n$$, there is a least upper bound

$$\sup_{\sqsubseteq} \{\bot=f_0(n) \sqsubseteq f_1(n) \sqsubseteq f_2(n) \sqsubseteq \dots\} =: f(n)$$.

As the semantic approximation order is defined point-wise, the function $$f$$ is the supremum we looked for.

These have been the last touches for our aim to transform the impredicative definition of the factorial function into a well defined construction. Of course, it remains to be shown that $$f(n)$$ actually yields a defined value for every $$n$$, but this is not hard and far more reasonable than a completely ill-formed definition.

Bottom includes Non-Termination
It is instructive to try our newly gained insight into recursive definitions on an example that does not terminate:

$$f(n) = f(n+1)$$

The approximating sequence reads

$$f_0 = \bot, f_1 = \bot, \dots$$

and consists only of $$\bot$$. Clearly, the resulting limit is $$\bot$$ again. From an operational point of view, a machine executing this program will loop indefinitely. We thus see that $$\bot$$ may also denote a non-terminating function or value. Hence, given the halting problem, pattern matching on $$\bot$$ in Haskell is impossible.
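Operationally, a bottom that stems from non-termination cannot be caught as an exception; the best we can do is give up after a while. A sketch using GHC's System.Timeout (probe is our own name; the 0.1-second limit is arbitrary):

```haskell
import Control.Exception (evaluate)
import System.Timeout (timeout)

-- f n = f (n+1) denotes bottom at every argument; a machine
-- evaluating it loops indefinitely.
f :: Integer -> Integer
f n = f (n + 1)

-- Wait at most a tenth of a second for a value that never comes.
probe :: IO (Maybe Integer)
probe = timeout 100000 (evaluate (f 0))
</imports>
```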

Interpretation as Least Fixed Point
Earlier, we called the approximating sequence an example of the well known "fixed point iteration" scheme. And of course, the definition of the factorial function $$f$$ can also be thought as the specification of a fixed point of the functional $$g$$:

$$f = g(f) = n\mapsto\mbox{ if } n == 0 \mbox{ then } 1 \mbox{ else } n\cdot f(n-1)$$

However, there might be multiple fixed points. For instance, there are several $$f$$ which fulfill the specification

$$f = n\mapsto\mbox{ if } n == 0 \mbox{ then } 1 \mbox{ else } f(n+1)$$,

Of course, when executing such a program, the machine will loop forever on $$f(1)$$ or $$f(2)$$ and thus not produce any valuable information about the value of $$f(1)$$. This corresponds to choosing the least defined fixed point as semantic object $$f$$ and this is indeed a canonical choice. Thus, we say that

$$f=g(f)$$,

defines the least fixed point $$f$$ of $$g$$. Clearly, least is with respect to our semantic approximation order $$\sqsubseteq$$.

The existence of a least fixed point is guaranteed by our iterative construction if we add the condition that $$g$$ must be continuous (sometimes also called "chain continuous"). That simply means that $$g$$ respects suprema of monotone sequences:

$$\sup_{\sqsubseteq}\{g(x_0)\sqsubseteq g(x_1) \sqsubseteq\dots\} = g\left(\sup_{\sqsubseteq}\{x_0\sqsubseteq x_1\sqsubseteq\dots\}\right)$$

We can then argue that with

$$f=\sup_{\sqsubseteq}\{x_0\sqsubseteq g(x_0)\sqsubseteq g(g(x_0))\sqsubseteq\dots\}$$

we have

$$\begin{array}{lcl} g(f) &=& g\left(\sup_{\sqsubseteq}\{x_0\sqsubseteq g(x_0)\sqsubseteq g(g(x_0))\sqsubseteq\dots\}\right)\\ &=& \sup_{\sqsubseteq}\{g(x_0)\sqsubseteq g(g(x_0))\sqsubseteq\dots\}\\ &=& \sup_{\sqsubseteq}\{x_0 \sqsubseteq g(x_0)\sqsubseteq g(g(x_0))\sqsubseteq\dots\}\\ &=& f \end{array}$$

and the iteration limit is indeed a fixed point of $$g$$. You may also want to convince yourself that the fixed point iteration yields the least fixed point possible.

By the way, how do we know that each Haskell function we write down indeed is continuous? Just as with monotonicity, this has to be enforced by the programming language. Admittedly, these properties can somewhat be enforced or broken at will, so the question feels a bit void. But intuitively, monotonicity is guaranteed by not allowing pattern matches on $$\bot$$. For continuity, we note that for an arbitrary type a, every simple function a -> Integer is automatically continuous because the monotone sequences of Integers are of finite length. Any infinite chain of values of type a gets mapped to a finite chain of Integers and respect for suprema becomes a consequence of monotonicity. Thus, all functions of the special case Integer -> Integer must be continuous. For functionals like $$g$$, the continuity then materializes due to currying, as the type (Integer -> Integer) -> (Integer -> Integer) is isomorphic to ((Integer -> Integer), Integer) -> Integer and we can take a = ((Integer -> Integer), Integer).

In Haskell, the fixed point interpretation of the factorial function can be coded as

factorial = fix g

with the help of the fixed point combinator

fix :: (a -> a) -> a.

We can define it by

fix f = let x = f x in x

which leaves us somewhat puzzled because when expanding factorial, the result is not anything different from how we would have defined the factorial function in Haskell in the first place. But of course, the construction this whole section was about is not at all present when running a real Haskell program. It's just a means to put the mathematical interpretation of Haskell programs on a firm ground. Yet it is very nice that we can explore these semantics in Haskell itself with the help of undefined.

Strict and Non-Strict Semantics
After having elaborated on the denotational semantics of Haskell programs, we will drop the mathematical function notation $$f(n)$$ for semantic objects in favor of their now equivalent Haskell notation.

Strict Functions
A function f with one argument is called strict, if and only if

f ⊥ = ⊥.

Here are some examples of strict functions:

id     x = x
succ   x = x + 1
power2 0 = 1
power2 n = 2 * power2 (n-1)

and there is nothing unexpected about them. But why are they strict? It is instructive to prove that these functions are indeed strict. For id, this follows from the definition. For succ, we have to ponder whether succ ⊥ is ⊥ or not. If it was not, then we should for example have succ ⊥ = 2 or more generally succ ⊥ = k for some concrete number k. We remember that every function is monotone, so we should have for example

2 = succ ⊥ ⊑ succ 0 = 1

as ⊥ ⊑ 0. But neither 2 ⊏ 1 nor 2 = 1 is valid, so k cannot be 2. In general, we obtain the contradiction

k = succ ⊥ ⊑ succ k = k+1

and thus the only possible choice is

succ ⊥ = ⊥

and succ is strict.
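Operationally, strictness shows up as the Prelude.undefined exception propagating through the function. A self-contained sketch (succ' avoids clashing with the Prelude's succ; isBottom is our own crude operational test, not a real decision procedure for ⊥):

```haskell
import Control.Exception (SomeException, evaluate, try)

succ' :: Integer -> Integer
succ' x = x + 1

power2 :: Integer -> Integer
power2 0 = 1
power2 n = 2 * power2 (n - 1)

-- crude operational check: does forcing the value raise an exception?
isBottom :: Integer -> IO Bool
isBottom x = do
  r <- try (evaluate x) :: IO (Either SomeException Integer)
  return (either (const True) (const False) r)
```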

Non-Strict and Strict Languages
Searching for non-strict functions, it happens that there is only one prototype of a non-strict function of type Integer -> Integer:

one x = 1

Its variants are the constant functions constk x = k for every concrete number k. Why are these the only ones possible? Remember that one n can be no less defined than one ⊥. As Integer is a flat domain, both must be equal.

Why is one non-strict? To see that it is, we use a Haskell interpreter and try

> one (undefined :: Integer) 1

which is not ⊥. This is reasonable as one completely ignores its argument. When interpreting ⊥ in an operational sense as "non-termination", one may say that the non-strictness of one means that it does not force its argument to be evaluated and therefore avoids the infinite loop when evaluating the argument ⊥. But one might as well say that every function must evaluate its arguments before computing the result, which would mean that one ⊥ should be ⊥, too. That is, if the program computing the argument does not halt, one should not halt as well. It turns out that one can choose freely this or the other design for a functional programming language. One says that the language is strict or non-strict depending on whether functions are strict or non-strict by default. The choice for Haskell is non-strict. In contrast, the functional languages ML and Lisp choose strict semantics.
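The same experiment, as a self-contained sketch:

```haskell
-- one ignores its argument entirely, so it is non-strict:
-- applying it to undefined still yields 1.
one :: Integer -> Integer
one x = 1
```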

Functions with several Arguments
The notion of strictness extends to functions with several variables. For example, a function f of two arguments is strict in the second argument if and only if

f x ⊥ = ⊥

for every x. But for multiple arguments, mixed forms, where the strictness depends on the given values of the other arguments, are much more common. An example is the conditional

cond b x y = if b then x else y

We see that it is strict in y depending on whether the test b is True or False:

cond True  x ⊥ = x
cond False x ⊥ = ⊥

and likewise for x. Apparently, cond b x y is certainly ⊥ if both x and y are, but not necessarily when at least one of them is defined. This behavior is called joint strictness.

Clearly, cond behaves like the if-then-else statement where it is crucial not to evaluate both the then and the else branches:

if null xs then 'a' else head xs
if n == 0   then  1  else 5 / n

Here, the else part is ⊥ when the condition is met. Thus, in a non-strict language, we have the possibility to wrap primitive control statements such as if-then-else into functions like cond. This way, we can define our own control operators. In a strict language, this is not possible as both branches will be evaluated when calling cond, which makes it rather useless. This is a glimpse of the general observation that non-strictness offers more flexibility for code reuse than strictness. See the chapter Laziness for more on this subject.
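A runnable sketch of cond as a user-defined control operator; in non-strict Haskell the untaken branch may safely be bottom:

```haskell
-- The untaken branch is never evaluated, so cond can receive
-- undefined there without harm.
cond :: Bool -> a -> a -> a
cond b x y = if b then x else y
```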

Algebraic Data Types
After treating the motivating case of partial functions between Integers, we now want to extend the scope of denotational semantics to arbitrary algebraic data types in Haskell.

A word about nomenclature: the collection of semantic objects for a particular type is usually called a domain. This term is more a generic name than a particular definition and we decide that our domains are cpos (complete partial orders), that is sets of values together with a relation "more defined than" that obeys some conditions to allow fixed point iteration. Usually, one adds additional conditions to the cpos that ensure that the values of our domains can be represented in some finite way on a computer, thereby avoiding the need to ponder the twisted ways of uncountably infinite sets. But as we are not going to prove general domain theoretic theorems, the conditions will just happen to hold by construction.

Constructors
Let's take the example types

data Bool    = True | False
data Maybe a = Just a | Nothing

Here, True, False and Nothing are nullary constructors whereas Just is a unary constructor. The inhabitants of Bool form the following domain:



Remember that ⊥ is added as least element to the set of values True and False; we say that the type is lifted. A domain whose poset diagram consists of only one level is called a flat domain. We already know that Integer is a flat domain as well, it's just that the level above ⊥ has an infinite number of elements.

What are the possible inhabitants of Maybe Bool? They are

⊥, Nothing, Just ⊥, Just True, Just False

So the general rule is to insert all possible values into the unary (binary, ternary, ...) constructors as usual but without forgetting ⊥. Concerning the partial order, we remember the condition that the constructors should be monotone just as any other functions. Hence, the partial order looks as follows



But there is something to ponder: why isn't Just ⊥ = ⊥? I mean "Just undefined" is just as undefined as "undefined"! The answer is that this depends on whether the language is strict or non-strict. In a strict language, all constructors are strict by default, i.e. Just ⊥ = ⊥ and the diagram would reduce to



As a consequence, all domains of a strict language are flat.

But in a non-strict language like Haskell, constructors are non-strict by default and Just ⊥ is a new element different from ⊥, because we can write a function that reacts differently to them:

f (Just _) = 4
f Nothing  = 7

As f ignores the contents of the Just constructor, f (Just ⊥) is 4 but f ⊥ is ⊥ (intuitively, if f is passed ⊥, it will not be possible to tell whether to take the Just branch or the Nothing branch, and so ⊥ will be returned).

This gives rise to non-flat domains as depicted in the former graph. What should these be of use for? In the context of Graph Reduction, we may also think of ⊥ as an unevaluated expression. Thus, a value Just ⊥ may tell us that a computation (say a lookup) succeeded and is not Nothing, but that the true value has not been evaluated yet. If we are only interested in whether the computation succeeded or not, this actually saves us from the unnecessary work to calculate whether the result is Just True or Just False, as would be the case in a flat domain. The full impact of non-flat domains will be explored in the chapter Laziness, but one prominent example are infinite lists treated in section Recursive Data Types and Infinite Lists.
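That Just ⊥ and ⊥ are genuinely different elements can be checked operationally, using undefined for ⊥: f happily returns 4 on Just undefined, while forcing f undefined raises an exception. A self-contained sketch:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- f ignores the contents of Just, so it distinguishes
-- Just undefined from undefined itself.
f :: Maybe Bool -> Integer
f (Just _) = 4
f Nothing  = 7
```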

Pattern Matching
In the section Strict Functions, we proved that some functions are strict by inspecting their results on different inputs and insisting on monotonicity. However, in the light of algebraic data types, there can only be one source of strictness in real life Haskell: pattern matching, i.e. case expressions. The general rule is that pattern matching on a constructor of a data type will force the function to be strict, i.e. matching ⊥ against a constructor always gives ⊥. For illustration, consider

const1 _ = 1

const1' True  = 1
const1' False = 1

The first function const1 is non-strict whereas const1' is strict because it decides whether the argument is True or False although its result doesn't depend on that. Pattern matching in function arguments is equivalent to case-expressions

const1' x = case x of
  True  -> 1
  False -> 1

which similarly imposes strictness on the argument x: if the argument to the case expression denotes ⊥, the whole case will denote ⊥, too. However, the argument for case expressions may be more involved, as in

foo k table = case lookup ("Foo." ++ k) table of
  Nothing -> ...
  Just x  -> ...

and it can be difficult to track what this means for the strictness of foo.

An example for multiple pattern matches in the equational style is the logical or:

or True _ = True
or _ True = True
or _ _    = False

Note that equations are matched from top to bottom. The first equation for or matches the first argument against True, so or is strict in its first argument. The same equation also tells us that or True y is non-strict in y. If the first argument is False, then the second will be matched against True and or False y is strict in y. Note that while wildcards are a general sign of non-strictness, this depends on their position with respect to the pattern matches against constructors.
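We can check the mixed strictness of or directly (renamed or' here so it does not clash with the Prelude's or):

```haskell
or' :: Bool -> Bool -> Bool
or' True _    = True
or' _    True = True
or' _    _    = False
-- or' True undefined is True: with the first argument True,
-- the second argument is never inspected.
```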

There is another form of pattern matching, namely irrefutable patterns marked with a tilde. Their use is demonstrated by

f ~(Just x) = 1
f Nothing   = 2

An irrefutable pattern always succeeds (hence the name), resulting in f ⊥ = 1. But when changing the definition of f to

f ~(Just x) = x + 1
f Nothing   = 2      -- this line may as well be left away

we have

f ⊥        = ⊥ + 1 = ⊥
f (Just 1) = 1 + 1 = 2

If the argument matches the pattern, x will be bound to the corresponding value. Otherwise, any variable like x will be bound to ⊥.
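Both versions can be tried out in one self-contained sketch; with the irrefutable pattern, the first function never demands its argument, while the second (renamed f' here) demands x:

```haskell
-- the match ~(Just x) always succeeds; x is bound lazily and unused
f :: Maybe Integer -> Integer
f ~(Just x) = 1

-- here x is demanded, so f' undefined would be bottom
f' :: Maybe Integer -> Integer
f' ~(Just x) = x + 1
```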

By default, let and where bindings are non-strict, too:

foo key map = let Just x = lookup key map in ...

is equivalent to

foo key map = case (lookup key map) of ~(Just x) -> ...
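A self-contained sketch of such a lazy pattern binding; even though the lookup fails here, the match against Just x is never performed because x is never demanded (the body 42 is our own placeholder for the elided ...):

```haskell
-- The pattern binding in let is matched lazily: lookup returns
-- Nothing, but since x is unused, the failed match is never observed.
foo :: String -> [(String, Integer)] -> Integer
foo key table = let Just x = lookup key table in 42
```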

Recursive Data Types and Infinite Lists
The case of recursive data structures is not very different from the base case. Consider a list of unit values

data List = [] | () : List

Though this seems like a simple type, there are surprisingly many ways you can fit $$\bot$$ in here and there, and therefore the corresponding graph is complicated. The bottom of this graph is shown below. An ellipsis indicates that the graph continues along this direction. A red ellipse behind an element indicates that this is the end of a chain; the element is in normal form.



and so on. But now, there are also chains of infinite length like

&perp; $$\sqsubseteq$$ ():&perp; $$\sqsubseteq$$ ():():&perp; $$\sqsubseteq$$ ...

This causes us some trouble because we noted in section Convergence that every monotone sequence must have a least upper bound. This is only possible if we allow for infinite lists. Infinite lists (sometimes also called streams) turn out to be very useful and their manifold use cases are treated in full detail in chapter Laziness. Here, we will show what their denotational semantics should be and how to reason about them. Note that while the following discussion is restricted to lists only, it easily generalizes to arbitrary recursive data structures like trees.

In the following, we will switch back to the standard list type

data [a] = [] | a : [a]

to close the syntactic gap to practical programming with infinite lists in Haskell.

Calculating with infinite lists is best shown by example. For that, we need an infinite list

ones :: [Integer]
ones = 1 : ones

When applying the fixed point iteration to this recursive definition, we see that ones ought to be the supremum of

&perp; $$\sqsubseteq$$ 1:&perp; $$\sqsubseteq$$ 1:1:&perp; $$\sqsubseteq$$ ...,

that is an infinite list of 1s. Let's try to understand what take 2 ones should be. With the definition of take

take 0 _      = []
take n (x:xs) = x : take (n-1) xs
take n []     = []

we can apply take 2 to elements of the approximating sequence of ones:

take 2 &perp;       ==>  &perp;
take 2 (1:&perp;)   ==>  1 : take 1 &perp;       ==>  1 : &perp;
take 2 (1:1:&perp;) ==>  1 : take 1 (1:&perp;)  ==>  1 : 1 : take 0 &perp;  ==>  1 : 1 : []

We see that take 2 (1:1:1:&perp;) and so on must be the same as take 2 (1:1:&perp;) = 1:1:[] because 1:1:[] is fully defined. Taking the supremum on both the sequence of input lists and the resulting sequence of output lists, we can conclude

take 2 ones = 1:1:[]

Thus, taking the first two elements of  behaves exactly as expected.
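This calculation is exactly what happens in a running program; the following sketch (main is mine) applies take to the genuinely recursive ones:

```haskell
ones :: [Integer]
ones = 1 : ones

main :: IO ()
main = do
  print (take 2 ones)   -- only the first two approximation steps are ever computed
  print (take 5 ones)
```

Each call terminates because take only demands a finite prefix of the infinite list.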

Generalizing from the example, we see that reasoning about infinite lists involves considering the approximating sequence and passing to the supremum, the truly infinite list. Still, we did not give it a firm ground. The solution is to identify the infinite list with the whole chain itself and to formally add it as a new element to our domain: the infinite list is the sequence of its approximations. Of course, any infinite list like ones can be compactly depicted as

ones = 1 : 1 : 1 : 1 : ...

which simply means that

ones = (&perp; $$\sqsubseteq$$ 1:&perp; $$\sqsubseteq$$ 1:1:&perp; $$\sqsubseteq$$ ...)

What about the puzzle of how a computer can calculate with infinite lists? It takes an infinite amount of time, after all? Well, this is true. But the trick is that the computer may well finish in a finite amount of time if it only considers a finite part of the infinite list. So, infinite lists should be thought of as potentially infinite lists. In general, intermediate results take the form of infinite lists whereas the final value is finite. It is one of the benefits of denotational semantics that one can treat the intermediate infinite data structures as truly infinite when reasoning about program correctness.
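The "potentially infinite" view can be illustrated with an infinite intermediate result that is cut down to a finite final value (the function name firstSquares is mine):

```haskell
-- The intermediate list 'map (^ 2) [1 ..]' is potentially infinite,
-- but only a finite prefix of it is ever computed.
firstSquares :: Int -> [Integer]
firstSquares n = take n (map (^ 2) [1 ..])

main :: IO ()
main = print (firstSquares 4)
```

When reasoning about the correctness of firstSquares, we may treat map (^ 2) [1 ..] as a truly infinite list, even though the computer only ever evaluates a finite part of it.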

As a last note, the construction of a recursive domain can be done by a fixed point iteration similar to recursive definition for functions. Yet, the problem of infinite chains has to be tackled explicitly. See the literature in External Links for a formal construction.

Haskell specialities: Strictness Annotations and Newtypes
Haskell offers a way to change the default non-strict behavior of data type constructors by strictness annotations. In a data declaration like

data Maybe' a = Just' !a | Nothing'

an exclamation point ! before an argument of the constructor specifies that it should be strict in this argument. Hence we have Just' &perp; = &perp; in our example. Further information may be found in chapter Strictness.
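The equation Just' &perp; = &perp; can be checked experimentally; this sketch (the helper isBottom is mine, and it uses exceptions as a stand-in for &perp;) compares the strict field with the lazy field of the ordinary Just:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, evaluate, try)

data Maybe' a = Just' !a | Nothing'

-- True if forcing the value to weak head normal form raises an
-- exception; we use this as an observable stand-in for bottom.
isBottom :: forall a. a -> IO Bool
isBottom x = do
  r <- try (evaluate x) :: IO (Either SomeException a)
  return (either (const True) (const False) r)

main :: IO ()
main = do
  isBottom (Just  undefined) >>= print   -- lazy field:   Just  bottom /= bottom
  isBottom (Just' undefined) >>= print   -- strict field: Just' bottom  = bottom
</imports>
```

The strict constructor forces its argument as soon as the constructed value itself is demanded, so Just' undefined is indistinguishable from undefined.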

In some cases, one wants to rename a data type, like in

data Couldbe a = Couldbe (Maybe a)

However, Couldbe a contains both the elements &perp; and Couldbe &perp;. With the help of a newtype definition

newtype Couldbe a = Couldbe (Maybe a)

we can arrange that Couldbe a is semantically equal to Maybe a, but different during type checking. In particular, the constructor Couldbe is strict. Yet, this definition is subtly different from

data Couldbe' a = Couldbe' !(Maybe a)

To explain how, consider the functions

f  (Couldbe  m) = 42
f' (Couldbe' m) = 42

Here, f' &perp; will cause the pattern match on the constructor Couldbe' to fail with the effect that f' &perp; = &perp;. But for the newtype, the match on Couldbe will never fail, so we get f &perp; = 42. In a sense, the difference can be stated as follows, with the agreement that a pattern match on &perp; fails and that a match on Couldbe (&perp;) does not:
 * for the strict case, Couldbe' &perp; is a synonym for &perp;
 * for the newtype, &perp; is a synonym for Couldbe &perp;
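The difference between f and f' is observable at run time; in this sketch (main and the exception plumbing are mine, using an exception as a stand-in for &perp;) the newtype match succeeds on an undefined argument while the strict data match fails:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, evaluate, try)

newtype Couldbe  a = Couldbe  (Maybe a)
data    Couldbe' a = Couldbe' !(Maybe a)

f :: Couldbe a -> Int
f (Couldbe m) = 42    -- newtype pattern: erased at run time, never forces

f' :: Couldbe' a -> Int
f' (Couldbe' m) = 42  -- data pattern: matching forces the argument

main :: IO ()
main = do
  print (f undefined)   -- the newtype match never fails
  r <- try (evaluate (f' undefined)) :: IO (Either SomeException Int)
  putStrLn (either (const "bottom") show r)   -- matching Couldbe' forces bottom
```

Since a newtype constructor is erased after type checking, matching on Couldbe cannot inspect anything and therefore cannot fail.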

Newtypes may also be used to define recursive types. An example is the alternate definition of the list type

newtype List a = In (Maybe (a, List a))

Again, the point is that the constructor In does not introduce an additional lifting with &perp;.
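To see that this newtype really does represent lists, a small sketch (the conversion functions toList and fromList are mine) translates between it and the ordinary list type:

```haskell
newtype List a = In (Maybe (a, List a))

-- convert to an ordinary Haskell list
toList :: List a -> [a]
toList (In Nothing)        = []
toList (In (Just (x, xs))) = x : toList xs

-- convert from an ordinary Haskell list
fromList :: [a] -> List a
fromList = foldr (\x xs -> In (Just (x, xs))) (In Nothing)

main :: IO ()
main = print (toList (fromList [1, 2, 3 :: Int]))
```

Because In introduces no extra lifting, these conversions are exact inverses on the level of denotations, including partial and infinite lists.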

Here are a few more examples to differentiate between newtype declarations and non-strict and strict data declarations (in the interactive prompt):

Prelude> data D = D Int
Prelude> data SD = SD !Int
Prelude> newtype NT = NT Int
Prelude> (\(D _) -> 42) (D undefined)
42
Prelude> (\(SD _) -> 42) (SD undefined)
*** Exception: Prelude.undefined
[...]
Prelude> (\(NT _) -> 42) (NT undefined)
42
Prelude> (\(D _) -> 42) undefined
*** Exception: Prelude.undefined
[...]
Prelude> (\(SD _) -> 42) undefined
*** Exception: Prelude.undefined
[...]
Prelude> (\(NT _) -> 42) undefined
42

Abstract Interpretation and Strictness Analysis
As lazy evaluation means a constant computational overhead, a Haskell compiler may want to discover where inherent non-strictness is not needed at all, which allows it to drop the overhead at these particular places. To that extent, the compiler performs strictness analysis, just as we proved some functions to be strict in section Strict Functions. Of course, details of strictness depending on the exact values of arguments, as in our earlier example, are out of scope (this is in general undecidable). But the compiler may try to find approximate strictness information, and this works in many common cases.

Now, abstract interpretation is a formidable idea to reason about strictness: ...

For more about strictness analysis, see the research papers about strictness analysis on the Haskell wiki.

Interpretation as Powersets
So far, we have introduced &perp; and the semantic approximation order $$\sqsubseteq$$ abstractly by specifying their properties. However, both, as well as any inhabitants of a data type, can be interpreted as ordinary sets. This is called the powerset construction.

The idea is to think of &perp; as the set of all possible values and that a computation retrieves more information by choosing a subset. In a sense, the denotation of a value starts its life as the set of all values, which will be reduced by computations until there remains a set with a single element only.

As an example, consider Bool where the domain looks like

 {True}   {False}
     \      /
      \    /
  &perp; = {True, False}

The values True and False are encoded as the singleton sets {True} and {False}, and &perp; is the set of all possible values.

Another example is Maybe Bool:

 {Just True}   {Just False}
        \          /
         \        /
 {Nothing}   {Just True, Just False}
        \          /
         \        /
  &perp; = {Nothing, Just True, Just False}

We see that the semantic approximation order is equivalent to set inclusion, but with arguments switched:

$$x\sqsubseteq y \iff x \supseteq y$$
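The claim that approximation is reversed set inclusion can be modeled directly with finite sets; this sketch (the function name approx is mine) encodes the Bool domain from above:

```haskell
import qualified Data.Set as Set

-- Model the powerset reading of the approximation order:
-- x approximates y exactly when x is a superset of y.
approx :: Ord a => Set.Set a -> Set.Set a -> Bool
approx x y = y `Set.isSubsetOf` x

bottom, true, false :: Set.Set Bool
bottom = Set.fromList [False, True]   -- bottom = set of all possible values
true   = Set.singleton True
false  = Set.singleton False

main :: IO ()
main = do
  print (approx bottom true)   -- bottom approximates True
  print (approx true false)    -- True and False are incomparable
  print (approx true true)     -- the order is reflexive
```

Computations shrink the set of remaining possibilities, which is exactly moving upward in the approximation order.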

This approach can be used to give a semantics to exceptions in Haskell.

Naïve Sets are unsuited for Recursive Data Types
In the section What to choose as Semantic Domain?, we argued that taking simple sets as denotation for types doesn't work well with partial functions. In the light of recursive data types, things become even worse as John C. Reynolds showed in his paper Polymorphism is not set-theoretic.

Reynolds actually considers the recursive type

newtype U = In ((U -> Bool) -> Bool)

Interpreting Bool as the set {True, False} and the function type A -> B as the set of functions from A to B, the type U cannot denote a set. This is because U -> Bool is the set of subsets (powerset) of U which, due to a diagonal argument analogous to Cantor's argument that there are "more" real numbers than natural ones, always has a bigger cardinality than U. Thus, (U -> Bool) -> Bool has an even bigger cardinality than U and there is no way for it to be isomorphic to U. Hence, the set U must not exist, a contradiction.

In our world of partial functions, this argument fails. Here, an element of U is given by a sequence of approximations taken from the sequence of domains

&perp;, (&perp; -> Bool) -> Bool, (((&perp; -> Bool) -> Bool) -> Bool) -> Bool and so on

where &perp; denotes the domain with the single inhabitant &perp;. While the author of this text admittedly has no clue on what such a thing should mean, the constructor In gives a perfectly well defined object for U. We see that the type (U -> Bool) -> Bool merely consists of shifted approximating sequences, which means that it is isomorphic to U.

As a last note, Reynolds actually constructs an equivalent of U in the second order polymorphic lambda calculus. There, it happens that all terms have a normal form, i.e. there are only total functions when we do not include a primitive recursion operator. Thus, there is no true need for partial functions and &perp;, yet a naïve set theoretic semantics fails. We can only speculate that this has to do with the fact that not every mathematical function is computable. In particular, the set of computable functions U -> Bool should not have a bigger cardinality than U.