Haskell/GADT

Generalized algebraic datatypes, or simply GADTs, are a generalization of the algebraic data types that you are familiar with. Basically, they allow you to explicitly write down the types of the constructors. In this chapter, you'll learn why this is useful and how to declare your own.

We begin with an example of building a simple embedded domain specific language (EDSL) for simple arithmetical expressions, which is put on a sounder footing with GADTs. This is followed by a review of the syntax for GADTs, with simpler illustrations, and a different application: constructing a safe list type for which the equivalent of `head []` fails to typecheck and thus does not give the usual runtime error.

Understanding GADTs
So, what are GADTs and what are they useful for? GADTs are mainly used to implement domain specific languages, and so this section will introduce them with a corresponding example.

Arithmetic expressions
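Let's consider a small language for arithmetic expressions, given by a data type with one constructor for integer constants and one constructor for each arithmetic operation, such as addition.

In other words, this data type corresponds to the abstract syntax tree: an arithmetic term is represented as a nested application of these constructors.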

Given the abstract syntax tree, we would like to do something with it; we want to compile it, optimize it and so on. For starters, let's write an evaluation function that takes an expression and calculates the integer value it represents. The definition is straightforward:
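As a minimal sketch of such a data type and its evaluator: the constructor names `I`, `Add` and `Mul`, the choice of operations, and the sample term are assumptions made for illustration, since the text only fixes the type name `Expr`.

```haskell
-- Abstract syntax of a tiny arithmetic language.
-- Constructor names are assumptions made for this sketch.
data Expr
  = I Int          -- integer constant
  | Add Expr Expr  -- sum of two expressions
  | Mul Expr Expr  -- product of two expressions

-- Evaluate an expression to the integer it denotes.
eval :: Expr -> Int
eval (I n)       = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2

-- The term (5+1)*7 written as a syntax tree.
example :: Expr
example = (I 5 `Add` I 1) `Mul` I 7
```

With these definitions, `eval example` evaluates to `42`.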

Extending the language
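Now, imagine that we would like to extend our language with other types than just integers. For instance, let's say we want to represent equality tests, so we need booleans as well. We augment the `Expr` type with a constructor for boolean constants and a constructor for equality tests.

A term that compares the result of an addition with a number is then represented with the equality constructor at the root of the tree.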
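The extended type might look like the following sketch. The constructor names (`I`, `B`, `Add`, `Eq`) and the sample term are assumptions for illustration, not names fixed by the text.

```haskell
-- Expression language extended with booleans and equality tests.
-- Constructor names are assumptions made for this sketch.
data Expr
  = I Int          -- integer constant
  | B Bool         -- boolean constant
  | Add Expr Expr  -- sum of two expressions
  | Eq Expr Expr   -- equality test between two expressions
  deriving Show

-- The term 5+1 == 7 written as a syntax tree.
example :: Expr
example = (I 5 `Add` I 1) `Eq` I 7
```

Note that every constructor still produces a plain `Expr`, so the type alone does not say whether a term denotes an integer or a boolean.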

As before, we want to write a function to evaluate expressions. But this time, expressions can represent either integers or booleans, and we have to capture that in the return type, for instance with `Either Int Bool`.

The cases for the constants are straightforward, but now we get in trouble. We would like to evaluate an addition by evaluating both operands and adding the results, but this doesn't type check: the addition function `+` expects two integer arguments, but the result of evaluating an operand may be either an integer or a boolean, and we'd have to extract the `Int` from that.

Even worse, what happens if an operand actually represents a boolean? An expression that adds a boolean constant to an integer constant is a valid value of type `Expr`, but clearly, it doesn't make any sense; we can't add booleans to integers! In other words, evaluation may return integers or booleans, but it may also fail because the expression makes no sense. We have to incorporate that in the return type, for example by wrapping the result in `Maybe`.

Now, we could write this function just fine, but that would still be unsatisfactory, because what we really want to do is to have Haskell's type system rule out any invalid expressions; we don't want to check types ourselves while deconstructing the abstract syntax tree.

Exercise: Despite our goal, it may still be instructive to implement the evaluation function; do this.

Starting point:
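One possible skeleton, offered as a sketch rather than the original listing: the constructor names and the exact return type `Maybe (Either Int Bool)` are assumptions, and the catch-all clause is only a placeholder so that the file compiles before the exercise is solved.

```haskell
-- Constructor names are assumptions made for this sketch.
data Expr
  = I Int
  | B Bool
  | Add Expr Expr
  | Eq Expr Expr

-- Nothing signals an ill-typed expression, such as adding a boolean.
eval :: Expr -> Maybe (Either Int Bool)
eval (I n) = Just (Left n)
eval (B b) = Just (Right b)
eval _     = Nothing  -- placeholder: replace with real cases for Add and Eq
```

The remaining cases have to inspect the results of the recursive calls by hand, which is exactly the manual type checking we would like to avoid.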

Phantom types
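The so-called phantom types are the first step towards our goal. The technique is to augment the `Expr` type with a type variable `a`, so that it becomes `Expr a`; the constructors stay the same, except that subexpressions now have type `Expr a` as well.

Note that an expression of type `Expr a` does not contain a value of type `a` at all; that's why `a` is called a phantom type: it's just a dummy variable. Compare that with, say, a list of type `[a]`, which does contain a bunch of `a`s.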

The key idea is that we're going to use `a` to track the type of the expression for us. Instead of making the addition constructor, whose arguments and result all have the unconstrained type `Expr a`, available to users of our small language, we are only going to provide a smart constructor with a more restricted type: it takes two arguments of type `Expr Int` and returns an `Expr Int`.

The implementation is the same, but the types are different. Doing this with the other constructors as well, the previously problematic expression that adds a boolean constant to an integer constant no longer type checks! After all, the first argument has the type `Expr Bool` while the smart constructor for addition expects an `Expr Int`. In other words, the phantom type `a` marks the intended type of the expression. By only exporting the smart constructors, the user cannot create expressions with incorrect types.

As before, we want to implement an evaluation function. With our new marker, we might hope to give it the type `Expr a -> a` and implement the case for integer constants by simply returning the stored `Int`.

But alas, this does not work: how would the compiler know that encountering the constructor for integer constants means that `a` is `Int`? Granted, this will be the case for all expressions that were created by users of our language because they are only allowed to use the smart constructors. But internally, an integer constant annotated with, say, the type `Expr String` is still valid. In fact, `a` doesn't even have to be `Int` or `Bool`; it could be anything.
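The phantom-type approach and its weakness can be sketched as follows. All names here (`I`, `B`, `Add`, the smart constructors `i`, `b`, `add`, and the sample values) are assumptions made for illustration.

```haskell
-- The parameter a is a phantom: no constructor stores a value of type a.
data Expr a
  = I Int
  | B Bool
  | Add (Expr a) (Expr a)

-- Smart constructors pin the marker down to the intended type.
i :: Int -> Expr Int
i = I

b :: Bool -> Expr Bool
b = B

add :: Expr Int -> Expr Int -> Expr Int
add = Add

-- Accepted: both arguments are marked as Int.
ok :: Expr Int
ok = i 5 `add` i 1

-- Rejected by the type checker if uncommented, since b True :: Expr Bool:
-- bad = b True `add` i 5

-- The raw constructor, however, lets the marker be anything at all.
bogus :: Expr String
bogus = I 5
```

Because `bogus` is well typed, a function of type `Expr a -> a` cannot simply return the stored `Int` in the case for `I`.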

What we need is a way to restrict the return types of the constructors themselves, and that's exactly what generalized algebraic data types do.

GADTs
To enable this language feature, add `{-# LANGUAGE GADTs #-}` to the beginning of the file.

The obvious notation for restricting the type of a constructor is to write down its type, and that's exactly how GADTs are defined.

In other words, we simply list the type signatures of all the constructors. In particular, the marker type `a` is specialised to `Int` or `Bool` according to our needs, just like we would have done with smart constructors.

And the great thing about GADTs is that we now can implement an evaluation function that takes advantage of the type marker:
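A sketch of the GADT declaration together with such an evaluator follows. The constructor names and the restriction of `Eq` to integer operands are assumptions made for this example.

```haskell
{-# LANGUAGE GADTs #-}

-- Each constructor states its own result type.
-- Constructor names are assumptions made for this sketch.
data Expr a where
  I   :: Int  -> Expr Int
  B   :: Bool -> Expr Bool
  Add :: Expr Int -> Expr Int -> Expr Int
  Mul :: Expr Int -> Expr Int -> Expr Int
  Eq  :: Expr Int -> Expr Int -> Expr Bool

-- Matching on a constructor refines a, so each case may return
-- a value of the matching concrete type.
eval :: Expr a -> a
eval (I n)       = n
eval (B b)       = b
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2
eval (Eq e1 e2)  = eval e1 == eval e2
```

Ill-typed terms such as `Add (B True) (I 5)` are now rejected at compile time, so `eval` needs neither `Maybe` nor `Either`.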

In particular, in the case for integer constants, the compiler is now able to infer that `a` is `Int` when we encounter that constructor, and that it is therefore legal to return the stored `Int`; similarly for the other cases.

To summarise, GADTs allow us to restrict the return types of constructors and thus enable us to take advantage of Haskell's type system for our domain specific languages. Thus, we can implement more languages and their implementation becomes simpler.

Syntax
Here's a quick summary of how the syntax for declaring GADTs works.

First, consider a few ordinary algebraic datatypes: the familiar `Maybe` and list types, and a simple tree type.

Remember that the constructors introduced by such declarations can be used both for pattern matches to deconstruct values and as functions to construct values. (Constructors without fields, such as `Nothing`, are functions with "zero arguments".) We can ask GHCi what the types of the latter are: for example, `:t Just` reports `Just :: a -> Maybe a`.

It is clear that this type information about the constructors of each datatype is equivalent to the information we gave to the compiler when declaring the datatype in the first place. In other words, it's also conceivable to declare a datatype by simply listing the types of all of its constructors, and that's exactly what the GADT syntax does.
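As an illustration, here are three such types in GADT syntax, each with its ordinary declaration in a comment. The names `List`, `Nil`, `Cons`, `RoseTree` and the helper `toList` are assumptions for this sketch; the Prelude's `Maybe` is hidden so that it can be redeclared.

```haskell
{-# LANGUAGE GADTs #-}
import Prelude hiding (Maybe (..))

-- Ordinary syntax: data Maybe a = Nothing | Just a
data Maybe a where
  Nothing :: Maybe a
  Just    :: a -> Maybe a

-- Ordinary syntax: data List a = Nil | Cons a (List a)
data List a where
  Nil  :: List a
  Cons :: a -> List a -> List a

-- Ordinary syntax: data RoseTree a = RoseTree a [RoseTree a]
data RoseTree a where
  RoseTree :: a -> [RoseTree a] -> RoseTree a

-- Constructors declared this way are used exactly like ordinary ones.
toList :: List a -> [a]
toList Nil         = []
toList (Cons x xs) = x : toList xs
```

Since every constructor here returns the declared type applied to its own variable, these declarations mean the same as their ordinary counterparts.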

This syntax is made available by the language option `{-# LANGUAGE GADTs #-}`. It should be familiar to you in that it closely resembles the syntax of type class declarations. It's also easy to remember if you already like to think of constructors as just being functions. Each constructor is just defined by a type signature.

New possibilities
Note that when we asked GHCi for the types of the constructors, each answer ended in the type we were initially defining: the final output of the function associated with a constructor is always the declared type applied to its type variables. In general, in standard Haskell, the constructor functions for `Foo a` have `Foo a` as their final return type. If the new syntax were to be strictly equivalent to the old, we would have to place this restriction on its use for valid type declarations.

So what do GADTs add for us? The ability to control exactly what kind of `Foo` you return. With GADTs, a constructor for `Foo a` is not obliged to return `Foo a`; it can return any `Foo blah` that you can think of. For instance, a constructor may return a `Foo Int` even though it belongs to the type `Foo a`.

But note that you can only push the generalization so far: if the datatype you are declaring is a `Foo`, the constructor functions must return some kind of `Foo` or another. Returning anything else simply wouldn't work.
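A small sketch of a constructor with a specialised return type; `Foo`, `MkFoo` and `defaultIndex` are hypothetical names chosen for this example.

```haskell
{-# LANGUAGE GADTs #-}

-- Foo and MkFoo are hypothetical names for this sketch.
data Foo a where
  -- The result is always Foo Int, whatever the type of the argument.
  MkFoo :: b -> Foo Int

-- Matching on MkFoo tells the type checker that the index is Int,
-- so an integer literal is accepted as a result of type a.
defaultIndex :: Foo a -> a
defaultIndex (MkFoo _) = 0
```

A constructor such as `MkBad :: b -> Bool` inside the same declaration would be rejected, because its result is not some instance of `Foo`.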

Safe Lists

 * Prerequisite: We assume in this section that you know how a List tends to be represented in functional languages
 * Note: The examples in this section additionally require the extensions EmptyDataDecls and KindSignatures to be enabled

We've now gotten a glimpse of the extra control given to us by the GADT syntax. The only thing new is that you can control exactly what kind of data structure you return. Now, what can we use it for? Consider the humble Haskell list. What happens when you invoke `head []`? Haskell blows up. Have you ever wished you could have a magical version of `head` that only accepts lists with at least one element, lists on which it will never blow up?

To begin with, let's define a new list type with two type parameters. The idea is to have something similar to normal Haskell lists, but with a little extra information in the type. This extra information (the second type variable) tells us whether or not the list is empty: empty and non-empty lists get different types, because that variable is instantiated with one of two marker types.

Since we have this extra information, we can now define a safe head function on only the non-empty lists! Calling it on an empty list would simply refuse to type-check.

So now that we know what we want, how do we actually go about getting it? The answer is GADTs. The key is that we take advantage of the GADT feature to return two different list-of-a types: one marked as empty for the constructor of the empty list, and one marked as non-empty for the constructor that prepends an element.

This wouldn't have been possible without GADTs, because all of our constructors would have been required to return the same type of list; whereas with GADTs we can now return different types of lists with different constructors. Anyway, let's put this all together, along with the actual definition of the safe head function.
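Putting the pieces together might look like the sketch below. The names `SafeList`, `Empty`, `NonEmpty`, `Nil`, `Cons` and `safeHead` are assumptions made for illustration.

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls #-}

-- Marker types without values; they only appear as type indices.
data Empty
data NonEmpty

-- The second parameter records whether the list is empty.
data SafeList a b where
  Nil  :: SafeList a Empty
  Cons :: a -> SafeList a b -> SafeList a NonEmpty

-- Only lists marked NonEmpty are accepted, so no Nil case is needed.
safeHead :: SafeList a NonEmpty -> a
safeHead (Cons x _) = x
```

Under these definitions `safeHead (Cons "hi" Nil)` returns `"hi"`, whereas `safeHead Nil` is a compile-time type error rather than a runtime exception.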

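Put these definitions into a file and load it in GHCi. You should notice a difference when calling the safe head function on a non-empty and an empty list respectively: the first call returns the head element, while the second is rejected with a type error.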

The complaint is a good thing: it means that we can now ensure at compile time that we're calling the safe head function on an appropriate list. However, that also sets up a potential pitfall. Consider a function that returns the empty list in one case and a one-element list in the other. What do you think its type is?

Now try loading such a function up in GHCi. You'll notice a complaint that the marker type for empty lists cannot be matched with the marker type for non-empty lists.

The cases in the definition of this function evaluate to marked lists of different types, leading to a type error. The extra constraints imposed through the GADT make it impossible for a function to produce both empty and non-empty lists.

If we are really keen on defining such a function, we can do so by liberalizing the type of the prepending constructor, so that it can construct both safe and unsafe lists.
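A sketch of the relaxed version follows. All names (`MarkedList`, `Safe`, `NotSafe`, `silly`, `markedLength`) are assumptions for this example, and the kind signature is written with `Type` from `Data.Kind`.

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls, KindSignatures #-}
import Data.Kind (Type)

-- Marker types: note that they no longer claim anything about emptiness.
data NotSafe
data Safe

data MarkedList :: Type -> Type -> Type where
  Nil  :: MarkedList t NotSafe
  -- The result marker c is unconstrained, so Cons can build either kind.
  Cons :: a -> MarkedList a b -> MarkedList a c

safeHead :: MarkedList a Safe -> a
safeHead (Cons x _) = x

-- Both branches now share the type MarkedList () NotSafe.
silly :: Bool -> MarkedList () NotSafe
silly False = Nil
silly True  = Cons () Nil

-- Helper for inspecting results, whatever their marker.
markedLength :: MarkedList a b -> Int
markedLength Nil         = 0
markedLength (Cons _ xs) = 1 + markedLength xs
```

Nothing produced by `silly` can be passed to `safeHead`, even when the result happens to be non-empty.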

There is a cost to the fix above: by relaxing the constraint on the prepending constructor we throw away the knowledge that it cannot produce an empty list. Based on our first version of the safe list we could, for instance, define a function which took an argument of the empty-list type and be sure anything produced by the prepending constructor would not be accepted by it. That does not hold for the analogous type in the relaxed version; arguably, the type is less useful exactly because it is less restrictive. While in this example the issue may seem minor, given that not much can be done with an empty list, in general it is worth considering.

Discussion

Phantom types
See the Phantom types chapter.

Existential types
If you like existentially quantified types (see the Existentially quantified types chapter), you'll probably want to know that they are now subsumed by GADTs. As the GHC manual says, an existential type declaration and the corresponding GADT-style declaration give you the same thing.
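A sketch of such a pair of declarations: the names `TE`, `MkTE`, `TG`, `MkTG` and the two `run` helpers are assumptions made for illustration.

```haskell
{-# LANGUAGE GADTs, ExistentialQuantification #-}

-- Existential type in the classic syntax: b is hidden from the result type.
data TE a = forall b. MkTE b (b -> a)

-- The same idea in GADT syntax: b appears only in the argument types.
data TG a where
  MkTG :: b -> (b -> a) -> TG a

-- In both cases, all we can do with the hidden value is apply the function.
runTE :: TE a -> a
runTE (MkTE x f) = f x

runTG :: TG a -> a
runTG (MkTG x f) = f x
```

In both declarations the type variable `b` does not occur in the result type, which is what makes it existential.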