Haskell/Type declarations

You're not restricted to working with just the types provided by default with the language. There are many benefits to defining your own types:


 * Code can be written in terms of the problem being solved, making programs easier to design, write and understand.
 * Related pieces of data can be brought together in ways more convenient and meaningful than simply putting and getting values from lists or tuples.
 * Pattern matching and the type system can be used to their fullest extent by making them work with your custom types.

Haskell has three basic ways to declare a new type:


 * The data declaration, which defines new data types.
 * The type declaration for type synonyms, that is, alternative names for existing types.
 * The newtype declaration, which defines new data types equivalent to existing ones.

In this chapter, we will study data and type. In a later chapter, we will discuss newtype and see where it can be useful.

<tt>data</tt> and constructor functions
data is used to define new data types mostly using existing ones as building blocks. Here's a data structure for elements in a simple list of anniversaries:
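A declaration consistent with the description that follows might read as below. The field order (name first, then year, month and day) follows the variable names used later in this chapter; the wording of the comments is an assumption.

```haskell
data Anniversary = Birthday String Int Int Int       -- name, year, month, day
                 | Wedding String String Int Int Int -- spouse name 1, spouse name 2, year, month, day
```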

This declares a new data type <tt>Anniversary</tt>, which can be either a <tt>Birthday</tt> or a <tt>Wedding</tt>. A <tt>Birthday</tt> contains one string and three integers, and a <tt>Wedding</tt> contains two strings and three integers. The definitions of the two possibilities are separated by the vertical bar. The comments tell readers of the code what each field is meant to hold. Moreover, with the declaration we also get two constructor functions for <tt>Anniversary</tt>; appropriately enough, they are called <tt>Birthday</tt> and <tt>Wedding</tt>. These functions provide a way to build a new <tt>Anniversary</tt>.

Types defined by <tt>data</tt> declarations are often referred to as algebraic data types, which is something we will address further in later chapters.

As usual with Haskell, the case of the first letter is important: type names and constructor functions must start with capital letters. Other than this syntactic detail, constructor functions work pretty much like the "conventional" functions we have met so far. In fact, if you use <tt>:t</tt> in GHCi to query the type of, say, <tt>Birthday</tt>, you'll get:
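Assuming <tt>Anniversary</tt> is declared with one string and three integers for <tt>Birthday</tt>, the GHCi answer is shown in the comment below (the prompt text varies between GHCi versions). The alias <tt>makeBirthday</tt> is a hypothetical name, added only so that the compiler checks the same signature.

```haskell
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int

-- In GHCi:
--   ghci> :t Birthday
--   Birthday :: String -> Int -> Int -> Int -> Anniversary

-- This definition only type-checks because Birthday has exactly
-- the annotated type.
makeBirthday :: String -> Int -> Int -> Int -> Anniversary
makeBirthday = Birthday
```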

This means <tt>Birthday</tt> is just a function that takes one <tt>String</tt> and three <tt>Int</tt>s as arguments and evaluates to an <tt>Anniversary</tt>. This anniversary will contain the four arguments we passed, as specified by the <tt>Birthday</tt> constructor.

Calling constructors is no different from calling other functions. For example, suppose we have John Smith born on 3rd July 1968:
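A sketch of that value, assuming the fields are ordered name, year, month, day (the binding name <tt>johnSmith</tt> is illustrative):

```haskell
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int

johnSmith :: Anniversary
johnSmith = Birthday "John Smith" 1968 7 3
```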

He married Jane Smith on 4th March 1987:
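Under the same field-order assumption, the wedding could be written as follows (<tt>smithWedding</tt> is an illustrative name):

```haskell
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int

smithWedding :: Anniversary
smithWedding = Wedding "John Smith" "Jane Smith" 1987 3 4
```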

These two anniversaries can, for instance, be put in a list:
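A minimal sketch of such a list, with all binding names being illustrative:

```haskell
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int

johnSmith :: Anniversary
johnSmith = Birthday "John Smith" 1968 7 3

smithWedding :: Anniversary
smithWedding = Wedding "John Smith" "Jane Smith" 1987 3 4

-- Both values share the type Anniversary, so they fit in one list.
anniversariesOfJohnSmith :: [Anniversary]
anniversariesOfJohnSmith = [johnSmith, smithWedding]
```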

Or you could just as easily have called the constructors straight away when building the list (although the resulting code looks a bit cluttered).
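The inline version might look like this (again a sketch, using the same assumed field order):

```haskell
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int

-- The constructors are applied directly inside the list literal.
anniversariesOfJohnSmith :: [Anniversary]
anniversariesOfJohnSmith =
  [ Birthday "John Smith" 1968 7 3
  , Wedding "John Smith" "Jane Smith" 1987 3 4
  ]
```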

Deconstructing types
To use our new data types, we must have a way to access their contents. For instance, one very basic operation with the anniversaries defined above would be extracting the names and dates they contain as a <tt>String</tt>. So we need a <tt>showAnniversary</tt> function (for the sake of code clarity, we use an auxiliary <tt>showDate</tt> function, but let's ignore it for a moment):
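One way to write these two functions is sketched here. The variable names match those discussed in the following paragraphs; the exact output wording and the year-month-day date format are assumptions.

```haskell
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int

-- Auxiliary helper: renders a date as year-month-day.
showDate :: Int -> Int -> Int -> String
showDate y m d = show y ++ "-" ++ show m ++ "-" ++ show d

showAnniversary :: Anniversary -> String
showAnniversary (Birthday name year month day) =
  name ++ " born " ++ showDate year month day
showAnniversary (Wedding name1 name2 year month day) =
  name1 ++ " married " ++ name2 ++ " on " ++ showDate year month day
```

With these definitions, <tt>showAnniversary (Birthday "John Smith" 1968 7 3)</tt> evaluates to <tt>"John Smith born 1968-7-3"</tt>.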

This example shows how we can deconstruct the values built in our data types. <tt>showAnniversary</tt> takes a single argument of type <tt>Anniversary</tt>. Instead of just providing a name for the argument on the left side of the definition, however, we specify one of the constructor functions and give a name to each argument of the constructor (these correspond to the contents of the <tt>Anniversary</tt>). A more formal way of describing this "giving names" process is to say we are binding variables. "Binding" is being used in the sense of assigning a variable to each of the values so that we can refer to them on the right side of the function definition.

To handle both <tt>Birthday</tt> and <tt>Wedding</tt> anniversaries, we needed to provide two function definitions, one for each constructor. When <tt>showAnniversary</tt> is called, if the argument is a <tt>Birthday</tt> anniversary, the first definition is used and the variables <tt>name</tt>, <tt>year</tt>, <tt>month</tt> and <tt>day</tt> are bound to its contents. If the argument is a <tt>Wedding</tt> anniversary, then the second definition is used and the variables are bound in the same way. This process of choosing a version of the function depending on the constructor is pretty much like what happens when we use a <tt>case</tt> expression or define a function piece-wise.
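To make the comparison with <tt>case</tt> concrete, the same function can be sketched with a single definition and a <tt>case</tt> expression. The output wording and date format are assumptions, as is the parameter name <tt>anniversary</tt>.

```haskell
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int

showDate :: Int -> Int -> Int -> String
showDate y m d = show y ++ "-" ++ show m ++ "-" ++ show d

-- One equation; the choice between constructors happens in the case
-- alternatives, which bind the same variables as before.
showAnniversary :: Anniversary -> String
showAnniversary anniversary =
  case anniversary of
    Birthday name year month day ->
      name ++ " born " ++ showDate year month day
    Wedding name1 name2 year month day ->
      name1 ++ " married " ++ name2 ++ " on " ++ showDate year month day
```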

Note that the parentheses around the constructor name and the bound variables are mandatory; otherwise the compiler or interpreter would not take them as a single argument. Also, it is important to be absolutely clear that the expression inside the parentheses is not a call to the constructor function, even though it may look just like one. It is a pattern that the argument is matched against.

<tt>type</tt> for making type synonyms
As mentioned in the introduction of this module, code clarity is one of the motivations for using custom types. In that spirit, it could be nice to make it clear that the Strings in the Anniversary type are being used as names while still being able to manipulate them like ordinary Strings. This calls for a <tt>type</tt> declaration:
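The declaration is a single line. The helper <tt>nameLength</tt> is a hypothetical addition, included only to illustrate that a <tt>Name</tt> can be handed to any function written for <tt>String</tt>.

```haskell
type Name = String

-- length works on any list, and a Name is a String, so this type-checks.
nameLength :: Name -> Int
nameLength = length
```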

The code above says that a <tt>Name</tt> is now a synonym for a <tt>String</tt>. Any function that takes a <tt>String</tt> will now take a <tt>Name</tt> as well (and vice versa: functions that take a <tt>Name</tt> will accept any <tt>String</tt>). The right-hand side of a <tt>type</tt> declaration can be a more complex type as well. For example, <tt>String</tt> itself is defined in the standard libraries as
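The standard definition is the single <tt>type</tt> line below. To compile it in a module of your own, the Prelude's version has to be hidden first; the <tt>greeting</tt> value is a hypothetical example showing that a string is just a list of characters.

```haskell
import Prelude hiding (String)

-- The definition as given in the standard Prelude.
type String = [Char]

-- A list of Char values is a String.
greeting :: String
greeting = ['h', 'i']
```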

We can do something similar for the list of anniversaries we made use of:
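A sketch of such a synonym follows; the name <tt>AnniversaryBook</tt> and the example list are assumptions.

```haskell
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int

type AnniversaryBook = [Anniversary]

-- AnniversaryBook and [Anniversary] are interchangeable in signatures.
anniversariesOfJohnSmith :: AnniversaryBook
anniversariesOfJohnSmith =
  [ Birthday "John Smith" 1968 7 3
  , Wedding "John Smith" "Jane Smith" 1987 3 4
  ]
```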

Type synonyms are mostly just a convenience. They help make the roles of types clearer or provide an alias to such things as complicated list or tuple types. It is largely a matter of personal discretion to decide how type synonyms should be deployed. Abuse of synonyms could make code confusing (for instance, picture a long program using multiple names for common types like Int or String simultaneously).

Incorporating the suggested type synonyms and the <tt>Date</tt> type proposed in the exercise of the previous section (this is your last chance to try that exercise without looking at the spoilers), the code we've written so far looks like this:
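The full listing could be sketched as follows. The shape of <tt>Date</tt> (one constructor holding year, month and day as <tt>Int</tt>s), the output wording and the date format are assumptions consistent with the rest of this chapter.

```haskell
type Name = String

data Anniversary = Birthday Name Date
                 | Wedding Name Name Date

data Date = Date Int Int Int   -- year, month, day

johnSmith :: Anniversary
johnSmith = Birthday "John Smith" (Date 1968 7 3)

smithWedding :: Anniversary
smithWedding = Wedding "John Smith" "Jane Smith" (Date 1987 3 4)

type AnniversaryBook = [Anniversary]

anniversariesOfJohnSmith :: AnniversaryBook
anniversariesOfJohnSmith = [johnSmith, smithWedding]

-- Deconstructs a Date by matching on its single constructor.
showDate :: Date -> String
showDate (Date y m d) = show y ++ "-" ++ show m ++ "-" ++ show d

showAnniversary :: Anniversary -> String
showAnniversary (Birthday name date) =
  name ++ " born " ++ showDate date
showAnniversary (Wedding name1 name2 date) =
  name1 ++ " married " ++ name2 ++ " on " ++ showDate date
```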

Even in a simple example like this one, there is a noticeable gain in simplicity and clarity compared to the same task using only Ints, Strings, and corresponding lists.

Note that the <tt>Date</tt> type has a constructor function which is called <tt>Date</tt> as well. That is perfectly valid, and indeed giving the constructor the same name as the type when there is just one constructor is good practice, as a simple way of making the role of the function obvious.