Haskell/Category theory

This article attempts to give an overview of category theory, in so far as it applies to Haskell. To this end, Haskell code will be given alongside the mathematical definitions. Absolute rigour is not followed; in its place, we seek to give the reader an intuitive feel for what the concepts of category theory are and how they relate to Haskell.

Introduction to categories


A category is, in essence, a simple collection. It has three components:


 * A collection of objects.
 * A collection of morphisms, each of which ties two objects (a source object and a target object) together. (These are sometimes called arrows, but we avoid that term here as it has other connotations in Haskell.) If f is a morphism with source object C and target object B, we write $$f : C \to B$$.
 * A notion of composition of these morphisms. If $$g : A \to B$$ and $$f : B \to C$$ are two morphisms, they can be composed, resulting in a morphism $$f \circ g : A \to C$$.

Lots of things form categories. For example, Set is the category of all sets with morphisms as standard functions and composition being standard function composition. (Category names are often typeset in bold face.) Grp is the category of all groups with morphisms as functions that preserve group operations (the group homomorphisms), i.e. for any two groups, G with operation * and H with operation ·, a function $$k : G \to H$$ is a morphism in Grp if:


 * $$k(u * v) = k(u) \cdot k(v)$$

It may seem that morphisms are always functions, but this needn't be the case. For example, any partial order (P, $$\leq$$) defines a category where the objects are the elements of P, and there is a morphism between any two objects A and B iff $$A \leq B$$. Moreover, there are allowed to be multiple morphisms with the same source and target objects; using the Set example, $$\sin$$ and $$\cos$$ are both functions with source object $$\mathbb{R}$$ (the set of real numbers) and target object $$[-1,1]$$, but they’re most certainly not the same morphism!
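The partial-order example can be sketched directly in Haskell. This is our own illustrative encoding (the names Morph, morph and compose are not from any library): a morphism from a to b exists precisely when a <= b, and composition just chains the witnesses.

```haskell
-- A morphism in the category (P, <=) is a witness that source <= target.
data Morph a = Morph { source :: a, target :: a } deriving (Eq, Show)

-- A morphism a -> b exists iff a <= b.
morph :: Ord a => a -> a -> Maybe (Morph a)
morph a b
  | a <= b    = Just (Morph a b)
  | otherwise = Nothing

-- compose f g corresponds to f . g: it is defined only when g's target
-- matches f's source, and yields a morphism from g's source to f's target.
compose :: Eq a => Morph a -> Morph a -> Maybe (Morph a)
compose f g
  | source f == target g = Just (Morph (source g) (target f))
  | otherwise            = Nothing
```

Transitivity of <= is what guarantees closure under composition here, and reflexivity provides each identity morphism morph a a.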

Category laws
There are three laws that categories need to follow. Firstly, and most simply, the composition of morphisms needs to be associative. Symbolically,


 * $$f \circ (g \circ h) = (f \circ g) \circ h$$

Morphisms are applied right to left in Haskell and most commonly in mathematics, so with $$f \circ g$$ first g is applied, then f.

Secondly, the category needs to be closed under the composition operation. So if $$f : B \to C$$ and $$g : A \to B$$, then there must be some morphism $$h : A \to C$$ in the category such that $$h = f \circ g$$. We can see how this works with a small example category: take two objects A and B, whose only morphisms are the identities $$\operatorname{id}_A$$ and $$\operatorname{id}_B$$ together with $$f : B \to A$$ and $$g : A \to B$$.

f and g are both morphisms, so we must be able to compose them and get another morphism in the category. So which morphism is $$f \circ g$$? The only option is $$\operatorname{id}_A$$. Similarly, we see that $$g \circ f = \operatorname{id}_B$$.

Lastly, given a category C, there needs to be, for every object $$A$$, an identity morphism $$\operatorname{id}_A : A \to A$$ that is an identity of composition with other morphisms. Put precisely, for every morphism $$g : A \to B$$:


 * $$g \circ \operatorname{id}_A = \operatorname{id}_B \circ g = g$$

Hask, the Haskell category
The main category we'll be concerning ourselves with in this article is Hask, which treats Haskell types as objects and Haskell functions as morphisms and uses (.) for composition: a function f :: A -> B for types A and B is a morphism in Hask. We can check the first and second law easily: we know (.) is an associative function, and clearly, for any f and g, f . g is another function. In Hask, the identity morphism is id, and we have trivially:

id . f = f . id = f

This isn't an exact translation of the law above, though; we're missing subscripts. The function id in Haskell is polymorphic: it can take many different types for its domain and range, or, in category-speak, can have many different source and target objects. But morphisms in category theory are by definition monomorphic, i.e. each morphism has one specific source object and one specific target object (note: monomorphic here is not being used in the category theoretic sense). A polymorphic Haskell function can be made monomorphic by specifying its type (instantiating with a monomorphic type), so it would be more precise if we said that the identity morphism from Hask on a type A is (id :: A -> A). With this in mind, the above law would be rewritten as:

(id :: B -> B) . f = f . (id :: A -> A) = f

However, for simplicity, we will ignore this distinction when the meaning is clear.

Functors


So we have some categories which have objects and morphisms that relate our objects together. The next Big Topic in category theory is the functor, which relates categories together. A functor is essentially a transformation between categories, so given categories C and D, a functor $$F : C \to D$$:


 * Maps any object A in C to $$F(A)$$, in D.
 * Maps morphisms $$f : A \to B$$ in C to $$F(f) : F(A) \to F(B)$$ in D.

One of the canonical examples of a functor is the forgetful functor $$\mathbf{Grp} \to \mathbf{Set}$$ which maps groups to their underlying sets and group morphisms to the functions which behave the same but are defined on sets instead of groups. Another example is the power set functor $$\mathbf{Set} \to \mathbf{Set}$$ which maps sets to their power sets and maps functions $$f : X \to Y$$ to functions $$\mathcal{P}(X) \to \mathcal{P}(Y)$$ which take inputs $$U \subseteq X$$ and return $$f(U)$$, the image of U under f, defined by $$f(U) = \{ \, f(u) : u \in U \, \}$$. For any category C, we can define a functor known as the identity functor on C, or $$1_C : C \to C$$, that just maps objects to themselves and morphisms to themselves. This will turn out to be useful in the monad laws section later on.
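Using lists to stand in for finite sets, the power set functor can be sketched as follows (a rough illustration only, not a faithful model of Set, since lists carry order and may contain duplicates):

```haskell
-- The object part: map a "set" to its "power set".
powerset :: [a] -> [[a]]
powerset []     = [[]]
powerset (x:xs) = let rest = powerset xs
                  in rest ++ map (x:) rest

-- The morphism part: P(f) takes a subset U to its image f(U).
imageOf :: (a -> b) -> [a] -> [b]
imageOf = map
```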

Once again there are a few axioms that functors have to obey. Firstly, given an identity morphism $$\operatorname{id}_A$$ on an object A, $$F(\operatorname{id}_A)$$ must be the identity morphism on $$F(A)$$, i.e.:


 * $$F(\operatorname{id}_A) = \operatorname{id}_{F(A)}$$

Secondly functors must distribute over morphism composition, i.e.


 * $$F(f \circ g) = F(f) \circ F(g)$$

Functors on Hask
The Functor typeclass you have probably seen in Haskell does in fact tie in with the categorical notion of a functor. Remember that a functor has two parts: it maps objects in one category to objects in another and morphisms in the first category to morphisms in the second. Functors in Haskell are from Hask to func, where func is the subcategory of Hask defined on just that functor's types. E.g. the list functor goes from Hask to Lst, where Lst is the category containing only list types, that is, [T] for any type T. The morphisms in Lst are functions defined on list types, that is, functions [T] -> [U] for types T, U. How does this tie into the Haskell typeclass Functor? Recall its definition:

class Functor (f :: * -> *) where
  fmap :: (a -> b) -> f a -> f b

Let's have a sample instance, too:

instance Functor Maybe where
  fmap f (Just x) = Just (f x)
  fmap _ Nothing  = Nothing

Here's the key part: the type constructor Maybe takes any type T to a new type, Maybe T. Also, fmap restricted to Maybe types takes a function a -> b to a function Maybe a -> Maybe b. But that's it! We've defined two parts, something that takes objects in Hask to objects in another category (that of Maybe types and functions defined on Maybe types), and something that takes morphisms in Hask to morphisms in this category. So Maybe is a functor.

A useful intuition regarding Haskell functors is that they represent types that can be mapped over. This could be a list or a Maybe, but also more complicated structures like trees. A function that does some mapping could be written using fmap, and then any functor structure could be passed into this function. E.g. you could write a generic function that covers all of Data.List.map, Data.Map.map, Data.Array.IArray.amap, and so on.
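As a small sketch of this (the name incrementAll is ours, purely illustrative), one definition written against the Functor interface works for lists, Maybe, and any other functor:

```haskell
-- One generic definition, usable with any functor.
incrementAll :: (Functor f, Num a) => f a -> f a
incrementAll = fmap (+ 1)
```

incrementAll [1, 2, 3] gives [2, 3, 4], while incrementAll (Just 1) gives Just 2.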

What about the functor axioms? The polymorphic function id takes the place of $$\operatorname{id}_A$$ for any A, so the first law states:

fmap id = id

With our above intuition in mind, this states that mapping over a structure doing nothing to each element is equivalent to doing nothing overall. Secondly, morphism composition is just (.), so

fmap (f . g) = fmap f . fmap g

This second law is very useful in practice. Picturing the functor as a list or similar container, the right-hand side is a two-pass algorithm: we map over the structure, performing g, then map over it again, performing f. The functor axioms guarantee we can transform this into a single-pass algorithm that performs f . g. This is a process known as fusion.
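A quick check of this on lists (twoPass and onePass are illustrative names of ours): the two-pass and fused one-pass versions compute the same result.

```haskell
-- Two traversals of the list, then the fused single traversal.
twoPass, onePass :: [Int] -> [Int]
twoPass = fmap (* 2) . fmap (+ 1)
onePass = fmap ((* 2) . (+ 1))
```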

Translating categorical concepts into Haskell
Functors provide a good example of how category theory gets translated into Haskell. The key points to remember are that:


 * We work in the category Hask and its subcategories.
 * Objects are types.
 * Morphisms are functions.
 * Things that take a type and return another type are type constructors.
 * Things that take a function and return another function are higher-order functions.
 * Typeclasses, along with the polymorphism they provide, make a nice way of capturing the fact that in category theory things are often defined over a number of objects at once.

Monads


Monads are obviously an extremely important concept in Haskell, and in fact they originally came from category theory. A monad is a special type of functor, from a category to that same category, that supports some additional structure. So, down to definitions. A monad is a functor $$M : C \to C$$, along with two morphisms for every object X in C:


 * $$\operatorname{unit}^M_X : X \to M(X)$$
 * $$\operatorname{join}^M_X : M(M(X)) \to M(X)$$

When the monad under discussion is obvious, we’ll leave out the M superscript for these functions and just talk about $$\operatorname{unit}_X$$ and $$\operatorname{join}_X$$ for some X.

Let’s see how this translates to the Haskell typeclass Monad, then.

class Functor m => Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

The class constraint of Functor m ensures that we already have the functor structure: a mapping of objects and of morphisms. return is the (polymorphic) analogue to $$\operatorname{unit}_X$$ for any X. But we have a problem. Although return's type looks quite similar to that of unit, the other function, (>>=), often called bind, bears no resemblance to join. There is however another monad function, join :: Monad m => m (m a) -> m a, that looks quite similar. Indeed, we can recover (>>=) and join from each other:

join :: Monad m => m (m a) -> m a
join x = x >>= id

(>>=) :: Monad m => m a -> (a -> m b) -> m b
x >>= f = join (fmap f x)

So specifying a monad's return, fmap, and join is equivalent to specifying its return and (>>=). It just turns out that the normal way of defining a monad in category theory is to give unit and join, whereas Haskell programmers like to give return and (>>=). Often, the categorical way makes more sense. Any time you have some kind of structure $$M$$ and a natural way of taking any object $$X$$ into $$M(X)$$, as well as a way of taking $$M(M(X))$$ into $$M(X)$$, you probably have a monad. We can see this in the following example section.
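The standard library in fact exports join from Control.Monad, so the recovery of (>>=) from join and fmap can be checked directly (bindViaJoin is our own illustrative name):

```haskell
import Control.Monad (join)

-- Recover bind from join and fmap, as in the equations above.
bindViaJoin :: Monad m => m a -> (a -> m b) -> m b
bindViaJoin x f = join (fmap f x)
```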

Example: the powerset functor is also a monad
The power set functor $$P : \mathbf{Set} \to \mathbf{Set}$$ described above forms a monad. For any set S you have a $$\operatorname{unit}_S(x) = \{x\}$$, mapping elements to their singleton set. Note that each of these singleton sets are trivially a subset of S, so $$\operatorname{unit}_S$$ returns elements of the powerset of S, as is required. Also, you can define a function $$\operatorname{join}_S$$ as follows: we receive an input $$L \in \mathcal{P}(\mathcal{P}(S))$$. This is:


 * A member of the powerset of the powerset of S.
 * So a member of the set of all subsets of the set of all subsets of S.
 * So a set of subsets of S

We then return the union of these subsets, giving another subset of S. Symbolically,


 * $$\operatorname{join}_S(L) = \bigcup L$$.

Hence P is a monad.

In fact, P is almost equivalent to the list monad; apart from the fact that we're talking about lists instead of sets, the definitions are the same. Compare:
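For the list monad, the corresponding definitions look like this: $$\operatorname{unit}_S(x) = \{x\}$$ becomes wrapping in a singleton list, and $$\operatorname{join}_S(L) = \bigcup L$$ becomes concatenation (returnList and joinList are our own names, standing in for the list instance's return and join):

```haskell
-- return: an element becomes a singleton list (cf. unit_S(x) = {x}).
returnList :: a -> [a]
returnList x = [x]

-- join: a list of lists is unioned together (cf. join_S(L) = the union of L).
joinList :: [[a]] -> [a]
joinList = concat
```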

The monad laws and their importance
Just as functors had to obey certain axioms in order to be called functors, monads have a few of their own. We'll first list them, then translate to Haskell, then see why they’re important.

Given a monad $$M : C \to C$$ and a morphism $$f : A \to B$$ for $$A, B \in C$$,
 * 1) $$\mathrm{join} \circ M(\mathrm{join}) = \mathrm{join} \circ \mathrm{join}$$
 * 2) $$\mathrm{join} \circ M(\mathrm{unit}) = \mathrm{join} \circ \mathrm{unit} = \mathrm{id}$$
 * 3) $$\mathrm{unit} \circ f = M(f) \circ \mathrm{unit}$$
 * 4) $$\mathrm{join} \circ M(M(f)) = M(f) \circ \mathrm{join}$$

By now, the Haskell translations should hopefully be self-explanatory:

 * 1) join . fmap join = join . join
 * 2) join . fmap return = join . return = id
 * 3) return . f = fmap f . return
 * 4) join . fmap (fmap f) = fmap f . join

(Remember that fmap is the part of a functor that acts on morphisms.) These laws seem a bit impenetrable at first, though. What on earth do these laws mean, and why should they be true for monads? Let's explore the laws.

The first law


In order to understand this law, we'll first use the example of lists. The first law mentions two functions, join . fmap join (the left-hand side) and join . join (the right-hand side). What will the types of these functions be? Remembering that join's type is [[a]] -> [a] (we're talking just about lists for now), the types are both [[[a]]] -> [a] (the fact that they're the same is handy; after all, we're trying to show they're completely the same function!). So we have a list of lists of lists. The left-hand side, then, performs fmap join on this 3-layered list, then uses join on the result. fmap is just the familiar map for lists, so we first map across each of the lists of lists inside the top-level list, concatenating them down into a list each. So afterward, we have a list of lists, which we then run through join. In summary, we 'enter' the top level, collapse the second and third levels down, then collapse this new level with the top level.

What about the right-hand side? We first run join on our list of list of lists. Although this is three layers, and you normally apply a two-layered list to join, this will still work, because a [[[a]]] is just [[b]], where b = [a], so in a sense, a three-layered list is just a two-layered list, but rather than the last layer being 'flat', it is composed of another list. So if we apply our list of lists (of lists) to join, it will flatten those outer two layers into one. As the second layer wasn't flat but instead contained a third layer, we will still end up with a list of lists, which the other join flattens. Summing up, the left-hand side will flatten the inner two layers into a new layer, then flatten this with the outermost layer. The right-hand side will flatten the outer two layers, then flatten this with the innermost layer. These two operations should be equivalent. It's sort of like a law of associativity for join.
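For lists, join is concat and fmap is map, so both sides of the first law can be written out and compared (lhs and rhs are our own names):

```haskell
-- join . fmap join, and join . join, specialised to lists.
lhs, rhs :: [[[a]]] -> [a]
lhs = concat . map concat  -- collapse the inner two layers first
rhs = concat . concat      -- collapse the outer two layers first
```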

Maybe is also a monad, with

return :: a -> Maybe a
return x = Just x

join :: Maybe (Maybe a) -> Maybe a
join Nothing         = Nothing
join (Just Nothing)  = Nothing
join (Just (Just x)) = Just x

So if we had a three-layered Maybe (i.e., it could be Nothing, Just Nothing, Just (Just Nothing) or Just (Just (Just x))), the first law says that collapsing the inner two layers first, then that with the outer layer, is exactly the same as collapsing the outer layers first, then that with the innermost layer.
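We can spell this out with a local copy of join for Maybe (joinM, innerFirst and outerFirst are illustrative names of ours):

```haskell
joinM :: Maybe (Maybe a) -> Maybe a
joinM Nothing         = Nothing
joinM (Just Nothing)  = Nothing
joinM (Just (Just x)) = Just x

-- The two sides of the first law, for a three-layered Maybe.
innerFirst, outerFirst :: Maybe (Maybe (Maybe a)) -> Maybe a
innerFirst = joinM . fmap joinM
outerFirst = joinM . joinM
```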

The second law
What about the second law, then? Again, we'll start with the example of lists. Both functions mentioned in the second law are functions [a] -> [a]. The left-hand side expresses a function that maps over the list, turning each element x into its singleton list [x], so that at the end we're left with a list of singleton lists. This two-layered list is flattened down into a single-layer list again using join. The right-hand side, however, takes the entire list [x, y, z, ...], turns it into the singleton list of lists [[x, y, z, ...]], then flattens the two layers down into one again. This law is less obvious to state quickly, but it essentially says that applying return to a monadic value, then joining the result, should have the same effect whether you perform the return from inside the top layer or from outside it.
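Written out for lists (insideOut and outsideIn are our names), the two sides of the second law look like this:

```haskell
-- join . fmap return, and join . return, specialised to lists.
insideOut, outsideIn :: [a] -> [a]
insideOut = concat . map (\x -> [x])  -- return applied inside, then join
outsideIn = concat . (\xs -> [xs])    -- return applied outside, then join
```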

The third and fourth laws
The last two laws express more self-evident facts about how we expect monads to behave. The easiest way to see how they are true is to expand them into pointful form:

 * 3) \x -> return (f x) = \x -> fmap f (return x)
 * 4) \x -> join (fmap (fmap f) x) = \x -> fmap f (join x)

Application to do-blocks
Well, we have intuitive statements about the laws that a monad must support, but why are they important? The answer becomes obvious when we consider do-blocks. Recall that a do-block is just syntactic sugar for a combination of statements involving (>>=), as witnessed by the usual translation:

do { x }                 -->  x
do { let { y = v }; x }  -->  let y = v in do { x }
do { v <- y; x }         -->  y >>= \v -> do { x }
do { y; x }              -->  y >>= \_ -> do { x }
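As a quick sanity check of the translation, here is a do-block in the Maybe monad alongside its manual desugaring; both produce the same value (viaDo and desugared are illustrative names):

```haskell
viaDo :: Maybe Int
viaDo = do
  v <- Just 3
  let y = v + 1
  return (y * 2)

-- The same computation after applying the translation rules above.
desugared :: Maybe Int
desugared = Just 3 >>= \v -> let y = v + 1 in return (y * 2)
```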

Also notice that we can prove what are normally quoted as the monad laws using return and (>>=) from our above laws (the proofs are a little heavy in some cases; feel free to skip them if you want to):



return x >>= f = f x. Proof:

   return x >>= f
 = join (fmap f (return x)) -- By the definition of (>>=)
 = join (return (f x))      -- By law 3
 = (join . return) (f x)
 = id (f x)                 -- By law 2
 = f x



m >>= return = m. Proof:

   m >>= return
 = join (fmap return m)   -- By the definition of (>>=)
 = (join . fmap return) m
 = id m                   -- By law 2
 = m



(m >>= f) >>= g = m >>= (\x -> f x >>= g). Proof (recall that fmap f . fmap g = fmap (f . g)):

   (m >>= f) >>= g
 = (join (fmap f m)) >>= g                          -- By the definition of (>>=)
 = join (fmap g (join (fmap f m)))                  -- By the definition of (>>=)
 = (join . fmap g) (join (fmap f m))
 = (join . fmap g . join) (fmap f m)
 = (join . join . fmap (fmap g)) (fmap f m)         -- By law 4
 = (join . join . fmap (fmap g) . fmap f) m
 = (join . join . fmap (fmap g . f)) m              -- By the distributive law of functors
 = (join . join . fmap (\x -> fmap g (f x))) m
 = (join . fmap join . fmap (\x -> fmap g (f x))) m -- By law 1
 = (join . fmap (join . (\x -> fmap g (f x)))) m    -- By the distributive law of functors
 = (join . fmap (\x -> join (fmap g (f x)))) m
 = (join . fmap (\x -> f x >>= g)) m                -- By the definition of (>>=)
 = join (fmap (\x -> f x >>= g) m)
 = m >>= (\x -> f x >>= g)                          -- By the definition of (>>=)



These new monad laws, using (>>=) and return, can be translated into do-block notation:

do { v <- return x; f v }            =  do { f x }
do { v <- m; return v }              =  do { m }
do { y <- do { x <- m; f x }; g y }  =  do { x <- m; y <- f x; g y }

The monad laws are now common-sense statements about how do-blocks should function. If one of these laws were invalidated, users would become confused, as they wouldn't be able to manipulate things within do-blocks as expected. The monad laws are, in essence, usability guidelines.

Summary
We've come a long way in this chapter. We've looked at what categories are and how they apply to Haskell. We've introduced the basic concepts of category theory including functors, as well as some more advanced topics like monads, and seen how they're crucial to idiomatic Haskell. We haven't covered some of the basic category theory that wasn't needed for our aims, like natural transformations, but have instead provided an intuitive feel for the categorical grounding behind Haskell's structures.