Yet Another Haskell Tutorial/Language advanced

Sections and Infix Operators
We've already seen how to double the values of elements in a list using map:

Prelude> map (\x -> x*2) [1,2,3,4]
[2,4,6,8]

However, there is a more concise way to write this:

Prelude> map (*2) [1,2,3,4]
[2,4,6,8]

This type of thing can be done for any infix function:

Prelude> map (/2) [1,2,3,4]
[0.5,1.0,1.5,2.0]

You might be tempted to try to subtract values from elements in a list by mapping -2 across a list. This won't work, though, because while the + in map (+2) is parsed as the standard plus operator (as there is no ambiguity), the - in map (-2) is interpreted as the unary minus, not the binary minus. Thus -2 here is the number $$-2$$, not the function $$\lambda x. x-2$$.
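To actually subtract a constant from every element, the Prelude provides subtract, which sidesteps the unary-minus parse; a small sketch (the list values are mine):

```haskell
-- subtract 2 is the function \x -> x - 2, avoiding the unary-minus ambiguity.
main :: IO ()
main = do
  print (map (subtract 2) [5,6,7])   -- [3,4,5]
  print (map (flip (-) 2) [5,6,7])   -- the same, using flip from the Prelude
```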

In general, these are called sections. For binary infix operators (like +), we can cause the function to become prefix by enclosing it in parentheses. For example:

Prelude> (+) 5 3
8

Additionally, we can provide either of its arguments to make a section. For example:

Prelude> (+5) 3
8
Prelude> (3+) 5
8

Non-infix functions can be made infix by enclosing them in backquotes ("`"). For example:

Prelude> 13 `mod` 5
3

Local Declarations
Recall from the section on Functions that there are many computations which require using the result of the same computation in multiple places in a function. There, we considered the function for computing the roots of a quadratic polynomial:

roots a b c = ((-b + sqrt(b*b - 4*a*c)) / (2*a),
               (-b - sqrt(b*b - 4*a*c)) / (2*a))

In addition to the let bindings introduced there, we can do this using a where clause. where clauses come immediately after function definitions and introduce a new level of layout (see the section on Layout). We write this as:

roots a b c = ((-b + det) / (2*a), (-b - det) / (2*a))
    where det = sqrt(b*b-4*a*c)

Any values defined in a where clause shadow any other values with the same name. For instance, if we had the following code block:

det = "Hello World"

roots a b c = ((-b + det) / (2*a), (-b - det) / (2*a)) where det = sqrt(b*b-4*a*c)

f _ = det

The value of roots doesn't notice the top-level declaration of det, since it is shadowed by the local definition (the fact that the types don't match doesn't matter either). Furthermore, since f cannot "see inside" of roots, the only thing it knows about det is what is available at the top level, which is the string "Hello World."

We could also pull out the computation and get the following code:

roots a b c = ((-b + det) / a2, (-b - det) / a2)
    where det = sqrt(b*b-4*a*c)
          a2  = 2*a

Sub-expressions in where clauses must come after function definitions. Sometimes it is more convenient to put the local definitions before the actual expression of the function. This can be done by using let/in clauses. We have already seen let clauses; where clauses are virtually identical to their <tt>let</tt> clause cousins except for their placement. The same function can be written using <tt>let</tt> as:

roots a b c =
    let det = sqrt (b*b - 4*a*c)
        a2  = 2*a
    in  ((-b + det) / a2, (-b - det) / a2)

Using a <tt>where</tt> clause, it looks like:

roots a b c = ((-b + det) / a2, (-b - det) / a2)
    where det = sqrt (b*b - 4*a*c)
          a2  = 2*a

These two types of clauses can be mixed (i.e., you can write a function which has both a <tt>let</tt> clause and a <tt>where</tt> clause). This is strongly advised against, as it tends to make code difficult to read. However, if you choose to do it, values in the <tt>let</tt> clause shadow those in the <tt>where</tt> clause. So if you define the function:

f x = let y = x+1 in y
    where y = x+2

the y that is used is the let-bound x+1, not the where-bound x+2. Of course, I plead with you to never write code that looks like this. No one should have to remember this rule, and shadowing <tt>where</tt>-defined values in a <tt>let</tt> clause only makes your code difficult to understand.
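The shadowing rule just described can be checked directly; a minimal sketch (again, please don't write code like this in earnest):

```haskell
-- The let-bound y shadows the where-bound y, so f 5 is 6, not 7.
f :: Int -> Int
f x = let y = x + 1 in y
  where y = x + 2

main :: IO ()
main = print (f 5)   -- 6
```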

In general, whether you should use <tt>let</tt> clauses or <tt>where</tt> clauses is largely a matter of personal preference. Usually, the names you give to the subexpressions should be sufficiently expressive that without reading their definitions any reader of your code should be able to figure out what they do. In this case, <tt>where</tt> clauses are probably more desirable because they allow the reader to see immediately what a function does. However, in real life, values are often given cryptic names.

Partial Application
Partial application is when you take a function which takes $$n$$ arguments and you supply it with $$<n$$ of them. When discussing Sections, we saw a form of "partial application" in which functions like (+) were partially applied. For instance, in the expression map (+2) [1..10], the section (+2) is a partial application of (+). This is because (+) really takes two arguments, but we've only given it one.

Partial application is very common in function definitions and sometimes goes by the name "η (eta) reduction". For instance, suppose we are writing a function lcaseString which converts a whole string into lower case. We could write this as:

lcaseString s = map toLower s

Here, there is no partial application (though you could argue that applying no arguments to lcaseString could be considered partial application). However, we notice that the application of s occurs at the end of both lcaseString s and of map toLower s. In fact, we can remove it by performing eta reduction, to get:

lcaseString = map toLower

Now, we have a partial application of map: it expects a function and a list, but we've only given it the function.

This all is related to the type of map, which is (a -> b) -> ([a] -> [b]) when all parentheses are included. In our case, toLower is of type Char -> Char. Thus, if we supply this function to map, we get a function of type [Char] -> [Char], as desired.
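Put together as a runnable module (note that toLower must be imported from Data.Char, which the text glosses over):

```haskell
import Data.Char (toLower)

-- The eta-reduced definition from the text.
lcaseString :: String -> String
lcaseString = map toLower

main :: IO ()
main = putStrLn (lcaseString "Hello, World!")   -- hello, world!
```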

Now, consider the task of converting a string to lowercase and removing all non-letter characters. We might write this as:

lcaseLetters s = map toLower (filter isAlpha s)

But note that we can actually write this in terms of function composition:

lcaseLetters s = (map toLower . filter isAlpha) s

And again, we're left with an eta-reducible function:

lcaseLetters = map toLower . filter isAlpha

Writing functions in this style is very common among advanced Haskell users. In fact it has a name: point-free programming (not to be confused with pointless programming). It is called point-free because in the original definition of lcaseLetters, we can think of the value s as a point on which the function is operating. By removing the point from the function definition, we have a point-free function.

A function similar to (.) is ($). Whereas (.) is function composition, ($) is function application. The definition of ($) from the Prelude is very simple:

f $ x = f x

However, this function is given very low fixity, which means that it can be used to replace parentheses. For instance, we might write a function:

foo x y = bar y (baz (fluff (ork x)))

However, using the function application function, we can rewrite this as:

foo x y = bar y $ baz $ fluff $ ork x
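To see that the two spellings of foo agree, here is a sketch with made-up definitions for bar, baz, fluff and ork (they are placeholder names in the text, so these bodies are mine; only their types matter):

```haskell
-- Stand-in definitions for the hypothetical helpers.
bar :: Int -> Int -> Int
bar y z = y + z

baz, fluff, ork :: Int -> Int
baz   = (* 2)
fluff = (+ 1)
ork   = negate

foo1, foo2 :: Int -> Int -> Int
foo1 x y = bar y (baz (fluff (ork x)))   -- parenthesized version
foo2 x y = bar y $ baz $ fluff $ ork x   -- ($) version

main :: IO ()
main = print (foo1 3 10 == foo2 3 10)   -- True
```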

This moderately resembles the function composition syntax. The ($) function is also useful when combined with other infix functions. For instance, we cannot write:

Prelude> putStrLn "5+3=" ++ show (5+3)

because this is interpreted as (putStrLn "5+3=") ++ (show (5+3)), which makes no sense. However, we can fix this by writing instead:

Prelude> putStrLn $ "5+3=" ++ show (5+3)
5+3=8

which works fine.

Consider now the task of extracting from a list of tuples all the ones whose first component is greater than zero. One way to write this would be:

fstGt0 l = filter (\ (a,b) -> a>0) l

We can first apply eta reduction to the whole function, yielding:

fstGt0 = filter (\ (a,b) -> a>0)

Now, we can rewrite the lambda function to use the fst function instead of the pattern matching:

fstGt0 = filter (\x -> fst x > 0)

Now, we can use function composition between (>0) and fst to get:

fstGt0 = filter (\x -> ((>0) . fst) x)

And finally we can eta reduce:

fstGt0 = filter ((>0) . fst)

This definition is simultaneously shorter and easier to understand than the original. We can clearly see exactly what it is doing: we're filtering a list by checking whether something is greater than zero. What are we checking? The fst element.
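The final point-free definition, made runnable (the sample list is mine):

```haskell
-- Keep only the pairs whose first component is greater than zero.
fstGt0 :: [(Int, Int)] -> [(Int, Int)]
fstGt0 = filter ((> 0) . fst)

main :: IO ()
main = print (fstGt0 [(1,10), (-2,3), (0,0), (5,-5)])   -- [(1,10),(5,-5)]
```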

While converting to point free style often results in clearer code, this is of course not always the case. For instance, converting the following map to point free style yields something nearly uninterpretable:

foo = map (\x -> sqrt (3+4*(x^2)))
foo = map (sqrt . (3+) . (4*) . (^2))

There are a handful of combinators defined in the Prelude which are useful for point-free programming:


 * uncurry takes a function of type a -> b -> c and converts it into a function of type (a,b) -> c. This is useful, for example, when mapping across a list of pairs:

Prelude> map (uncurry (*)) [(1,2),(3,4),(5,6)]
[2,12,30]

 * curry is the opposite of uncurry and takes a function of type (a,b) -> c and produces a function of type a -> b -> c.

 * flip reverses the order of the first two arguments to a function. That is, it takes a function of type a -> b -> c and produces a function of type b -> a -> c. For instance, we can sort a list in reverse order by using flip compare:

Prelude> Data.List.sortBy (flip compare) [5,1,8,3]
[8,5,3,1]

This is the same as saying:

Prelude> Data.List.sortBy (\a b -> compare b a) [5,1,8,3]
[8,5,3,1]

only shorter.

Of course, not all functions can be written in point free style. For instance:

square x = x*x

cannot be written in point-free style without some other combinators. For instance, if we can define other functions, we can write:

pair x = (x,x)
square = uncurry (*) . pair

But in this case, this is not terribly useful.

Pattern Matching
Pattern matching is one of the most powerful features of Haskell (and most functional programming languages). It is most commonly used in conjunction with <tt>case</tt> expressions, which we have already seen in the section on Functions. Let's return to our example from the section on Datatypes. I'll repeat the definition we already had for the datatype:

data Color = Red
           | Orange
           | Yellow
           | Green
           | Blue
           | Purple
           | White
           | Black
           | Custom Int Int Int  -- R G B components
           deriving (Show,Eq)

We then want to write a function that will convert between something of type Color and a triple of Ints, which correspond to the RGB values, respectively. Specifically, if we see a Color which is Red, we want to return (255,0,0), since this is the RGB value for red. So we write that (remember that piecewise function definitions are just <tt>case</tt> statements):

colorToRGB Red = (255,0,0)

If we see a Color which is Orange, we want to return (255,128,0); and if we see Yellow, we want to return (255,255,0), and so on. Finally, if we see a custom color, which is comprised of three components, we want to make a triple out of these, so we write:

colorToRGB Orange = (255,128,0)
colorToRGB Yellow = (255,255,0)
colorToRGB Green  = (0,255,0)
colorToRGB Blue   = (0,0,255)
colorToRGB Purple = (255,0,255)
colorToRGB White  = (255,255,255)
colorToRGB Black  = (0,0,0)
colorToRGB (Custom r g b) = (r,g,b)

Then, in our interpreter, if we type:

*Main> colorToRGB Yellow
(255,255,0)

What is happening is this: we create a value, call it $$x$$, which has value Yellow. We then apply colorToRGB to it. We check to see if we can "match" $$x$$ against Red. This match fails because, according to the definition of Color, Yellow is not equal to Red. We continue down the definitions of colorToRGB and try to match Yellow against Orange. This fails, too. We then try to match Yellow against Yellow, which succeeds, so we use this function definition, which simply returns the value (255,255,0), as expected.

Suppose instead, we used a custom color:

*Main> colorToRGB (Custom 50 200 100)
(50,200,100)

We apply the same matching process, failing on all values from Red to Black. We then get to try to match Custom 50 200 100 against Custom r g b. We can see that the Custom part matches, so then we go see if the subelements match. In the matching, the variables r, g and b are essentially wild cards, so there is no trouble matching r with 50, g with 200 and b with 100. As a "side-effect" of this matching, r gets the value 50, g gets the value 200 and b gets the value 100. So the entire match succeeded, and we look at the definition of this part of the function and bundle up the triple using the matched values of r, g and b.

We can also write a function to check to see if a  is a custom color or not:

isCustomColor (Custom _ _ _) = True
isCustomColor _ = False

When we apply a value to isCustomColor, it tries to match that value against Custom _ _ _. This match will succeed if the value is Custom r g b for any r, g and b. The _ (underscore) character is a "wildcard" and will match anything, but will not do the binding that would happen if you put a variable name there. If this match succeeds, the function returns True; however, if this match fails, it goes on to the next line, which will match anything and then return False.

For some reason we might want to define a function which tells us whether a given color is "bright" or not, where my definition of "bright" is that one of its RGB components is equal to 255 (admittedly an arbitrary definition, but it's simply an example). We could define this function as:

isBright = isBright' . colorToRGB
    where isBright' (255,_,_) = True
          isBright' (_,255,_) = True
          isBright' (_,_,255) = True
          isBright' _         = False

Let's dwell on this definition for a second. The isBright function is the composition of our previously defined function colorToRGB and a helper function isBright', which tells us if a given RGB value is bright or not. We could replace the first line here with isBright c = isBright' (colorToRGB c), but there is no need to explicitly write the parameter here, so we don't. Again, this function composition style of programming takes some getting used to, so I will try to use it frequently in this tutorial.

The isBright' helper function takes the RGB triple produced by colorToRGB. It first tries to match it against (255,_,_), which succeeds if the value has 255 in its first position. If this match succeeds, isBright' returns True and so does isBright. The second and third lines of the definition check for 255 in the second and third position in the triple, respectively. The fourth line, the fallthrough, matches everything else and reports it as not bright.

We might want to also write a function to convert between RGB triples and Colors. We could simply stick everything in a Custom constructor, but this would defeat the purpose; we want to use the Custom slot only for values which don't match the predefined colors. However, we don't want to allow the user to construct custom colors like (600,-40,99), since these are invalid RGB values. We could throw an error if such a value is given, but errors can be difficult to deal with. Instead, we use the Maybe datatype. This is defined (in the Prelude) as:

data Maybe a = Nothing | Just a

The way we use this is as follows: our rgbToColor function returns a value of type Maybe Color. If the RGB value passed to our function is invalid, we return Nothing, which corresponds to a failure. If, on the other hand, the RGB value is valid, we create the appropriate Color value and return that. The code to do this is:

rgbToColor 255   0   0 = Just Red
rgbToColor 255 128   0 = Just Orange
rgbToColor 255 255   0 = Just Yellow
rgbToColor   0 255   0 = Just Green
rgbToColor   0   0 255 = Just Blue
rgbToColor 255   0 255 = Just Purple
rgbToColor 255 255 255 = Just White
rgbToColor   0   0   0 = Just Black
rgbToColor   r   g   b =
    if 0 <= r && r <= 255 &&
       0 <= g && g <= 255 &&
       0 <= b && b <= 255
      then Just (Custom r g b)
      else Nothing   -- invalid RGB value

The first eight lines match the RGB arguments against the predefined values and, if they match, rgbToColor returns Just the appropriate color. If none of these matches, the last definition of rgbToColor matches the first argument against r, the second against g and the third against b (which causes the side-effect of binding these values). It then checks to see if these values are valid (each is greater than or equal to zero and less than or equal to 255). If so, it returns Just (Custom r g b); if not, it returns Nothing, corresponding to an invalid color.

Using this, we can write a function that checks to see if a given RGB value is valid:

rgbIsValid r g b = rgbIsValid' (rgbToColor r g b)
    where rgbIsValid' (Just _) = True
          rgbIsValid' _        = False

Here, we compose the helper function rgbIsValid' with our function rgbToColor. The helper function checks to see if the value returned by rgbToColor is Just anything (the wildcard). If so, it returns True. If not, it matches anything and returns False.
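The pieces of the Color example can be assembled into one runnable module; this is a sketch of mine, with rgbToColor abbreviated so the block stays short:

```haskell
data Color = Red | Orange | Yellow | Green | Blue | Purple
           | White | Black | Custom Int Int Int
           deriving (Show, Eq)

colorToRGB :: Color -> (Int, Int, Int)
colorToRGB Red    = (255,0,0)
colorToRGB Orange = (255,128,0)
colorToRGB Yellow = (255,255,0)
colorToRGB Green  = (0,255,0)
colorToRGB Blue   = (0,0,255)
colorToRGB Purple = (255,0,255)
colorToRGB White  = (255,255,255)
colorToRGB Black  = (0,0,0)
colorToRGB (Custom r g b) = (r,g,b)

isBright :: Color -> Bool
isBright = isBright' . colorToRGB
  where isBright' (255,_,_) = True
        isBright' (_,255,_) = True
        isBright' (_,_,255) = True
        isBright' _         = False

-- Abbreviated: only Red is special-cased here; the full version in the
-- text matches all eight predefined colors first.
rgbToColor :: Int -> Int -> Int -> Maybe Color
rgbToColor 255 0 0 = Just Red
rgbToColor r g b
  | all inRange [r,g,b] = Just (Custom r g b)
  | otherwise           = Nothing
  where inRange v = 0 <= v && v <= 255

main :: IO ()
main = do
  print (isBright Yellow)           -- True
  print (rgbToColor 600 (-40) 99)   -- Nothing
```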

Pattern matching isn't magic, though. You can only match against datatypes; you cannot match against functions. For instance, the following is invalid:

f x = x + 1

g (f x) = x

Even though the intended meaning of g is clear (i.e., g x = x - 1), the compiler doesn't know in general that f has an inverse function, so it can't perform matches like this.

Guards
Guards can be thought of as an extension to the pattern matching facility. They allow individual equations in piecewise function definitions to be selected according to arbitrary boolean expressions. Guards appear after all arguments to a function but before the equals sign, and are begun with a vertical bar. We could use guards to write a simple function which returns a string telling you the result of comparing two elements:

comparison x y | x < y     = "The first is less"
               | x > y     = "The second is less"
               | otherwise = "They are equal"

You can read the vertical bar as "such that." So we say that the value of comparison x y "such that" x is less than y is "The first is less." The value such that x is greater than y is "The second is less", and the value <tt>otherwise</tt> is "They are equal". The keyword otherwise is simply defined to be equal to True and thus matches anything that falls through that far. So, we can see that this works:

*Main> comparison 5 10
"The first is less"
*Main> comparison 10 5
"The second is less"
*Main> comparison 7 7
"They are equal"

Guards are applied in conjunction with pattern matching. When a pattern matches, all of its guards are tried, consecutively, until one matches. If none match, then pattern matching continues with the next pattern.
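This fallthrough behavior is easy to check; a contrived sketch of mine:

```haskell
-- When a pattern's guards all fail, matching moves on to the NEXT
-- pattern, not to an error.
classify :: Int -> String
classify 0 | False = "never chosen"   -- pattern 0 matches, guard fails...
classify n | even n    = "even"       -- ...so matching continues here
           | otherwise = "odd"

main :: IO ()
main = mapM_ (putStrLn . classify) [0, 3]
```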

One nicety about guards is that <tt>where</tt> clauses are common to all guards. So another possible definition for our  function from the previous section would be:

isBright2 c | r == 255  = True
            | g == 255  = True
            | b == 255  = True
            | otherwise = False
    where (r,g,b) = colorToRGB c

The isBright2 function is equivalent to the previous version, but performs its calculation slightly differently. It takes a color, c, and applies colorToRGB to it, yielding an RGB triple which is matched (using pattern matching!) against (r,g,b). This match succeeds and the values r, g and b are bound to their respective values. The first guard checks to see if r is 255 and, if so, returns True. The second and third guards check g and b against 255, respectively, and return True if they match. The last guard fires as a last resort and returns False.

Instance Declarations
In order to declare a type to be an instance of a class, you need to provide an instance declaration for it. Most classes provide what's called a "minimal complete definition": the set of functions which must be implemented for the class's definition to be satisfied. Once you've written these functions for your type, you can declare it an instance of the class.

The <tt>Eq</tt> Class
The Eq class has two members (i.e., two functions):

(==) :: Eq a => a -> a -> Bool
(/=) :: Eq a => a -> a -> Bool

The first of these type signatures reads that the function (==) is a function which takes two as which are members of Eq and produces a Bool. The type signature of (/=) (not equal) is identical. A minimal complete definition for the Eq class requires that either one of these functions be defined (if you define (==), then (/=) is defined automatically by negating the result of (==), and vice versa). These declarations must be provided inside the instance declaration.

This is best demonstrated by example. Suppose we have our color example, repeated here for convenience:

data Color = Red | Orange | Yellow | Green | Blue | Purple
           | White | Black | Custom Int Int Int  -- R G B components

We can define Color to be an instance of Eq by the following declaration:

instance Eq Color where
    Red    == Red    = True
    Orange == Orange = True
    Yellow == Yellow = True
    Green  == Green  = True
    Blue   == Blue   = True
    Purple == Purple = True
    White  == White  = True
    Black  == Black  = True
    (Custom r g b) == (Custom r' g' b') =
        r == r' && g == g' && b == b'
    _ == _ = False

The first line here begins with the keyword <tt>instance</tt>, telling the compiler that we're making an instance declaration. It then specifies the class, Eq, and the type, Color, which is going to be an instance of this class. Following that, there's the <tt>where</tt> keyword. Finally there's the method declaration.

The first eight lines of the method declaration are basically identical. The first one, for instance, says that the value of the expression Red == Red is equal to True. Lines two through eight are identical. The declaration for custom colors is a bit different. We pattern match Custom on both sides of ==. On the left hand side, we bind r, g and b to the components, respectively. On the right hand side, we bind r', g' and b' to the components. We then say that these two custom colors are equal precisely when r and r', g and g', and b and b' are all equal. The fallthrough says that any pair we haven't previously declared as equal are unequal.

The <tt>Show</tt> Class
The Show class is used to display arbitrary values as strings. This class has three methods:

show :: Show a => a -> String
showsPrec :: Show a => Int -> a -> String -> String
showList :: Show a => [a] -> String -> String

A minimal complete definition is either show or showsPrec (we will talk about showsPrec later -- it's in there for efficiency reasons). We can define our Color datatype to be an instance of Show with the following instance declaration:

instance Show Color where
    show Red    = "Red"
    show Orange = "Orange"
    show Yellow = "Yellow"
    show Green  = "Green"
    show Blue   = "Blue"
    show Purple = "Purple"
    show White  = "White"
    show Black  = "Black"
    show (Custom r g b) =
        "Custom " ++ show r ++ " " ++ show g ++ " " ++ show b

This declaration specifies exactly how to convert values of type Color to Strings. Again, the first eight lines are identical and simply take a Color and produce a string. The last line for handling custom colors matches out the RGB components and creates a string by concatenating the result of showing the components individually (with spaces in between and "Custom " at the beginning).
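Both instance declarations can be exercised together; in this sketch of mine, Color is shrunk to three constructors to keep the block short:

```haskell
data Color = Red | Yellow | Custom Int Int Int

instance Eq Color where
  Red    == Red    = True
  Yellow == Yellow = True
  (Custom r g b) == (Custom r' g' b') = r == r' && g == g' && b == b'
  _ == _ = False

instance Show Color where
  show Red    = "Red"
  show Yellow = "Yellow"
  show (Custom r g b) = "Custom " ++ show r ++ " " ++ show g ++ " " ++ show b

main :: IO ()
main = do
  print (Custom 1 2 3 == Custom 1 2 3)   -- True
  putStrLn (show (Custom 1 2 3))         -- Custom 1 2 3
```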

Other Important Classes
There are a few other important classes which I will mention briefly because either they are commonly used or because we will be using them shortly. I won't provide example instance declarations; how you can do this should be clear by now.

The <tt>Ord</tt> Class
Ord is the ordering class; its functions are:

compare :: Ord a => a -> a -> Ordering
(<=) :: Ord a => a -> a -> Bool
(>)  :: Ord a => a -> a -> Bool
(>=) :: Ord a => a -> a -> Bool
(<)  :: Ord a => a -> a -> Bool
min  :: Ord a => a -> a -> a
max  :: Ord a => a -> a -> a

Almost any of the functions alone is a minimal complete definition; it is recommended that you implement compare if you implement only one, though. This function returns a value of type Ordering, which is defined as:

data Ordering = LT | EQ | GT

So, for instance, we get:

Prelude> compare 5 10
LT
Prelude> compare 10 5
GT
Prelude> compare 7 7
EQ

In order to declare a type to be an instance of Ord, you must already have declared it an instance of Eq (in other words, Ord is a subclass of Eq -- more about this in the section on Classes).

The <tt>Enum</tt> Class
The Enum class is for enumerated types; that is, for types where each element has a successor and a predecessor. Its methods are:

pred :: Enum a => a -> a
succ :: Enum a => a -> a
toEnum :: Enum a => Int -> a
fromEnum :: Enum a => a -> Int
enumFrom :: Enum a => a -> [a]
enumFromThen :: Enum a => a -> a -> [a]
enumFromTo :: Enum a => a -> a -> [a]
enumFromThenTo :: Enum a => a -> a -> a -> [a]

The minimal complete definition contains both toEnum and fromEnum, which convert to and from Ints. The pred and succ functions give the predecessor and successor, respectively. The enumFrom functions enumerate lists of elements. For instance, enumFrom x lists all elements after x; enumFromThen x y lists all elements starting at x in steps of size y - x. The enumFromTo and enumFromThenTo functions end the enumeration at the given element.

The <tt>Num</tt> Class
The Num class provides the standard arithmetic operations:

(-) :: Num a => a -> a -> a
(*) :: Num a => a -> a -> a
(+) :: Num a => a -> a -> a
negate :: Num a => a -> a
signum :: Num a => a -> a
abs :: Num a => a -> a
fromInteger :: Num a => Integer -> a

All of these are obvious except for perhaps negate, which is the unary minus. That is, negate x means $$-x$$.

The <tt>Read</tt> Class
The Read class is the opposite of the Show class. It is a way to take a string and read in from it a value of arbitrary type. The methods for Read are:

readsPrec :: Read a => Int -> String -> [(a, String)]
readList :: Read a => String -> [([a], String)]

The minimal complete definition is readsPrec. The most important function related to this is read, which uses readsPrec as:

read s = fst (head (readsPrec 0 s))

This will fail if parsing the string fails. You could define a safer function as:

maybeRead s = case readsPrec 0 s of
                [(a,_)] -> Just a
                _       -> Nothing

How to write and use readsPrec directly will be discussed further in the examples.
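Trying maybeRead at a concrete type (the annotation tells the compiler what to parse; the test strings are mine):

```haskell
maybeRead :: Read a => String -> Maybe a
maybeRead s = case readsPrec 0 s of
                [(a,_)] -> Just a
                _       -> Nothing

main :: IO ()
main = do
  print (maybeRead "42"   :: Maybe Int)   -- Just 42
  print (maybeRead "oops" :: Maybe Int)   -- Nothing
```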

Class Contexts
Suppose we are defining the Maybe datatype from scratch. The definition would be something like:

data Maybe a = Nothing | Just a

Now, when we go to write the instance declarations, for, say, Eq, we need to know that a is an instance of Eq; otherwise we can't write a declaration. We express this as:

instance Eq a => Eq (Maybe a) where
    Nothing  == Nothing   = True
    (Just x) == (Just x') = x == x'
    _        == _         = False

(Note the fallthrough case; without it, comparing Nothing with a Just value would crash.) The first line can be read: "that a is an instance of Eq implies (=>) that Maybe a is an instance of Eq."

Deriving Classes
Writing obvious Eq, Ord, Show and Read instances like these is tedious and should be automated. Luckily for us, it is. If you write a datatype that's "simple enough" (almost any datatype you'll write unless you start writing fixed point types), the compiler can automatically derive some of the most basic classes. To do this, you simply add a <tt>deriving</tt> clause after the datatype declaration, as in:

data Color = Red
           | ...
           | Custom Int Int Int  -- R G B components
           deriving (Eq, Ord, Show, Read)

This will automatically create instances of the named classes for the Color datatype. Similarly, the declaration:

data Maybe a = Nothing | Just a
             deriving (Eq, Ord, Show, Read)

derives these classes just when a is appropriate.

All in all, you are allowed to derive instances of Eq, Ord, Enum, Bounded, Show and Read. There is considerable work in the area of "polytypic programming" or "generic programming" which, among other things, would allow instance declarations for any class to be derived. This is much beyond the scope of this tutorial; instead, I refer you to the literature.
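A quick sketch of what deriving buys for a simple enumeration (the Size type is mine):

```haskell
data Size = Small | Medium | Large
          deriving (Eq, Ord, Show, Read, Enum, Bounded)

main :: IO ()
main = do
  print (Small < Large)                  -- True (Ord, in declaration order)
  putStrLn (show Medium)                 -- Medium (Show)
  print (read "Large" :: Size)           -- Large (Read)
  print [minBound .. maxBound :: Size]   -- [Small,Medium,Large] (Enum, Bounded)
```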

Datatypes Revisited
I know by this point you're probably terribly tired of hearing about datatypes. They are, however, incredibly important; otherwise I wouldn't devote so much time to them. Datatypes offer one more sort of notational convenience when a datatype holds many, many values: named fields.

Named Fields
Consider a datatype whose purpose is to hold configuration settings. Usually when you extract members from this type, you really only care about one or possibly two of the many settings. Moreover, if many of the settings have the same type, you might often find yourself wondering "wait, was this the fourth or fifth element?" One thing you could do would be to write accessor functions. Consider the following made-up configuration type for a terminal program:

data Configuration = Configuration
    String   -- user name
    String   -- local host
    String   -- remote host
    Bool     -- is guest?
    Bool     -- is super user?
    String   -- current directory
    String   -- home directory
    Integer  -- time connected
    deriving (Eq, Show)

You could then write accessor functions, like (I've only listed a few):

getUserName   (Configuration un _ _ _ _ _ _ _) = un
getLocalHost  (Configuration _ lh _ _ _ _ _ _) = lh
getRemoteHost (Configuration _ _ rh _ _ _ _ _) = rh
getIsGuest    (Configuration _ _ _ ig _ _ _ _) = ig
...

You could also write update functions to update a single element. Of course, now if you add an element to the configuration, or remove one, all of these functions now have to take a different number of arguments. This is highly annoying and is an easy place for bugs to slip in. However, there's a solution. We simply give names to the fields in the datatype declaration, as follows:

data Configuration = Configuration
    { username      :: String,
      localhost     :: String,
      remotehost    :: String,
      isguest       :: Bool,
      issuperuser   :: Bool,
      currentdir    :: String,
      homedir       :: String,
      timeconnected :: Integer
    }

This will automatically generate the following accessor functions for us:

username :: Configuration -> String
localhost :: Configuration -> String
...

Moreover, it gives us very convenient update methods. Here is a short example of "post working directory" and "change directory" like functions that work on Configurations:

changeDir :: Configuration -> String -> Configuration
changeDir cfg newDir =
    -- make sure the directory exists
    if directoryExists newDir
      then -- change our current directory
           cfg{currentdir = newDir}
      else error "directory does not exist"

postWorkingDir :: Configuration -> String
-- retrieve our current directory
postWorkingDir cfg = currentdir cfg

So, in general, to update the field x in a datatype y to z, you write y{x=z}. You can change more than one; each should be separated by commas, for instance, y{x=z, a=b, c=d}.
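Here is a runnable sketch of record update and access; directoryExists from the text is hypothetical, so it is dropped here, and Configuration is trimmed to two fields for brevity:

```haskell
data Configuration = Configuration
  { username   :: String,
    currentdir :: String }
  deriving (Show, Eq)

-- Record update: copy cfg with only currentdir changed.
changeDir :: Configuration -> String -> Configuration
changeDir cfg newDir = cfg { currentdir = newDir }

main :: IO ()
main = do
  let cfg  = Configuration { username = "nobody", currentdir = "/" }
      cfg' = changeDir cfg "/tmp"
  putStrLn (currentdir cfg')   -- /tmp
  putStrLn (username cfg')     -- nobody (untouched by the update)
```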

You can of course continue to pattern match against Configurations as you did before. The named fields are simply syntactic sugar; you can still write something like:

getUserName (Configuration un _ _ _ _ _ _ _) = un

But there is little reason to. Finally, you can pattern match against named fields as in:

getHostData (Configuration {localhost=lh,remotehost=rh}) = (lh,rh)

This matches the variable lh against the localhost field of the Configuration and the variable rh against the remotehost field. These matches of course succeed. You could also constrain the matches by putting values instead of variable names in these positions, as you would for standard datatypes.

You can create values of Configuration in the old way, as shown in the first definition below, or with named-field syntax, as shown in the second definition below:

initCFG  = Configuration "nobody" "nowhere" "nowhere"
                         False False "/" "/" 0
initCFG' = Configuration
    { username = "nobody",
      localhost = "nowhere",
      remotehost = "nowhere",
      isguest = False,
      issuperuser = False,
      currentdir = "/",
      homedir = "/",
      timeconnected = 0 }

Though the second is probably much more understandable, unless you litter your code with comments.

More Lists
to do: put something here

Standard List Functions
Recall that the definition of the built-in Haskell list datatype is equivalent to:

data List a = Nil | Cons a (List a)

The only difference is that Nil is called [] and Cons is written infix as (:). This is simply to make pattern matching easier and code smaller. Let's investigate how some of the standard list functions may be written. Consider map. A definition is given below:

map _ [] = []
map f (x:xs) = f x : map f xs

Here, the first line says that when you map across an empty list, no matter what the function is, you get an empty list back. The second line says that when you map across a list with x as the head and xs as the tail, the result is f applied to x consed onto the result of mapping f on xs.

The filter function can be defined similarly:

filter _ [] = []
filter p (x:xs) | p x       = x : filter p xs
                | otherwise = filter p xs

How this works should be clear. For an empty list, we return an empty list. For a non-empty list, we return the filter of the tail, perhaps with the head on the front, depending on whether it satisfies the predicate p or not.

We can define foldr as:

foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

Here, the best interpretation is that we are replacing the empty list with a particular value and the list constructor (:) with some function. On the first line, we can see the replacement of z for []. Using backquotes to make f infix, we can write the second line as:

foldr f z (x:xs) = x `f` (foldr f z xs)

From this, we can directly see how (:) is being replaced by f.

Finally, foldl:

foldl _ z [] = z
foldl f z (x:xs) = foldl f (f z x) xs

This is slightly more complicated. Remember, z can be thought of as the current state. So if we're folding across a list which is empty, we simply return the current state. On the other hand, if the list is not empty, it's of the form x:xs. In this case, we get a new state by applying f to the current state z and the current list element x, and then recursively call foldl on xs with this new state.
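The definitions above can be checked against the Prelude's own foldr and foldl; in this sketch of mine, primed names avoid the clash:

```haskell
foldr' :: (a -> b -> b) -> b -> [a] -> b
foldr' _ z [] = z
foldr' f z (x:xs) = f x (foldr' f z xs)

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' _ z [] = z
foldl' f z (x:xs) = foldl' f (f z x) xs

main :: IO ()
main = do
  print (foldr' (:) [] [1,2,3] :: [Int])   -- [1,2,3]: (:) and [] replaced by themselves
  print (foldl' (-) 10 [1,2,3] :: Int)     -- ((10-1)-2)-3 = 4
  print (foldr  (-) 10 [1,2,3] :: Int)     -- 1-(2-(3-10)) = -8
```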

There is another class of functions: the zip and unzip functions, which respectively take multiple lists and make one, or take one list and split it apart. For instance, zip does the following:

Prelude> zip "hello" [1,2,3]
[('h',1),('e',2),('l',3)]

Basically, it pairs the first elements of both lists and makes that the first element of the new list. It then pairs the second elements of both lists and makes that the second element, etc. What if the lists have unequal length? It simply stops when the shorter one stops. A reasonable definition for zip is:

zip []     _      = []
zip _      []     = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys

The unzip function does the opposite. It takes a zipped list and returns the two "original" lists:
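The Prelude already provides unzip; as a sketch, one way to define it is with a single right fold (the name myUnzip is used here only to avoid shadowing the Prelude's version):

```haskell
-- A possible definition of unzip, named myUnzip to avoid the Prelude's.
-- Each pair contributes its first component to the first result list
-- and its second component to the second.
myUnzip :: [(a,b)] -> ([a],[b])
myUnzip = foldr (\(x,y) (xs,ys) -> (x:xs, y:ys)) ([],[])

main :: IO ()
main = print (myUnzip [(1,'a'),(2,'b'),(3,'c')])  -- ([1,2,3],"abc")
```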

There are a whole slew of zip and unzip functions, named zip3, unzip3, zip4, unzip4, and so on; the zip3 and unzip3 functions use triples instead of pairs; the zip4 and unzip4 functions use 4-tuples, etc.

Finally, the function take takes an integer $$n$$ and a list and returns the first $$n$$ elements of the list. Correspondingly, drop takes an integer $$n$$ and a list and returns the result of throwing away the first $$n$$ elements of the list. Neither of these functions produces an error; if $$n$$ is too large, take simply returns the whole list and drop returns the empty list.
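For example, using the Prelude's take and drop directly:

```haskell
main :: IO ()
main = do
  print (take 3 [1,2,3,4,5])  -- [1,2,3]
  print (drop 3 [1,2,3,4,5])  -- [4,5]
  -- When n exceeds the length of the list, no error is raised:
  print (take 10 [1,2,3])     -- the whole list: [1,2,3]
  print (drop 10 [1,2,3])     -- the empty list: []
```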

List Comprehensions
There is some syntactic sugar for dealing with lists whose elements are members of the Enum class (see the section on Instances), such as Int or Char. If we want to create a list of all the elements from $$1$$ to $$10$$, we can simply write [1..10].

We can also introduce an amount to step by, as in [1,3..10].

These expressions are shorthand for enumFromTo 1 10 and enumFromThenTo 1 3 10, respectively. Of course, you don't need to specify an upper bound. Try typing [1..] at the interpreter, but be ready to hit Control+C to stop the computation!

The interpreter will keep printing elements until you interrupt it. As we said before, Haskell is lazy. That means that a list of all numbers from 1 on is perfectly well formed, and that's exactly what this list is. Of course, if you attempt to print the whole list (which we're implicitly doing by typing it in the interpreter), it won't halt. But if we only evaluate an initial segment of this list, say take 5 [1..], we're fine.

This comes in useful if, say, we want to assign an ID to each element in a list. Without laziness we'd have to write something like this:

assignID :: [a] -> [(a,Int)]
assignID l = zip l [1..length l]

which means that the list will be traversed twice. However, because of laziness, we can simply write:

assignID l = zip l [1..]

and we'll get exactly what we want. We can see that this works:
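A quick check, reusing the lazy definition above:

```haskell
-- zip stops at the end of the shorter list, so the infinite [1..] is harmless.
assignID :: [a] -> [(a,Int)]
assignID l = zip l [1..]

main :: IO ()
main = print (assignID "abc")  -- [('a',1),('b',2),('c',3)]
```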

Finally, there is some useful syntactic sugar for map and filter, based on standard set notation in mathematics. In math, we would write something like $$\{ f(x) \mid x \in s \land p(x) \}$$ to mean the set of all values of $$f$$ when applied to elements of $$s$$ which satisfy $$p$$. This is equivalent to the Haskell expression map f (filter p s). However, we can also use more math-like notation and write [f x | x <- s, p x]. While in math the ordering of the statements on the side after the pipe is free, it is not so in Haskell. We could not have put p x before x <- s; otherwise the compiler wouldn't know yet what x was. We can use this to do simple string processing. Suppose we want to take a string, keep only the uppercase letters, and convert those to lowercase. Using toLower and isUpper from Data.Char, we could do this in either of the following two equivalent ways:

map toLower (filter isUpper s)

[toLower x | x <- s, isUpper x]

These two are equivalent, and, depending on the exact functions you're using, one might be more readable than the other. There's more you can do here, though. Suppose you want to create a list of pairs, one for each point between (0,0) and (5,7) below the diagonal. Doing this manually with lists and maps would be cumbersome and possibly difficult to read. It couldn't be easier than with list comprehensions:
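As a sketch (the exact bounds here are assumptions; what matters is the shape, with y depending on x, as the next paragraph requires):

```haskell
main :: IO ()
main =
  -- For each x from 0 to 5, y ranges from x up to 7,
  -- so every generated pair satisfies y >= x.
  print [ (x,y) | x <- [0..5], y <- [x..7] ]
```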

If you reverse the order of the x <- and y <- clauses, the order in which the space is traversed will be reversed (of course, in that case, y could no longer depend on x and you would need to make x depend on y, but this is trivial).

Arrays
Lists are nice for many things. It is easy to add elements to the beginning of them and to manipulate them in various ways that change the length of the list. However, they are bad for random access, having average complexity $$\mathcal{O}(n)$$ to access an arbitrary element (if you don't know what $$\mathcal{O}(...)$$ means, you can either ignore it or take a quick detour and read the appendix chapter Complexity, a two-page introduction to complexity theory). So, if you're willing to give up fast insertion and deletion because you need random access, you should use arrays instead of lists.

In order to use arrays you must import the Data.Array module. There are a few methods for creating arrays: the array function, the listArray function, and the accumArray function. The array function takes a pair which is the bounds of the array, and an association list which specifies the initial values of the array. The listArray function takes bounds and then simply a list of values. Finally, the accumArray function takes an accumulating function, an initial value, and an association list, and accumulates pairs from the list into the array. Here are some examples of arrays being created:
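For example (the particular values are illustrative):

```haskell
import Data.Array

main :: IO ()
main = do
  -- array: bounds plus an association list of (index, value) pairs
  print (array (1,5) [(i, 2*i) | i <- [1..5]])
  -- listArray: bounds plus a plain list of values
  print (listArray (1,5) [3,7,5,1,10])
  -- accumArray: combining function, initial value, bounds, association list;
  -- entries sharing an index are combined with the function.
  print (accumArray (+) 0 (1,3) [(1,10),(2,20),(1,5)])
```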

When arrays are printed out (via the show function), they are printed with an association list. For instance, in the first example, the association list says that the value of the array at $$1$$ is $$2$$, the value of the array at $$2$$ is $$4$$, and so on.

You can extract an element of an array using the ! operator, which takes an array and an index, as in arr ! 5.

Moreover, you can update elements in the array using the // operator. This takes an array and an association list and updates the positions specified in the list:
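For instance (again with Data.Array; the values are illustrative):

```haskell
import Data.Array

main :: IO ()
main = do
  let arr = listArray (1,5) [3,7,5,1,10] :: Array Int Int
  -- Replace the values at indices 2 and 4; all other entries are unchanged.
  print (arr // [(2,42),(4,0)])
```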

There are a few other functions which are of interest: bounds returns the bounds of an array, indices returns a list of the array's indices, elems returns a list of the array's values in index order, and assocs returns the array as an association list.

For example, if we define arr to be listArray (1,5) [3,7,5,1,10], then bounds arr is (1,5); indices arr is [1,2,3,4,5]; elems arr is [3,7,5,1,10]; and assocs arr is [(1,3),(2,7),(3,5),(4,1),(5,10)].

Note that while arrays are $$\mathcal{O}(1)$$ access, they are not $$\mathcal{O}(1)$$ update. They are in fact $$\mathcal{O}(n)$$ update, since in order to maintain purity, the array must be copied in order to make an update. Thus, functional arrays are pretty much only useful when you're filling them up once and then only reading. If you need fast access and update, you should probably use maps, which are discussed in the next section and have $$\mathcal{O}(\log n)$$ access and update.

Maps
The Map datatype from the Data.Map module is a purely functional implementation of balanced trees. Maps can be compared to lists and arrays in terms of the time it takes to perform various operations on those datatypes of a fixed size, $$n$$. A brief comparison is:

            insert      delete      lookup
  list      O(1)        O(n)        O(n)
  array     O(n)        O(n)        O(1)
  map       O(log n)    O(log n)    O(log n)

As we can see, lists provide fast insertion (but slow everything else), arrays provide fast lookup (but slow everything else) and maps provide moderately fast everything.

The type of a map is of the form Map k a, where k is the type of the keys and a is the type of the elements. That is, maps are lookup tables from type k to type a.

The basic map functions are:

empty  :: Map k a
insert :: k -> a -> Map k a -> Map k a
delete :: k -> Map k a -> Map k a
member :: k -> Map k a -> Bool
lookup :: k -> Map k a -> Maybe a

In all these cases, the type k must be an instance of Ord (and hence also an instance of Eq).

There are also functions toList and fromList to convert maps to and from association lists. Try the following:
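For example (Data.Map is customarily imported qualified, since several of its names clash with the Prelude):

```haskell
import qualified Data.Map as Map

main :: IO ()
main = do
  let m = Map.fromList [(1,"one"),(2,"two"),(3,"three")]
  print (Map.lookup 2 m)                    -- Just "two"
  print (Map.member 5 m)                    -- False
  -- toList returns the entries in ascending key order.
  print (Map.toList (Map.insert 4 "four" m))
```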

The Final Word on Lists
You are likely tired of hearing about lists at this point, but they are so fundamental to Haskell (and really all of functional programming) that it would be terrible not to talk about them some more.

It turns out that foldr is actually quite a powerful function: it can compute any primitive recursive function. A primitive recursive function is essentially one which can be calculated using only "for" loops, but not "while" loops.

In fact, we can fairly easily define map in terms of foldr:

map2 f = foldr (\a b -> f a : b) []

Here, b is the accumulator (i.e., the result list) and a is the element currently being considered. In fact, we can simplify this definition through a sequence of steps:

foldr (\a b -> f a : b) []
==> foldr (\a b -> (:) (f a) b) []
==> foldr (\a -> (:) (f a)) []
==> foldr (\a -> ((:) . f) a) []
==> foldr ((:) . f) []

This is directly related to the fact that foldr (:) [] is the identity function on lists. This is because, as mentioned before, foldr can be thought of as replacing the (:) in lists by its first argument and the [] by its second. In this case, we're keeping both the same, so it is the identity function.
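We can check that the simplified form really does behave like map, and that folding with (:) and [] is the identity:

```haskell
-- map written as a fold, in the simplified form derived above
map2 :: (a -> b) -> [a] -> [b]
map2 f = foldr ((:) . f) []

main :: IO ()
main = do
  print (map2 (+1) [1,2,3])     -- [2,3,4]
  -- foldr (:) [] is the identity on lists:
  print (foldr (:) [] "hello")  -- "hello"
```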

In fact, you can convert any function of the following style into a foldr:

myfunc [] = z
myfunc (x:xs) = f x (myfunc xs)

By writing the last line with f in infix form, this should be obvious:

myfunc [] = z
myfunc (x:xs) = x `f` (myfunc xs)

Clearly, we are just replacing [] with z and (:) with f. Consider the filter function:

filter p [] = []
filter p (x:xs) =
  if p x
    then x : filter p xs
    else filter p xs

This function also follows the form above. Based on the first line, we can figure out that z is supposed to be [], just like in the map case. Now, if we call the result of filter p xs simply b, then we can rewrite this as:

filter p [] = []
filter p (x:xs) = if p x then x : b else b

Given this, we can transform filter into a fold:

filter p = foldr (\a b -> if p a then a:b else b) []

Let's consider a slightly more complicated function: (++). The definition for (++) is:

(++) []     ys = ys
(++) (x:xs) ys = x : (xs ++ ys)

Now, the question is whether we can write this in fold notation. First, we can apply eta reduction to the first line to give:

(++) [] = id

Through a sequence of steps, we can also eta-reduce the second line:

(++) (x:xs) ys = x : ((++) xs ys)
==> (++) (x:xs) ys = (x:) ((++) xs ys)
==> (++) (x:xs) ys = ((x:) . (++) xs) ys
==> (++) (x:xs)    = (x:) . (++) xs

Thus, we get that an eta-reduced definition of (++) is:

(++) []     = id
(++) (x:xs) = (x:) . (++) xs

Now, we can try to put this into fold notation. First, we notice that the base case converts [] into id. Now, if we call x simply a and (++) xs simply b, we can get the following definition in terms of foldr:

(++) = foldr (\a b -> (a:) . b) id

This actually makes sense intuitively. If we only think about applying (++) to one argument, we can think of it as a function which takes a list and creates a function which, when applied, will prepend this list to another list. In the lambda function, we assume we have a function b which will do this for the rest of the list, and we need to create a function which will do it for a as well. In order to do this, we first apply b and then further add a to the front.

We can further reduce this expression to a point-free style through the following sequence:

    (++) = foldr (\a b -> (a:) . b) id
==> (++) = foldr (\a b -> (.) (a:) b) id
==> (++) = foldr (\a -> (.) (a:)) id
==> (++) = foldr (\a -> (.) ((:) a)) id
==> (++) = foldr (\a -> ((.) . (:)) a) id
==> (++) = foldr ((.) . (:)) id

This final version is point free, though not necessarily understandable. Presumably the original version is clearer.
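Both the readable and the fully point-free forms can be checked against the Prelude's (++):

```haskell
-- (++) written as a fold, in the two forms derived above
append1, append2 :: [a] -> [a] -> [a]
append1 = foldr (\a b -> (a:) . b) id
append2 = foldr ((.) . (:)) id

main :: IO ()
main = do
  print (append1 [1,2] [3,4])  -- [1,2,3,4]
  print (append2 "foo" "bar")  -- "foobar"
```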

As a final example, consider concat. We can write this as:

concat []     = []
concat (x:xs) = x ++ concat xs

It should be immediately clear that the z element for the fold is [] and that the recursive function is (++), yielding:

concat = foldr (++) []
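A quick check that the fold version agrees with the direct definition:

```haskell
-- concat written as a fold, renamed to avoid the Prelude's concat
concat2 :: [[a]] -> [a]
concat2 = foldr (++) []

main :: IO ()
main = print (concat2 [[1,2],[3],[4,5]])  -- [1,2,3,4,5]
```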