Write Yourself a Scheme in 48 Hours/Towards a Standard Library

Our Scheme is almost complete now, but it's still rather hard to use. At the very least, we'd like a library of standard list manipulation functions that we can use to perform some common computations.

Rather than using a typical Scheme implementation, defining each list function in terms of a recursion on lists, we'll implement two primitive recursion operators (`foldr` and `foldl`) and then define our whole library based on those. This style is used by the Haskell Prelude: it gives you more concise definitions, less room for error, and good practice using fold to capture iteration.

We'll start by defining a few obvious helper functions. `not` and `null?` are defined exactly as you'd expect, using if statements:
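A minimal sketch of these two helpers, assuming only `if` and `eqv?` are available as primitives (the parameter names are our own choice):

```scheme
; Logical negation: every value other than #f counts as true.
(define (not x) (if x #f #t))

; A list is null exactly when it is the empty list.
(define (null? obj) (if (eqv? obj '()) #t #f))

(not #f)        ; => #t
(null? '())     ; => #t
(null? '(1 2))  ; => #f
```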

We can use the varargs feature to define `list`, which just returns a list of its arguments:
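One way this could look, as a sketch: the dotted parameter collects every argument into a single list, so the body has nothing left to do.

```scheme
; The dot before objs means "bind all arguments, as a list, to objs".
(define (list . objs) objs)

(list 1 2 3)  ; => (1 2 3)
(list)        ; => ()
```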

We also want an `id` function, which just returns its argument unchanged. This may seem completely useless: if you already have a value, why do you need a function to return it? However, several of our algorithms expect a function that tells us what to do with a given value. By defining `id`, we let those higher-order functions work even if we don't want to do anything with the value.
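A sketch of the identity function, with a small illustration of passing it where a "what to do with the value" function is expected (`apply-to-5` is a hypothetical helper used only for this example):

```scheme
; Return the argument unchanged.
(define (id obj) obj)

; Hypothetical higher-order function that needs *some* function to call.
(define (apply-to-5 f) (f 5))

(apply-to-5 id)  ; => 5
```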

Similarly, it'd be nice to have a `flip` function, in case we want to pass in a function that takes its arguments in the wrong order:
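A possible definition, sketched for binary functions only: it wraps the function in a lambda that swaps the two arguments before calling it.

```scheme
; Return a function that calls func with its two arguments swapped.
(define (flip func)
  (lambda (arg1 arg2) (func arg2 arg1)))

((flip -) 1 10)       ; => 9, i.e. (- 10 1)
((flip cons) '() 1)   ; => (1), i.e. (cons 1 '())
```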

Finally, we add `curry` and `compose`, which work like their Haskell equivalents (partial application and the dot operator, respectively).
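The following is a simplified sketch rather than a definitive implementation: this `curry` fixes only the first argument of a binary function, and this `compose` chains two unary functions.

```scheme
; Partially apply a binary function to its first argument.
(define (curry func arg1)
  (lambda (arg) (func arg1 arg)))

; Apply g first, then f, like Haskell's (f . g).
(define (compose f g)
  (lambda (arg) (f (g arg))))

((curry + 2) 5)                        ; => 7
((compose (curry * 2) (curry + 1)) 3)  ; => 8, i.e. (* 2 (+ 1 3))
```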

We might as well define some simple library functions that appear in the Scheme standard:
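As a sketch, the numeric predicates could be written like this. It repeats a binary `curry` so the block stands alone, and it assumes a `remainder` primitive for the parity tests.

```scheme
; Binary curry, repeated here so the example is self-contained.
(define (curry func arg1)
  (lambda (arg) (func arg1 arg)))

(define zero?     (curry = 0))   ; (= 0 x)
(define positive? (curry < 0))   ; (< 0 x)
(define negative? (curry > 0))   ; (> 0 x)

(define (even? num) (= (remainder num 2) 0))
(define (odd? num)  (if (even? num) #f #t))

(zero? 0)       ; => #t
(positive? 5)   ; => #t
(negative? 5)   ; => #f
(odd? 7)        ; => #t
```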

These are basically done just as you'd expect. Note the usage of `curry` to define `zero?`, `positive?` and `negative?`. We bind the variable `zero?` to the function returned by `curry`, giving us a unary function that returns true if its argument is equal to zero.

Next, we want to define a fold function that captures the basic pattern of recursion over a list. The best way to think about fold is to picture a list in terms of its infix constructors: `1:2:3:4:[]` in Haskell or `(1 . (2 . (3 . (4 . nil))))` in Scheme. A fold function replaces every constructor with a binary operation, and replaces `nil` with the accumulator. So, for example, `(fold + 0 '(1 2 3 4))` = `(1 + (2 + (3 + (4 + 0))))`.

With that definition, we can write our fold function. Start with a right-associative version to mimic the above examples:
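A sketch of a right-associative fold along these lines (the parameter names `func`, `end` and `lst` are our own choice):

```scheme
; Replace every cons in lst with func, and the final '() with end.
(define (foldr func end lst)
  (if (null? lst)
      end
      (func (car lst) (foldr func end (cdr lst)))))

(foldr cons '() '(1 2 3))  ; => (1 2 3)
(foldr - 0 '(1 2 3))       ; => 2, i.e. (- 1 (- 2 (- 3 0)))
```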

The structure of this function mimics our definition almost exactly. If the list is null, replace it with the end value. If not, apply the function to the car of the list and to the result of folding this function and end value down the rest of the list. Since the right-hand operand is folded up first, you end up with a right-associative fold.

We also want a left-associative version. For most associative operations like `+` and `*`, the two of them are completely equivalent. However, there is at least one important binary operation that is not associative: cons. For all our list manipulation functions, then, we'll need to deliberately choose between left- and right-associative folds.
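A sketch of the left-associative counterpart. Subtraction is not associative, so it makes the grouping visible.

```scheme
; Thread an accumulator through the list from left to right.
(define (foldl func accum lst)
  (if (null? lst)
      accum
      (foldl func (func accum (car lst)) (cdr lst))))

(foldl - 0 '(1 2 3))  ; => -6, i.e. (- (- (- 0 1) 2) 3)
(foldl + 0 '(1 2 3))  ; => 6
```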

This begins the same way as the right-associative version, with the test for null that returns the accumulator. This time, however, we apply the function to the accumulator and first element of the list, instead of applying it to the first element and the result of folding the list. This means that we process the beginning first, giving us left-associativity. Once we reach the end of the list, `'()`, we then return the result that we've been progressively building up.

Note that func takes its arguments in the opposite order from `foldr`. In `foldr`, the accumulator represents the rightmost value to tack onto the end of the list, after you've finished recursing down it. In `foldl`, it represents the completed calculation for the leftmost part of the list. In order to preserve our intuitions about commutativity of operators, it should therefore be the left argument of our operation in `foldl`, but the right argument in `foldr`.

Once we've got our basic folds, we can define a couple of convenience names to match typical Scheme usage:
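As a sketch, the aliases are plain variable definitions. Binding `fold` to the left fold follows the discussion in this chapter; binding `reduce` to the right fold is an assumption made here for illustration.

```scheme
; The two folds, repeated so the example is self-contained.
(define (foldr func end lst)
  (if (null? lst)
      end
      (func (car lst) (foldr func end (cdr lst)))))

(define (foldl func accum lst)
  (if (null? lst)
      accum
      (foldl func (func accum (car lst)) (cdr lst))))

(define fold foldl)     ; default fold: the tail-recursive left fold
(define reduce foldr)   ; assumption: reduce as the right fold

(fold + 0 '(1 2 3 4))       ; => 10
(reduce cons '() '(1 2 3))  ; => (1 2 3)
```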

These are just new variables bound to the existing functions: they don't define new functions. Most Schemes call `fold` "reduce" or plain old "fold", and don't make the distinction between `foldl` and `foldr`. We define it to be `foldl`, which happens to be tail-recursive and hence runs more efficiently than `foldr` (it doesn't have to recurse all the way down to the end of the list before it starts building up the computation). Not all operations are associative, however; we'll see some cases later where we have to use `foldr` to get the right result.

Next, we want to define a function that is the opposite of `fold`. Given a unary function, an initial value, and a unary predicate, it continues applying the function to the last value until the predicate is true, building up a list as it goes along.
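A sketch of such an unfold, with parameter names of our own choosing. Note that the value that satisfies the predicate is itself included as the last element.

```scheme
; Build a list by repeatedly applying func, stopping once pred holds.
(define (unfold func init pred)
  (if (pred init)
      (cons init '())
      (cons init (unfold func (func init) pred))))

; Count from 1 until we reach 5.
(unfold (lambda (x) (+ x 1)) 1 (lambda (x) (>= x 5)))  ; => (1 2 3 4 5)
```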

As usual, our function structure basically matches the definition. If the predicate is true, then we cons a `'()` onto the last value, terminating the list. Otherwise, cons the result of unfolding the next value (the function applied to the current value) onto the current value.

In academic functional programming literature, folds are often called catamorphisms, unfolds are often called anamorphisms, and the combinations of the two are often called hylomorphisms. They're interesting because any for-each loop can be represented as a catamorphism. To convert from a loop to a foldl, package up all mutable variables in the loop into a data structure (records work well for this, but you can also use an algebraic data type or a list). The initial state becomes the accumulator; the loop body becomes a function with the loop variables as its first argument and the iteration variable as its second; and the list becomes, well, the list. The result of the fold function is the new state of all the mutable variables.

Similarly, every for-loop (without early exits) can be represented as a hylomorphism. The initialization, termination, and step conditions of a for-loop define an anamorphism that builds up a list of values for the iteration variable to take. Then, you can treat that as a for-each loop and use a catamorphism to break it down into whatever state you wish to modify.
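To make the for-loop correspondence concrete, here is a small sketch. The function `sum-of-squares` is a hypothetical example, not part of the library; it stands in for a loop like "for i from 1 to n, add i*i to total" and assumes n is at least 1.

```scheme
; Left fold and unfold, repeated so the example is self-contained.
(define (foldl func accum lst)
  (if (null? lst)
      accum
      (foldl func (func accum (car lst)) (cdr lst))))

(define (unfold func init pred)
  (if (pred init)
      (cons init '())
      (cons init (unfold func (func init) pred))))

; Hypothetical example: the unfold generates the values of the loop
; variable (1 .. n), and the fold carries the loop state (total).
(define (sum-of-squares n)
  (foldl (lambda (total i) (+ total (* i i)))   ; loop body
         0                                      ; initial state
         (unfold (lambda (i) (+ i 1))           ; step
                 1                              ; initialization
                 (lambda (i) (>= i n)))))       ; termination

(sum-of-squares 4)  ; => 30, i.e. 1 + 4 + 9 + 16
```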

Let's go through a couple of examples. We'll start with the typical `sum`, `product`, `and` and `or` functions:
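A sketch of the two numeric cases, using the varargs syntax so they can be called with any number of arguments. The boolean versions would follow the same shape, folding the interpreter's binary `&&` and `||` primitives with `#t` and `#f` as starting values; they are left out here because `and` and `or` are special forms in standard Scheme, so redefining them is only safe in our own interpreter.

```scheme
; Left fold, repeated so the example is self-contained.
(define (fold func accum lst)
  (if (null? lst)
      accum
      (fold func (func accum (car lst)) (cdr lst))))

(define (sum . lst)     (fold + 0 lst))  ; 0 is the identity for +
(define (product . lst) (fold * 1 lst))  ; 1 is the identity for *

(sum 1 2 3 4)      ; => 10
(product 1 2 3 4)  ; => 24
(sum)              ; => 0
```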

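These all follow from the definitions:

(sum 1 2 3 4) = 1 + 2 + 3 + 4 + 0
(product 1 2 3 4) = 1 * 2 * 3 * 4 * 1
(and #t #t #f) = #t && #t && #f && #t
(or #t #t #f) = #t || #t || #f || #f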

Since all of these operators are associative, it doesn't matter whether we use `foldl` or `foldr`. We replace the cons constructor with the operator, and the nil constructor with the identity element for that operator.

Next, let's try some more complicated operators. `max` and `min` find the maximum and minimum of their arguments, respectively:
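One way to write them, as a sketch: the first argument seeds the accumulator, and the remaining arguments are folded over with a lambda that keeps whichever value wins the comparison.

```scheme
; Left fold, repeated so the example is self-contained.
(define (fold func accum lst)
  (if (null? lst)
      accum
      (fold func (func accum (car lst)) (cdr lst))))

; old is the best value seen so far, new is the current element.
(define (max first . rest)
  (fold (lambda (old new) (if (> old new) old new)) first rest))

(define (min first . rest)
  (fold (lambda (old new) (if (< old new) old new)) first rest))

(max 3 9 2)  ; => 9
(min 3 9 2)  ; => 2
(max 7)      ; => 7
```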

It's not immediately obvious what operation to fold over the list, because none of the built-ins quite qualify. Instead, think back to fold as a representation of a foreach loop. The accumulator represents any state we've maintained over previous iterations of the loop, so we'll want it to be the maximum value we've found so far. That gives us our initialization value: we want to start off with the leftmost variable in the list (since we're doing a `foldl`). Now recall that the result of the operation becomes the new accumulator at each step, and we've got our function. If the previous value is greater, keep it. If the new value is greater, or they're equal, return the new value. Reverse the operation for `min`.

How about `length`? We know that we can find the length of a list by counting down it, but how do we translate that into a fold?

Again, think in terms of its definition as a loop. The accumulator starts off at 0 and gets incremented by 1 with each iteration. That gives us both our initialization value, 0, and our function, `(lambda (x y) (+ x 1))`. Another way to look at this is "The length of a list is 1 + the length of the sublist to its left".
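The loop described above can be sketched as a left fold whose function ignores the element and only bumps the counter:

```scheme
; Left fold, repeated so the example is self-contained.
(define (fold func accum lst)
  (if (null? lst)
      accum
      (fold func (func accum (car lst)) (cdr lst))))

; x is the running count; y is the current element, which we ignore.
(define (length lst)
  (fold (lambda (x y) (+ x 1)) 0 lst))

(length '(a b c))  ; => 3
(length '())       ; => 0
```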

Let's try something a bit trickier: `reverse`.

The function here is fairly obvious: if you want to reverse two cons cells, you can just `flip cons` so it takes its arguments in the opposite order. However, there's a bit of subtlety at work. Ordinary lists are right associative: `(1 2 3 4)` = `(1 . (2 . (3 . (4 . nil))))`. If you want to reverse this, you need your fold to be left associative: `(reverse '(1 2 3 4))` = `(4 . (3 . (2 . (1 . nil))))`. Try it with a `foldr` instead of a `foldl` and see what you get.
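A sketch of a fold-based reverse. The left fold hands the accumulator to the function first, so cons needs its arguments swapped to push each element onto the front of the result.

```scheme
; Helpers repeated so the example is self-contained.
(define (flip func)
  (lambda (arg1 arg2) (func arg2 arg1)))

(define (fold func accum lst)
  (if (null? lst)
      accum
      (fold func (func accum (car lst)) (cdr lst))))

; Each step conses the current element onto the reversed prefix.
(define (reverse lst)
  (fold (flip cons) '() lst))

(reverse '(1 2 3 4))  ; => (4 3 2 1)
```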

There's a whole family of `member` and `assoc` functions, all of which can be represented with folds. The particular lambda expression is fairly complicated though, so let's factor it out:

The helper function is parameterized by the predicate to use and the operation to apply to the result if found. Its accumulator represents the first value found so far: it starts out with `#f` and takes on the first value that satisfies its predicate. We avoid finding subsequent values by testing for a non-`#f` value and returning the existing accumulator if it's already set. We also provide an operation that will be applied to the next value each time the predicate tests: this lets us customize the helper to check the value itself (for `member`) or only the key of the value (for `assoc`).

The rest of the functions are just various combinations of `eq?`, `eqv?` and `equal?` with `id` or `car`, folded over the list with an initial value of `#f`.
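Putting the last few paragraphs together, the family could be sketched as follows. The name `mem-helper` is our own label for the factored-out helper, and the block relies on `and` and `not` being available. Note that in this sketch `member` returns the matching element itself, not the rest of the list as the standard `member` does.

```scheme
; Helpers repeated so the example is self-contained.
(define (id obj) obj)
(define (curry func arg1) (lambda (arg) (func arg1 arg)))
(define (fold func accum lst)
  (if (null? lst)
      accum
      (fold func (func accum (car lst)) (cdr lst))))

; Keep the first element whose (op element) satisfies pred.
(define (mem-helper pred op)
  (lambda (acc next)
    (if (and (not acc) (pred (op next)))
        next
        acc)))

(define (memq obj lst)    (fold (mem-helper (curry eq? obj) id) #f lst))
(define (memv obj lst)    (fold (mem-helper (curry eqv? obj) id) #f lst))
(define (member obj lst)  (fold (mem-helper (curry equal? obj) id) #f lst))
(define (assq obj alist)  (fold (mem-helper (curry eq? obj) car) #f alist))
(define (assv obj alist)  (fold (mem-helper (curry eqv? obj) car) #f alist))
(define (assoc obj alist) (fold (mem-helper (curry equal? obj) car) #f alist))

(member 3 '(1 2 3))          ; => 3
(member 9 '(1 2 3))          ; => #f
(assoc 'b '((a 1) (b 2)))    ; => (b 2)
```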

Next, let's define the functions `map` and `filter`. `map` applies a function to every element of a list, returning a new list with the transformed values:
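A sketch of a single-list map built on the right fold, so the output keeps the order of the input:

```scheme
; Right fold, repeated so the example is self-contained.
(define (foldr func end lst)
  (if (null? lst)
      end
      (func (car lst) (foldr func end (cdr lst)))))

; x is the current element, y is the already-mapped rest of the list.
(define (map func lst)
  (foldr (lambda (x y) (cons (func x) y)) '() lst))

(map (lambda (x) (* x x)) '(1 2 3))  ; => (1 4 9)
```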

Remember that `foldr`'s function takes its arguments in the opposite order from `fold`'s, with the current value on the left. Map's lambda applies the function to the current value, then conses it with the rest of the mapped list, represented by the right-hand argument. It's essentially replacing every infix cons constructor with one that conses, but also applies the function to its left-side argument.

`filter` keeps only the elements of a list that satisfy a predicate, dropping all others:
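A sketch of filter in the same style: the lambda either conses the current element onto the filtered rest or passes the rest through untouched.

```scheme
; Right fold, repeated so the example is self-contained.
(define (foldr func end lst)
  (if (null? lst)
      end
      (func (car lst) (foldr func end (cdr lst)))))

; Keep x only when (pred x) is true; y is the filtered rest of the list.
(define (filter pred lst)
  (foldr (lambda (x y) (if (pred x) (cons x y) y)) '() lst))

(filter even? '(1 2 3 4))  ; => (2 4)
```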

This works by testing the current value against the predicate. If it's true, replace cons with cons, i.e., don't do anything. If it's false, drop the cons and just return the rest of the list. This eliminates all the elements that don't satisfy the predicate, consing up a new list that includes only the ones that do.

We can use the standard library by starting up our Lisp interpreter and typing `(load "stdlib.scm")`:

$ ./lisp
Lisp>>> (load "stdlib.scm")
(lambda ("pred" . lst) ...)
Lisp>>> (map (curry + 2) '(1 2 3 4))
(3 4 5 6)
Lisp>>> (filter even? '(1 2 3 4))
(2 4)
Lisp>>> quit

There are many other useful functions that could go into the standard library, including `list-tail`, `list-ref`, `append`, and various string-manipulation functions. Try implementing them as folds. Remember, the key to successful fold programming is thinking only in terms of what happens on each iteration. Fold captures the pattern of recursion down a list, and recursive problems are best solved by working one step at a time.