Write Yourself a Scheme in 48 Hours/Evaluation, Part 1

Beginning the Evaluator
Currently, we've just been printing out whether or not we recognize the given program fragment. We're about to take the first steps towards a working Scheme interpreter: assigning values to program fragments. We'll be starting with baby steps, but fairly soon you'll be progressing to working with computations.

Let's start by telling Haskell how to print out a string representation of the various possible s:

This is our first real introduction to pattern matching. Pattern matching is a way of destructuring an algebraic data type, selecting a code clause based on its constructor and then binding the components to variables. Any constructor can appear in a pattern; that pattern matches a value if the tag is the same as the value's tag and all subpatterns match their corresponding components. Patterns can be nested arbitrarily deep, with matching proceeding in an inside → outside, left → right order. The clauses of a function definition are tried in textual order, until one of the patterns matches. If this is confusing, you'll see some examples of deeply-nested patterns when we get further into the evaluator.

For now, you only need to know that each clause of the above definition matches one of the constructors of, and the right-hand side tells what to do for a value of that constructor.

The  and   clauses work similarly, but we need to define a helper function   to convert the contained list into a string:

The  function works like the Haskell Prelude's  function, which glues together a list of words with spaces. Since we're dealing with a list of s instead of words, we define a function that first converts the  s into their string representations and then applies   to it:

Our definition of unwordsList doesn't include any arguments. This is an example of point-free style: writing definitions purely in terms of function composition and partial application, without regard to individual values or "points". Instead, we define it as the composition of a couple of built-in functions. First, we partially-apply to , which creates a function that takes a list of  s and returns a list of their string representations. Haskell functions are curried: this means that a function of two arguments, like, is really a function that returns a function of one argument. As a result, if you supply only a single argument, you get back a function of one argument that you can pass around, compose, and apply later. In this case, we compose it with unwords:  converts a list of  s to a list of their string representations, and then   joins the result together with spaces.

We used the function above. This standard Haskell function lets you convert any type that's an instance of the class  into a string. We'd like to be able to do the same with, so we make it into a member of the class  , defining its   method as  :

A full treatment of type classes is beyond the scope of this tutorial; you can find more information in other tutorials and the Haskell 2010 report.

Let's try things out by changing our readExpr function so it returns the string representation of the value actually parsed, instead of just "Found value":

And compile and run…

$ ghc -package parsec -o parser listing4.1.hs $ ./parser "(1 2 2)" Found (1 2 2) $ ./parser "'(1 3 (\"this\" \"one\"))" Found (quote (1 3 ("this" "one")))

Beginnings of an evaluator: Primitives
Now, we start with the beginnings of an evaluator. The purpose of an evaluator is to map some "code" data type into some "data" data type, the result of the evaluation. In Lisp, the data types for both code and data are the same, so our evaluator will return a. Other languages often have more complicated code structures, with a variety of syntactic forms.

Evaluating numbers, strings, booleans, and quoted lists is fairly simple: return the datum itself.

This introduces a new type of pattern. The notation  matches against any   that's a string and then binds   to the whole , and not just the contents of the   constructor. The result has type  instead of type. The underbar is the "don't care" variable, matching any value yet not binding it to a variable. It can be used in any pattern, but is especially useful with @-patterns (where you bind the variable to the whole pattern) and with simple constructor-tests where you're just interested in the tag of the constructor.

The last clause is our first introduction to nested patterns. The type of data contained by  is , a list of  s. We match that against the specific two-element list  , a list where the first element is the symbol "quote" and the second element can be anything. Then we return that second element.

Let's integrate  into our existing code. Start by changing  back so it returns the expression instead of a string representation of the expression:

And then change our main function to read an expression, evaluate it, convert it to a string, and print it out. Now that we know about the  monad sequencing operator and the function composition operator, let's use them to make this a bit more concise:

Here, we take the result of the  action (a list of strings) and pass it into the composition of:


 * 1) take the first value ;
 * 2) parse it ;
 * 3) evaluate it ;
 * 4) convert it to a string and print it.

Compile and run the code the normal way:

$ ghc -package parsec -o eval listing4.2.hs $ ./eval "'atom" atom $ ./eval 2 2 $ ./eval "\"a string\"" "a string" $ ./eval "(+ 2 2)" Fail: listing6.hs:83: Non-exhaustive patterns in function eval

We still can't do all that much useful with the program (witness the failed  call), but the basic skeleton is in place. Soon, we'll be extending it with some functions to make it useful.

Adding basic primitives
Next, we'll improve our Scheme so we can use it as a simple calculator. It's still not yet a "programming language", but it's getting close.

Begin by adding a clause to eval to handle function application. Remember that all clauses of a function definition must be placed together and are evaluated in textual order, so this should go after the other eval clauses:

This is another nested pattern, but this time we match against the cons operator ":" instead of a literal list. Lists in Haskell are really syntactic sugar for a chain of cons applications and the empty list:. By pattern-matching against cons itself instead of a literal list, we're saying "give me the rest of the list" instead of "give me the second element of the list". For example, if we passed  to eval,   would be bound to "+" and   would be bound to.

The rest of the clause consists of a couple of functions we've seen before and one we haven't defined yet. We have to recursively evaluate each argument, so we map  over the args. This is what lets us write compound expressions like. Then we take the resulting list of evaluated arguments, and pass it and the original function to apply:

The built-in function looks up a key (its first argument) in a list of pairs. However, lookup will fail if no pair in the list contains the matching key. To express this, it returns an instance of the built-in type. We use the function to specify what to do in case of either success or failure. If the function isn't found, we return a  value, equivalent to   (we'll add more robust error-checking later). If it is found, we apply it to the arguments using, an operator section of the function application operator.

Next, we define the list of primitives that we support:

Look at the type of. It is a list of pairs, just like  expects, but the values of the pairs are functions from   to  . In Haskell, you can easily store functions in other data structures, though the functions must all have the same type.

Also, the functions that we store are themselves the result of a function,, which we haven't defined yet. This takes a primitive Haskell function (often an operator section) and wraps it with code to unpack an argument list, apply the function to it, and wrap the result up in our  constructor.

As with R5RS Scheme, we don't limit ourselves to only two arguments. Our numeric operations can work on a list of any length, so, and. We use the built-in function to do this. It essentially changes every cons operator in the list to the binary function we supply,.

Unlike R5RS Scheme, we're implementing a form of weak typing. That means that if a value can be interpreted as a number (like the string "2"), we'll use it as one, even if it's tagged as a string. We do this by adding a couple extra clauses to. If we're unpacking a string, attempt to parse it with Haskell's built-in function, which returns a list of pairs of (parsed value, remaining string).

For lists, we pattern-match against the one-element list and try to unpack that. Anything else falls through to the next case.

If we can't parse the number, for any reason, we'll return 0 for now. We'll fix this shortly so that it signals an error.

Compile and run this the normal way. Note how we get nested expressions "for free" because we call eval on each of the arguments of a function:

$ ghc -package parsec -o eval listing7.hs $ ./eval "(+ 2 2)" 4 $ ./eval "(+ 2 (-4 1))" 2 $ ./eval "(+ 2 (- 4 1))" 5 $ ./eval "(- (+ 4 6 3) 3 5 2)" 3