Write Yourself a Scheme in 48 Hours/Adding Variables and Assignment

Finally, we get to the good stuff: variables. A variable lets us save the result of an expression and refer to it later. In Scheme, a variable can also be reset to new values, so that its value changes as the program executes. This presents a complication for Haskell, because the execution model is built upon functions that return values, but never change them.

Nevertheless, there are several ways to simulate state in Haskell, all involving monads. The simplest is probably the state monad, which lets you hide arbitrary state within the monad and pass it around behind the scenes. You specify the state type as a parameter to the monad (e.g. if a function returns an integer but modifies a list of string pairs, it would have type State [(String, String)] Integer), and access it via the get and put functions, usually within a do-block. You'd specify the initial state via the runState function, which takes the action and the initial state and returns a pair containing the return value and the final state.
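To make the state monad concrete, here is a minimal sketch of an action that returns an integer while modifying a list of string pairs. The name addPair and the demo value are hypothetical, and the sketch assumes the mtl package's Control.Monad.State module is available.

```haskell
import Control.Monad.State (State, get, put, runState)

-- Hypothetical action: records a (key, value) pair in the hidden state
-- and returns how many pairs are stored afterwards.
addPair :: String -> String -> State [(String, String)] Integer
addPair key value = do
  pairs <- get                          -- read the hidden state
  let pairs' = (key, value) : pairs
  put pairs'                            -- replace the hidden state
  return (fromIntegral (length pairs'))

-- runState supplies the initial state (an empty list here); the result
-- is a pair of (return value, final state).
demo :: (Integer, [(String, String)])
demo = runState (addPair "a" "1" >> addPair "b" "2") []
```

Here demo evaluates to (2, [("b","2"),("a","1")]): the return value of the last action paired with the final state.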

Unfortunately, the state monad doesn't work well for us, because the type of data we need to store is fairly complex. For a simple top-level environment, we could get away with [(String, LispVal)], storing mappings from variable names to values. However, when we start dealing with function calls, these mappings become a stack of nested environments, arbitrarily deep. And when we add closures, environments might get saved in an arbitrary Function value, and passed around throughout the program. In fact, they might be saved in a variable and passed out of the runState call entirely, something we're not allowed to do.

Instead, we use a feature called state threads, letting Haskell manage the aggregate state for us. This lets us treat mutable variables as we would in any other programming language, using functions to get or set variables. There are two flavors of state threads: the ST monad creates a stateful computation that can be executed as a unit, without the state escaping to the rest of the program. The Data.IORef module lets you use stateful variables within the IO monad. Since our state has to be interleaved with IO anyway (it persists between lines in the REPL, and we will eventually have IO functions within the language itself), we'll be using IORefs.

We can start out by importing Data.IORef and defining a type for our environments:
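The import and the type might look like the following sketch. The two-constructor LispVal is a stand-in assumed here so the block compiles on its own (the interpreter's real type has more constructors), and sampleEnv is a hypothetical helper that only shows how such a value is built.

```haskell
import Data.IORef

-- Stand-in for the interpreter's LispVal type (assumption).
data LispVal = Number Integer | String String
  deriving (Show, Eq)

-- A mutable list of (name, mutable value) pairs.
type Env = IORef [(String, IORef LispVal)]

-- Hypothetical helper: an environment with the single binding x = 3.
sampleEnv :: IO Env
sampleEnv = do
  xRef <- newIORef (Number 3)
  newIORef [("x", xRef)]
```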

This declares an Env as an IORef holding a list that maps Strings to mutable LispVals. We need IORefs for both the list itself and for individual values because there are two ways that the program can mutate the environment. It might use set! to change the value of an individual variable, a change visible to any function that shares that environment (Scheme allows nested scopes, so a variable in an outer scope is visible to all inner scopes). Or it might use define to add a new variable, which should be visible to all subsequent statements.

Since IORefs can only be used within the IO monad, we'll want a helper action to create an empty environment. We can't just use the empty list [] because all accesses to IORefs must be sequenced, and so the type of our null environment is IO Env instead of just plain Env:
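A sketch of that helper, using the name nullEnv that the later sections refer to. LispVal is again a reduced stand-in (an assumption) so the block is self-contained.

```haskell
import Data.IORef

-- Stand-in for the interpreter's LispVal type (assumption).
data LispVal = Number Integer | String String
  deriving (Show, Eq)

type Env = IORef [(String, IORef LispVal)]

-- Each run of this action allocates a fresh, empty environment.
nullEnv :: IO Env
nullEnv = newIORef []
```

Because nullEnv is an action rather than a value, two separate runs of it yield two independent environments.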

From here, things get a bit more complicated, because we'll be simultaneously dealing with two monads. Remember, we also need an Error monad to handle things like unbound variables. The parts that need IO functionality and the parts that may throw exceptions are interleaved, so we can't just catch all the exceptions and return only normal values to the IO monad.

Haskell provides a mechanism known as monad transformers that lets you combine the functionality of multiple monads. We'll be using one of these, ErrorT, which lets us layer error-handling functionality on top of the IO monad. Our first step is to create a type synonym for our combined monad:
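The synonym can be sketched as below. One substitution to note: ErrorT was deprecated and later removed from the mtl package, so this sketch assumes its successor ExceptT from Control.Monad.Except, which takes the same two arguments. The one-constructor LispError and the failingAction value are illustrative assumptions.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)

-- Stand-in error type (assumption: the interpreter's LispError has more
-- constructors).
data LispError = UnboundVar String String
  deriving (Show, Eq)

type ThrowsError   = Either LispError       -- pure error handling
type IOThrowsError = ExceptT LispError IO   -- error handling layered over IO

-- Hypothetical action in the combined monad that fails immediately.
failingAction :: IOThrowsError Integer
failingAction = throwError (UnboundVar "Getting an unbound variable" "y")

-- Running the transformer exposes the Either inside an IO action.
runFailing :: IO (Either LispError Integer)
runFailing = runExceptT failingAction
```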

Like ThrowsError, IOThrowsError is really a type constructor: we've left off the last argument, the return type of the function. However, ErrorT takes one more argument than plain old Either: we have to specify the type of monad that we're layering our error-handling functionality over. We've created a monad that may contain IO actions that throw a LispError.

We have a mix of ThrowsError and IOThrowsError functions, but actions of different types cannot be contained within the same do-block, even if they provide essentially the same functionality. Haskell already provides a mechanism, lifting, to bring values of the lower type (IO) into the combined monad. Unfortunately, there's no similar support to bring a value of the untransformed upper type into the combined monad, so we need to write it ourselves:
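A sketch of such a conversion, using the name liftThrows that the later sections rely on. It assumes ExceptT in place of the deprecated ErrorT and a reduced LispError so that it compiles on its own.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)

-- Stand-in error type (assumption).
data LispError = UnboundVar String String
  deriving (Show, Eq)

type ThrowsError   = Either LispError
type IOThrowsError = ExceptT LispError IO

-- Re-throw a pure error inside the combined monad, or pass the value on.
liftThrows :: ThrowsError a -> IOThrowsError a
liftThrows (Left err)  = throwError err
liftThrows (Right val) = return val

-- Hypothetical helper: lift a pure result and run it to inspect the outcome.
runLifted :: ThrowsError a -> IO (Either LispError a)
runLifted = runExceptT . liftThrows
```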

This destructures the Either type and either re-throws the error type or returns the ordinary value. Methods in typeclasses resolve based on the type of the expression, so throwError and return (members of MonadError and Monad, respectively) take on their IOThrowsError definitions. Incidentally, the type signature provided here is not fully general: if we'd left it off, the compiler would have inferred liftThrows :: MonadError e m => Either e a -> m a.

We'll also want a helper function to run the whole top-level  action, returning an IO action. We can't escape from the IO monad, because a function that performs IO has an effect on the outside world, and you don't want that in a lazily-evaluated pure function. But you can run the error computation and catch the errors.
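A sketch of that runner is shown here. trapError and extractValue are reproduced as stand-ins for the helpers from the earlier error-handling section, the Show instance is reduced to one constructor, and runExceptT is assumed in place of the deprecated runErrorT.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError, catchError)

-- Stand-in error type with the message format used by the interpreter.
data LispError = UnboundVar String String

instance Show LispError where
  show (UnboundVar message varname) = message ++ ": " ++ varname

type ThrowsError   = Either LispError
type IOThrowsError = ExceptT LispError IO

-- Stand-ins for the previously defined helpers (assumption).
trapError :: IOThrowsError String -> IOThrowsError String
trapError action = catchError action (return . show)

extractValue :: ThrowsError a -> a
extractValue (Right val) = val
extractValue (Left _)    = error "extractValue: error was not trapped"

-- Convert errors to strings, run the transformer, unwrap the result.
runIOThrows :: IOThrowsError String -> IO String
runIOThrows action = runExceptT (trapError action) >>= return . extractValue

-- Hypothetical failing action for trying the runner out.
unboundY :: IOThrowsError String
unboundY = throwError (UnboundVar "Getting an unbound variable" "y")
```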

This uses our previously-defined trapError function to take any error values and convert them to their string representations, then runs the whole computation via runErrorT. The result is passed into extractValue and returned as a value in the IO monad.

Now we're ready to return to environment handling. We'll start with a function to determine if a given variable is already bound in the environment, necessary for proper handling of define:
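A sketch of that check, under the name isBound used later in the text; the LispVal type is a reduced stand-in (assumption).

```haskell
import Data.IORef

-- Stand-in for the interpreter's LispVal type (assumption).
data LispVal = Number Integer | String String
  deriving (Show, Eq)

type Env = IORef [(String, IORef LispVal)]

-- Read the binding list, look the name up, and collapse the Maybe to a Bool.
isBound :: Env -> String -> IO Bool
isBound envRef var =
  readIORef envRef >>= return . maybe False (const True) . lookup var
```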

This first extracts the actual environment value from its IORef via readIORef. Then we pass it to lookup to search for the particular variable we're interested in. lookup returns a Maybe value, so we return False if that value was Nothing and True otherwise (we need to use the const function because maybe expects a function to perform on the result and not just a value). Finally, we use return to lift that value into the IO monad. Since we're just interested in a true/false value, we don't need to deal with the actual IORef that lookup returns.

Next, we'll want to define a function to retrieve the current value of a variable:
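A sketch of the lookup function, named getVar as in the explanation that follows. It assumes ExceptT in place of the deprecated ErrorT and uses reduced LispVal and LispError types.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Control.Monad.IO.Class (liftIO)
import Data.IORef

-- Stand-ins for the interpreter's types (assumption).
data LispVal = Number Integer | String String
  deriving (Show, Eq)
data LispError = UnboundVar String String
  deriving (Show, Eq)

type Env = IORef [(String, IORef LispVal)]
type IOThrowsError = ExceptT LispError IO

-- Read the environment, then either throw or read the variable's IORef.
getVar :: Env -> String -> IOThrowsError LispVal
getVar envRef var = do
  env <- liftIO $ readIORef envRef
  maybe (throwError $ UnboundVar "Getting an unbound variable" var)
        (liftIO . readIORef)
        (lookup var env)

-- Hypothetical helper: run a lookup and expose the Either.
tryGet :: Env -> String -> IO (Either LispError LispVal)
tryGet env = runExceptT . getVar env
```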

Like the previous function, this begins by retrieving the actual environment from the IORef. However, getVar uses the IOThrowsError monad, because it also needs to do some error handling. As a result, we need to use the liftIO function to lift the readIORef action into the combined monad. Similarly, when we return the value, we use liftIO . readIORef to generate an IOThrowsError action that reads the returned IORef. We don't need to use liftIO to throw an error, however, because throwError is defined for the MonadError typeclass, of which ErrorT is an instance.

Now we create a function to set values:
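A sketch of the setter, named setVar here. It assumes ExceptT in place of the deprecated ErrorT, and the error message "Setting an unbound variable" is chosen to parallel the lookup error.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Control.Monad.IO.Class (liftIO)
import Data.IORef

-- Stand-ins for the interpreter's types (assumption).
data LispVal = Number Integer | String String
  deriving (Show, Eq)
data LispError = UnboundVar String String
  deriving (Show, Eq)

type Env = IORef [(String, IORef LispVal)]
type IOThrowsError = ExceptT LispError IO

-- Overwrite an existing variable; fail if the name is not bound.
setVar :: Env -> String -> LispVal -> IOThrowsError LispVal
setVar envRef var value = do
  env <- liftIO $ readIORef envRef
  maybe (throwError $ UnboundVar "Setting an unbound variable" var)
        (liftIO . flip writeIORef value)   -- flip: value first, ref last
        (lookup var env)
  return value

-- Hypothetical helper: run an assignment and expose the Either.
trySet :: Env -> String -> LispVal -> IO (Either LispError LispVal)
trySet env var = runExceptT . setVar env var
```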

Again, we first read the environment out of its IORef and run a lookup on it. This time, however, we want to change the variable instead of just reading it. The writeIORef action provides a means for this, but takes its arguments in the wrong order (ref -> value instead of value -> ref). So we use the built-in function flip to switch the arguments of writeIORef around, and then pass it the value. Finally, we return the value we just set, for convenience.

We'll want a function to handle the special behavior of define, which sets a variable if already bound or creates a new one if not. Since we've already defined a function to set values, we can use it in the former case:
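A sketch of this function, named defineVar here, with isBound and setVar repeated so the block stands alone. It assumes ExceptT in place of the deprecated ErrorT and reduced stand-in types.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Control.Monad.IO.Class (liftIO)
import Data.IORef

-- Stand-ins for the interpreter's types (assumption).
data LispVal = Number Integer | String String
  deriving (Show, Eq)
data LispError = UnboundVar String String
  deriving (Show, Eq)

type Env = IORef [(String, IORef LispVal)]
type IOThrowsError = ExceptT LispError IO

isBound :: Env -> String -> IO Bool
isBound envRef var =
  readIORef envRef >>= return . maybe False (const True) . lookup var

setVar :: Env -> String -> LispVal -> IOThrowsError LispVal
setVar envRef var value = do
  env <- liftIO $ readIORef envRef
  maybe (throwError $ UnboundVar "Setting an unbound variable" var)
        (liftIO . flip writeIORef value)
        (lookup var env)
  return value

-- Set the variable if it exists; otherwise prepend a fresh binding.
defineVar :: Env -> String -> LispVal -> IOThrowsError LispVal
defineVar envRef var value = do
  alreadyDefined <- liftIO $ isBound envRef var
  if alreadyDefined
    then setVar envRef var value >> return value
    else liftIO $ do
      valueRef <- newIORef value
      env <- readIORef envRef
      writeIORef envRef ((var, valueRef) : env)
      return value
```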

It's the latter case that's interesting, where the variable is unbound. We create an IO action (via do-notation) that creates a new IORef to hold the new variable, reads the current value of the environment, then writes a new list back to the environment's IORef consisting of the new (key, variable) pair added to the front of the list. Then we lift that whole do-block into the IOThrowsError monad with liftIO.

There's one more useful environment function: being able to bind a whole bunch of variables at once, as happens when a function is invoked. We might as well build that functionality now, though we won't be using it until the next section:
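A sketch of the bulk-binding function, named bindVars here, with helpers called extendEnv and addBinding. The LispVal type is a reduced stand-in (assumption).

```haskell
import Control.Monad (liftM)
import Data.IORef

-- Stand-in for the interpreter's LispVal type (assumption).
data LispVal = Number Integer | String String
  deriving (Show, Eq)

type Env = IORef [(String, IORef LispVal)]

-- Build a NEW environment: the fresh bindings in front of the old ones.
bindVars :: Env -> [(String, LispVal)] -> IO Env
bindVars envRef bindings =
  readIORef envRef >>= extendEnv bindings >>= newIORef
  where
    -- Allocate a ref per binding, then append the existing environment.
    extendEnv newBindings env = liftM (++ env) (mapM addBinding newBindings)
    -- Pair a name with a freshly allocated mutable cell.
    addBinding (var, value) = do
      ref <- newIORef value
      return (var, ref)
```

Note that the result is a new IORef: the original environment's list is left unchanged, while the value cells it contains are shared with the extended one.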

This is perhaps more complicated than the other functions, since it uses a monadic pipeline (rather than do-notation) and a pair of helper functions to do the work. It's best to start with the helper functions. addBinding takes a variable name and value, creates an IORef to hold the new variable, and then returns the name–value pair. extendEnv calls addBinding on each member of bindings (mapM) to create a list of (String, IORef LispVal) pairs, and then appends the current environment to the end of that (++ env). Finally, the whole function wires these functions up in a pipeline, starting by reading the existing environment out of its IORef, then passing the result to extendEnv, then returning a new IORef with the extended environment.

Now that we have all our environment functions, we need to start using them in the evaluator. Since Haskell has no global variables, we'll have to thread the environment through the evaluator as a parameter. While we're at it, we might as well add the set! and define special forms.
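The following reduced sketch shows the shape of an evaluator that takes the environment as its first argument. It covers only numbers, atoms, quote, set! and define; primitive application and the other special forms of the full interpreter are left out. The types are stand-ins, defineVar is written with a case expression for brevity, and ExceptT is assumed in place of the deprecated ErrorT.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Control.Monad.IO.Class (liftIO)
import Data.IORef

-- Reduced stand-ins for the interpreter's types (assumption).
data LispVal = Atom String | Number Integer | List [LispVal]
  deriving (Show, Eq)
data LispError = UnboundVar String String | BadSpecialForm String LispVal
  deriving (Show, Eq)

type Env = IORef [(String, IORef LispVal)]
type IOThrowsError = ExceptT LispError IO

getVar :: Env -> String -> IOThrowsError LispVal
getVar envRef var = do
  env <- liftIO $ readIORef envRef
  maybe (throwError $ UnboundVar "Getting an unbound variable" var)
        (liftIO . readIORef) (lookup var env)

setVar :: Env -> String -> LispVal -> IOThrowsError LispVal
setVar envRef var value = do
  env <- liftIO $ readIORef envRef
  maybe (throwError $ UnboundVar "Setting an unbound variable" var)
        (liftIO . flip writeIORef value) (lookup var env)
  return value

defineVar :: Env -> String -> LispVal -> IOThrowsError LispVal
defineVar envRef var value = do
  env <- liftIO $ readIORef envRef
  case lookup var env of
    Just _  -> setVar envRef var value
    Nothing -> liftIO $ do
      valueRef <- newIORef value
      writeIORef envRef ((var, valueRef) : env)
      return value

-- The environment is passed down to every recursive call.
eval :: Env -> LispVal -> IOThrowsError LispVal
eval _   val@(Number _) = return val
eval env (Atom name) = getVar env name
eval _   (List [Atom "quote", val]) = return val
eval env (List [Atom "set!", Atom var, form]) =
  eval env form >>= setVar env var
eval env (List [Atom "define", Atom var, form]) =
  eval env form >>= defineVar env var
eval _   badForm =
  throwError $ BadSpecialForm "Unrecognized special form" badForm
```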

Since a single environment gets threaded through a whole interactive session, we need to change a few of our IO functions to take an environment.
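A sketch of the two environment-taking IO functions, evalString and evalAndPrint. Everything around them is a stand-in assumed for illustration: readExpr here is a toy that only recognises digit strings and bare words (the real one is Parsec-based), eval handles only numbers and atoms, runIOThrows is a compact equivalent built on either, and ExceptT replaces the deprecated ErrorT.

```haskell
import Control.Monad (liftM)
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Control.Monad.IO.Class (liftIO)
import Data.Char (isDigit)
import Data.IORef

data LispVal = Atom String | Number Integer
instance Show LispVal where
  show (Atom name) = name
  show (Number n)  = show n

data LispError = UnboundVar String String | Parser String
instance Show LispError where
  show (UnboundVar message varname) = message ++ ": " ++ varname
  show (Parser input) = "Parse error: " ++ input

type Env = IORef [(String, IORef LispVal)]
type ThrowsError   = Either LispError
type IOThrowsError = ExceptT LispError IO

-- Toy stand-in for the real parser (assumption).
readExpr :: String -> ThrowsError LispVal
readExpr input
  | null input        = Left (Parser input)
  | all isDigit input = Right (Number (read input))
  | otherwise         = Right (Atom input)

-- Reduced stand-in for eval: numbers are literals, atoms are looked up.
eval :: Env -> LispVal -> IOThrowsError LispVal
eval _ val@(Number _) = return val
eval envRef (Atom var) = do
  env <- liftIO $ readIORef envRef
  maybe (throwError $ UnboundVar "Getting an unbound variable" var)
        (liftIO . readIORef) (lookup var env)

liftThrows :: ThrowsError a -> IOThrowsError a
liftThrows (Left err)  = throwError err
liftThrows (Right val) = return val

-- Compact equivalent of the runner: errors become their string form.
runIOThrows :: IOThrowsError String -> IO String
runIOThrows action = either show id <$> runExceptT action

evalString :: Env -> String -> IO String
evalString env expr =
  runIOThrows $ liftM show $ (liftThrows $ readExpr expr) >>= eval env

evalAndPrint :: Env -> String -> IO ()
evalAndPrint env expr = evalString env expr >>= putStrLn
```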

We need the runIOThrows in evalString because the type of the monad has changed from ThrowsError to IOThrowsError. Similarly, we need a liftThrows to bring readExpr into the IOThrowsError monad.

Next, we create an empty environment with nullEnv before starting the program:
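A sketch of the two entry points, runOne and runRepl. The evalAndPrint here is a hypothetical stub that only echoes its input, and readPrompt and until_ are assumed to match the REPL helpers from the earlier section.

```haskell
import Data.IORef
import System.IO (hFlush, stdout)

-- Stand-in for the interpreter's LispVal type (assumption).
data LispVal = Number Integer | String String
  deriving (Show, Eq)

type Env = IORef [(String, IORef LispVal)]

nullEnv :: IO Env
nullEnv = newIORef []

-- Hypothetical stub: the real evalAndPrint parses, evaluates and prints.
evalAndPrint :: Env -> String -> IO ()
evalAndPrint _ expr = putStrLn ("(would evaluate) " ++ expr)

-- REPL helpers, assumed to match the earlier REPL section.
readPrompt :: String -> IO String
readPrompt prompt = putStr prompt >> hFlush stdout >> getLine

until_ :: Monad m => (a -> Bool) -> m a -> (a -> m ()) -> m ()
until_ done prompt action = do
  result <- prompt
  if done result
    then return ()
    else action result >> until_ done prompt action

-- Single expression: make an environment, evaluate once.
runOne :: String -> IO ()
runOne expr = nullEnv >>= flip evalAndPrint expr

-- Interactive loop: one environment shared by every line of input.
runRepl :: IO ()
runRepl =
  nullEnv >>= until_ (== "quit") (readPrompt "Lisp>>> ") . evalAndPrint
```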

We've created an additional helper function runOne to handle the single-expression case, since it's now somewhat more involved than just running evalAndPrint. The changes to runRepl are a bit more subtle: notice how we added a function composition operator before evalAndPrint. That's because evalAndPrint now takes an additional Env parameter, fed from nullEnv. The function composition tells until_ that instead of taking plain old evalAndPrint as an action, it ought to apply it first to whatever's coming down the monadic pipeline, in this case the result of nullEnv. Thus, the actual function that gets applied to each line of input is (evalAndPrint env), just as we want it.

Finally, we need to change our main function to call runOne instead of evaluating evalAndPrint directly:

And we can compile and test our program:

$ ghc -package parsec -o lisp listing8.hs
$ ./lisp
Lisp>>> (define x 3)
3
Lisp>>> (+ x 2)
5
Lisp>>> (+ y 2)
Getting an unbound variable: y
Lisp>>> (define y 5)
5
Lisp>>> (+ x (- y 2))
6
Lisp>>> (define str "A string")
"A string"
Lisp>>> (< str "The string")
Invalid type: expected number, found "A string"
Lisp>>> (string<? str "The string")
#t