Haskell/Mutable objects

Functional purity is a defining characteristic of Haskell, one which leads to many of its strengths. As such, both the language and its ecosystem encourage eschewing mutable state altogether. Thanks to tools such as the `State` monad, which allows us to keep track of state in a convenient and functionally pure way, and efficient immutable data structures like the ones provided by the `containers` and `unordered-containers` packages, Haskell programmers can get by perfectly fine with complete immutability in the vast majority of situations. However, under select circumstances using mutable state is simply the most sensible option. One might, for instance, be interested in:


 * From Haskell code, using a library written in another language which assumes mutable state everywhere. This situation often arises with event-callback GUI toolkits.


 * Using Haskell to implement a language that provides imperative-style mutable variables.


 * Implementing algorithms that inherently require destructive updates to variables.


 * Dealing with volumes of bulk data massive enough to justify squeezing every drop of computational power available to make the problem at hand feasible.

Any general-purpose programming language worth its salt should be able to deal with such tasks. With Haskell, it is no different: there are not only ways to create mutable objects, but also to keep mutability under control, existing peacefully in a setting where immutability is the default.

IORefs
Let's begin with the simplest of those use cases above. A common way of structuring code for user interfaces is through the event-and-callback model. The event might be a button click or a key press, while the callback is just a piece of code meant to be called in response to the event. The client code (that is, your code, if you are using such a library) should set up the wiring that connects interface elements, events involving them, and the corresponding callbacks. A hypothetical function to arrange a callback might have the following type:
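One possible shape for it, sketched with made-up `Element` and `Event` types and the hypothetical name `register` (none of them from a real toolkit), and with a stub body so that the snippet compiles:

```haskell
-- Hypothetical stand-ins for a GUI binding; no real toolkit is involved.
data Element = Button String
data Event = Click Element

-- | Hypothetical registration function: takes an event constructor,
-- an interface element and a callback, and returns the wiring action.
register :: (Element -> Event) -> Element -> IO () -> IO ()
register _mkEvent _element callback = callback  -- toy stub: fires the callback once
```

With these definitions, `register Click (Button "button1") (putStrLn "Hello")` type checks. A real binding would hook the callback into the toolkit's event loop rather than run it on the spot.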

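The `IO ()` argument is the callback, while the result of the function is an `IO ()` action which sets up the wiring. Running such an action for a button's click event, with a callback that prints "Hello", would lead to "Hello" being printed on the console following every click on that button.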

Both the type of this function (with pervasive `IO` and lacking useful return values) and our exposition above have a marked imperative feel. That's because our hypothetical GUI library was written using a more imperative style in a wholly different language. Some good soul has written a facade so that we can use it from Haskell, but the facade is a very thin one, and so the style of the original library leaks into our code.

Using such a function to perform `IO` actions such as printing to the console or showing dialog boxes is easy enough. However, what if we want to add 1 to a counter every time a button is clicked? The type of the function doesn't reveal any way to pass information to the callback, nor to get information back from it (the return types are `()`). `State` doesn't help: even if there was a way to pass an initial state to the callback and run a `State` computation within it, what would we do with the results? We would need to pass the resulting state of the counter to the callback the next time the button is clicked, and we would have no idea when that would happen, nor a place to keep the value in the meantime.

The obvious solution to this issue in many languages would be creating a mutable variable outside of the callback and then giving the callback a reference to it, so that its code can change the value of the variable at will. We need not worry, though, as Haskell allows us to do exactly that. In fact, there are several types of mutable variables available, the simplest of which is the `IORef`. `IORef`s are very simple; they are just boxes containing mutable values. We can create one as follows:
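A minimal sketch of that step, using `newIORef` from `Data.IORef` in `base`; the name `makeBox` and the initial value 4 are made up for illustration:

```haskell
import Data.IORef

-- | Create a box holding an initial value (illustrative name).
makeBox :: IO (IORef Int)
makeBox = newIORef 4
```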

`newIORef` takes a value and gives back, as the result of an `IO` action, an `IORef` initialised to that value. We can then use `readIORef` to retrieve the value in it...
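For instance, the following sketch (with the made-up name `readDemo`) creates a box and immediately reads it back:

```haskell
import Data.IORef

-- | Create a box and read its contents straight away (illustrative name).
readDemo :: IO Int
readDemo = do
    box <- newIORef (4 :: Int)
    readIORef box  -- yields the value currently stored in the box
```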

... and `modifyIORef` and `writeIORef` to change it:
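A small sketch of both ways of changing the contents, using `modifyIORef` and `writeIORef` from `Data.IORef`; `changeDemo` is a name invented here:

```haskell
import Data.IORef

-- | Modify, then overwrite, the contents of a box, reporting both states.
changeDemo :: IO (Int, Int)
changeDemo = do
    box <- newIORef (4 :: Int)
    modifyIORef box (* 2)        -- apply a function to the contents
    afterModify <- readIORef box
    writeIORef box 0             -- replace the contents outright
    afterWrite <- readIORef box
    pure (afterModify, afterWrite)
```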

An `IORef` would be enough for implementing the counter, given that it would persist between button clicks. The code might look like this:
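The sketch below is one way it could go. Since no real GUI library is involved, `register`, `Element` and `Event` are hypothetical stand-ins, and the toy `register` simply simulates three clicks instead of wiring anything up:

```haskell
import Control.Monad (replicateM_)
import Data.IORef

-- Hypothetical stand-ins for a GUI binding; no real toolkit is involved.
data Element = Button String
data Event = Click Element

-- Toy registration function: rather than hooking into an event loop,
-- it simulates three clicks by running the callback three times.
register :: (Element -> Event) -> Element -> IO () -> IO ()
register _mkEvent _element callback = replicateM_ 3 callback

-- | Count (simulated) clicks with an IORef that outlives each callback run.
counterDemo :: IO Int
counterDemo = do
    counter <- newIORef (0 :: Int)
    register Click (Button "button1") (modifyIORef' counter (+ 1))
    readIORef counter
```

The callback closes over `counter`, so every invocation sees the value left behind by the previous one. The strict `modifyIORef'` is used here so that unevaluated additions do not pile up inside the box.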

Note there is no point in using `IORef`s indiscriminately, without a good reason for it. Beyond the more fundamental concerns with mutable state, it just would not be very convenient to do so with all those explicit read/write/modify calls, not to mention the need to introduce `IO` in extra places to handle the `IORef`s (in our hypothetical example that wouldn't be much of an issue, as the GUI code would have to live in `IO` anyway, and we presumably would keep it apart from the pure functions forming the core of our program, as good Haskell practice dictates). Still, `IORef`s are there for when you can't avoid them.

The pitfalls of concurrency
There is another very important use case for mutable variables that we didn't mention in the introduction: concurrency, that is, circumstances when simultaneous computations are being executed by the computer. Concurrency scenarios range from the simple (a progress bar displaying the status of a background task) to the extremely complex (server-side software handling thousands of requests at once). Given that in principle nothing guarantees that simultaneous computations will run in step with each other, any communication between them calls for mutable variables. That, however, introduces a complication: the issues with understandability and predictability of code using mutable state become much more serious in the presence of independent computations with unpredictable timings. For instance, computation A might need the result of computation B, but it might ask for that result earlier than predicted and thus acquire a bogus result. Writing correct concurrent code can be difficult, and subtle bugs are easy to introduce unless adequate measures are taken.

The only functions in `Data.IORef` that provide extra safety in concurrent code are `atomicModifyIORef`, `atomicModifyIORef'` and `atomicWriteIORef`, which are only of any help in very simple situations in which there is just one `IORef` meant to be used as a shared resource between computations. Concurrent Haskell code should take advantage of more sophisticated tools tailored for concurrency, such as `MVar`s (mutable variables that a computation can make unavailable to the others for as long as necessary; see `Control.Concurrent.MVar`) and `Control.Concurrent.STM` from the `stm` package (an implementation of software transactional memory, a concurrency model which makes it possible to write safe concurrent code while avoiding the ugliness and complications of having to explicitly manage the availability of all shared variables).
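As a rough illustration of the simple single-`IORef` case, the sketch below has several threads bump one shared counter with `atomicModifyIORef'`, using an `MVar` only to wait for the workers to finish. The function name `concurrentCount` and its parameters are made up for this example; everything else comes from `base`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM_)
import Data.IORef

-- | Fork `workers` threads that each increment a shared counter
-- `perWorker` times, then return the final count.
concurrentCount :: Int -> Int -> IO Int
concurrentCount workers perWorker = do
    counter <- newIORef (0 :: Int)
    done <- newEmptyMVar
    forM_ [1 .. workers] $ \_ -> forkIO $ do
        -- Each increment is a single atomic read-modify-write.
        replicateM_ perWorker $
            atomicModifyIORef' counter (\n -> (n + 1, ()))
        putMVar done ()              -- signal that this worker is finished
    replicateM_ workers (takeMVar done)  -- wait for every worker
    readIORef counter
```

Replacing the atomic update with a separate `readIORef` followed by `writeIORef` would allow increments from different threads to overwrite each other, which is the kind of subtle bug described above.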

The ST monad
In the `IORef` example above, mutability was imposed upon our code by external demands. However, in the two final scenarios suggested by the introduction (algorithms that require mutability and extreme computational demands) the need for mutable state is internal; that is, it is not reflected in any way in the overall results. For instance, sorting a list does not require mutability in any essential way, and so a function that sorts a list and returns a new list should, in principle, be functionally pure even if the sorting algorithm uses destructive updates to swap the position of the elements. In such a case, the mutability is just an implementation detail. The standard libraries provide a nifty tool for handling such situations while still ending up with pure functions: the `ST` monad, which can be found in `Control.Monad.ST`.

`ST s a` looks a lot like `State s a`, and indeed they are similar in spirit. An `ST` computation is one that uses an internal state to produce results, except that the state is mutable. For that purpose, `Data.STRef` provides `STRef`s. An `STRef s a` is exactly like an `IORef a`, but it lives in the `ST s` monad rather than in `IO`.

There is one major difference that sets `ST` apart from both `State` and `IO`. `Control.Monad.ST` offers a `runST` function with the following type:
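The signature is `(forall s. ST s a) -> a`. To make it something that can be compiled and inspected, the sketch below restates it on a local wrapper, `runST'`, which is a name invented here; the `RankNTypes` extension is needed only to write the type out by hand.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, runST)

-- | Local wrapper carrying the same type as the library's runST.
runST' :: (forall s. ST s a) -> a
runST' action = runST action
```

Note that the `forall` sits inside the parentheses, around the argument, rather than at the front of the whole type.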

At first, that is a shocking type signature. If `ST` involves mutability, how come we can simply extract `a` values from the monad? The answer lies in the `forall s.` part of the type. Having a `forall s.` enclosed within the type of an argument amounts to telling the type checker "`s` could be anything. Don't make any assumptions about it". Not making any assumptions, however, means that `s` cannot be matched with anything else, not even with the `s` from another invocation of `runST`:
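The following sketch contrasts a use that stays within one `runST` with an attempt to smuggle an `STRef` out of it. The names `contained` and `leaked` are made up, and the offending definition is left commented out so that the snippet still compiles:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef)

-- Fine: the STRef is created and consumed inside a single runST.
contained :: Int
contained = runST $ do
    ref <- newSTRef 4
    readSTRef ref

-- Rejected by the type checker if uncommented: the result type would be
-- `STRef s Int`, and that `s` cannot escape the scope of the forall.
--
-- leaked = runST (newSTRef (4 :: Int))
```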

The overall effect of this type trickery is to insulate the internal state and mutability within each `ST` computation, so that from the point of view of anything else in the program `runST` is a pure function.

As a trivial example of `ST` in action, here is a very imperative-looking version of `sum` for lists:
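A sketch of such a function follows. The names `sumST` and `n` are chosen here for illustration, and `for_` from `Data.Foldable` is assumed as the way to traverse the list:

```haskell
import Control.Monad.ST (runST)
import Data.Foldable (for_)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- | Sum a list by destructively updating an accumulator inside ST.
sumST :: Num a => [a] -> a
sumST xs = runST $ do
    n <- newSTRef 0                       -- mutable accumulator
    for_ xs $ \x -> modifySTRef' n (+ x)  -- add each element in turn
    readSTRef n                           -- final value leaves the ST block
```

From the outside, `sumST [1, 2, 3]` is an ordinary pure expression; nothing in its type hints at the mutation happening within.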

For all intents and purposes, such a function is no less pure than the familiar `sum`. The fact that it destructively updates its accumulator is a mere implementation detail, and there is no way information about the accumulator could leak other than through the final result. Looking at a simple example like this one makes it clear that the `s` type variable in `ST s` does not correspond to anything in particular within the computation; it is just an artificial marker. Another detail worth noting is that even though a traversal function such as `for_` folds the list from the right, the sums are done from the left, as the mutations are performed as applicative effects sequenced from left to right.

Mutable data structures
Mutable data structures can be found in the libraries for the exceptional use cases for which they prove necessary. For instance, mutable arrays (alongside immutable ones) can be found in the vector package or the array package bundled with GHC. There are also mutable hash tables, such as those from the hashtables package. In all cases mentioned, both `ST` and `IO` versions are provided.
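As a small illustration of the `ST` flavour, the sketch below uses `Data.Array.ST` from the array package to reverse a list by swapping elements of an unboxed mutable array in place. The function name `reverseViaArray` is made up for this example, and reversing a list this way is only meant to show the API, not to suggest it beats `reverse`.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newListArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (elems)

-- | Reverse a list of Ints by swapping elements of a mutable array in place.
reverseViaArray :: [Int] -> [Int]
reverseViaArray xs = elems frozen
  where
    n = length xs
    -- runSTUArray runs the ST action and freezes the array it returns.
    frozen = runSTUArray $ do
        arr <- newListArray (0, n - 1) xs
        forM_ [0 .. n `div` 2 - 1] $ \i -> do
            let j = n - 1 - i
            a <- readArray arr i
            b <- readArray arr j
            writeArray arr i b           -- swap positions i and j
            writeArray arr j a
        pure arr
```

As with `runST`, the mutable array cannot leak out of `runSTUArray`, so `reverseViaArray` remains a pure function.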