F Sharp Programming/Async Workflows

Async workflows allow programmers to convert single-threaded code into multi-threaded code with minimal code changes.

Defining Async Workflows
Async workflows are defined using computation expression notation, built with the async builder: async { ... }.

Here's an example using fsi:
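The original listing is missing here; a minimal reconstruction (the name asyncAdd is an assumption, not from the original) looks like this:

```fsharp
// Wraps an addition in an async workflow. Defining it does not run anything.
let asyncAdd x y = async { return x + y }

// fsi reports the type as:
//   val asyncAdd : x:int -> y:int -> Async<int>
```

To actually compute the sum, the workflow must be run, e.g. with Async.RunSynchronously (asyncAdd 2 3).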

Notice the return type. The expression does not actually run the function; instead, it returns an Async<'a>, which is a special kind of wrapper around our function.

The Async Module
The Async module is used for operating on Async<'a> objects. It contains several useful methods, the most important of which are:

 
Async.RunSynchronously
 * Run the asynchronous computation and await its result. If an exception occurs in the asynchronous computation, an exception is re-raised by this function. Runs as part of the default AsyncGroup.

 
Async.Parallel
 * Specify an asynchronous computation that, when run, executes all the given asynchronous computations, initially queueing each in the thread pool. If any raise an exception, the overall computation raises an exception and attempts to cancel the others. All the sub-computations belong to an AsyncGroup that is a subsidiary of the AsyncGroup of the outer computation.

 
Async.Start
 * Start the asynchronous computation in the thread pool. Do not await its result. Runs as part of the default AsyncGroup.

Async.RunSynchronously is used to run async blocks and wait for them to return, Async.Parallel automatically runs each Async<'a> on as many cores as the CPU has, and Async.Start runs an async block without waiting for the operation to complete. To use the canonical example, downloading a web page, we can write code for downloading a web page asynchronously as follows in fsi:
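The original listing is lost; the sketch below shows the idea (the name getHtml is illustrative):

```fsharp
open System
open System.Net

// Builds a workflow that downloads a page without blocking the calling thread.
// AsyncDownloadString is an F# extension to WebClient supplied by FSharp.Core
// (Microsoft.FSharp.Control.WebExtensions).
let getHtml (url : string) : Async<string> =
    async {
        let client = new WebClient()
        let! html = client.AsyncDownloadString(Uri url)
        return html
    }

// To actually run it:
//   let html = Async.RunSynchronously (getHtml "http://en.wikibooks.org/wiki/Main_Page")
```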

Members
Async<'a> objects are constructed from the AsyncBuilder, which has the following important members:

let! / async.Bind(p, f)
 * Specify an asynchronous computation that, when run, runs 'p', and when 'p' generates a result 'res', runs 'f res'.

return / async.Return(v)
 * Specify an asynchronous computation that, when run, returns the result 'v'.

In other words, let! executes an async workflow and binds its return value to an identifier, return simply returns a result, and return! executes an async workflow and returns its return value as a result.

These primitives allow us to compose async blocks within one another. For example, we can improve on the code above by downloading a web page asynchronously and extracting its URLs asynchronously as well:
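The original composed listing is missing; a self-contained sketch (the names getHtml and extractLinks, and the href regex, are assumptions) might look like:

```fsharp
open System
open System.Net
open System.Text.RegularExpressions

// Inner workflow: download a page (AsyncDownloadString comes from FSharp.Core).
let getHtml (url : string) =
    async {
        let client = new WebClient()
        let! html = client.AsyncDownloadString(Uri url)
        return html
    }

// Outer workflow: compose the download with link extraction.
let extractLinks url =
    async {
        // let! runs the inner workflow and binds its string result to 'html'
        let! html = getHtml url
        let links =
            [ for m in Regex.Matches(html, "href=\"(http[^\"]+)\"") -> m.Groups.[1].Value ]
        return url, links.Length
    }
```

Running Async.RunSynchronously (extractLinks someUrl) yields a (url, link count) pair.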

Notice that let! takes an Async<'a> and binds its return value to an identifier of type 'a. We can test this code directly in fsi.

What does let! do?

let! runs an Async<'a> object on its own thread, then immediately releases the current thread back to the thread pool. When let! returns, execution of the workflow continues on the new thread, which may or may not be the same thread that the workflow started out on. As a result, async workflows tend to "hop" between threads (easy to see by printing Thread.CurrentThread.ManagedThreadId before and after a let!), but this is not generally regarded as a bad thing.

Parallel Map
Consider the standard map function, such as Array.map. This function is synchronous; however, there is no real reason why it needs to be, since each element can be mapped in parallel (provided we're not sharing any mutable state). Using a module extension, we can write a parallel version of map with minimal effort:
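The original listing is lost; a sketch of the module extension (the name parallelMap is an assumption) is:

```fsharp
// Extends the built-in Array module with a parallel map.
module Array =
    let parallelMap (f : 'a -> 'b) (arr : 'a []) : 'b [] =
        arr
        |> Array.map (fun x -> async { return f x })  // wrap each element in a workflow
        |> Async.Parallel                             // queue them all in the thread pool
        |> Async.RunSynchronously                     // run and collect results, in order
```

Async.Parallel preserves element order, so Array.parallelMap f arr returns the same array Array.map f arr would, just computed concurrently.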

Parallel mapping can have a dramatic impact on the speed of map operations. We can compare serial and parallel mapping directly using the following:
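The original comparison program (which timed web-page downloads) is missing; the self-contained sketch below substitutes a Thread.Sleep for the download latency, so the names and the simulated workload are assumptions:

```fsharp
open System.Diagnostics
open System.Threading

// Simulated slow operation standing in for a page download.
let slowDouble x =
    Thread.Sleep 200
    x * 2

// Times a computation and prints its result in the article's output format.
let time label f =
    let sw = Stopwatch.StartNew()
    let result = f ()
    printfn "(%f ms) %s: %A" sw.Elapsed.TotalMilliseconds label result
    result

let serial () = List.map slowDouble [1 .. 5]

let parallelRun () =
    [1 .. 5]
    |> List.map (fun x -> async { return slowDouble x })
    |> Async.Parallel
    |> Async.RunSynchronously

printfn "Start..."
let a = time "Synchronous" serial
let b = time "Asynchronous" parallelRun
printfn "Done."
```

The serial version sleeps five times in sequence; the parallel version overlaps the sleeps in the thread pool, so it typically finishes in a fraction of the time.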


This program outputs the following:

 Start...
 (4276.190900 ms) Synchronous: [("http://www.craigslist.com/", 185); ("http://www.msn.com/", 262);
   ("http://en.wikibooks.org/wiki/Main_Page", 190); ("http://www.wordpress.com/", 132);
   ("http://news.google.com/", 296)]
 (1939.117900 ms) Asynchronous: [|("http://www.craigslist.com/", 185); ("http://www.msn.com/", 261);
   ("http://en.wikibooks.org/wiki/Main_Page", 190); ("http://www.wordpress.com/", 132);
   ("http://news.google.com/", 294)|]
 Done.

The asynchronous version ran about 2.2x faster because the web pages are downloaded in parallel rather than serially.

Why Concurrency Matters
For the first 50 years of software development, programmers could take comfort in the fact that computer hardware roughly doubled in power every 18 months. If a program was slow today, one could simply wait a few months and it would run twice as fast with no change to the source code. This trend continued well into the early 2000s, when commodity desktop machines in 2003 had more processing power than the fastest supercomputers of 1993. However, as Herb Sutter observed in his well-known article The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, consumer processor clock speeds plateaued around 2005 at roughly 3.7 GHz. The laws of physics, including the finite speed of signal propagation, limit how much faster a single core can practically get, and we have very nearly reached that limit. Since CPU designers are unable to design significantly faster CPUs, they have turned toward designing processors with multiple cores and better support for multithreading. Programmers no longer have the luxury of their applications running twice as fast on improving hardware: the free lunch is over.

Clock rates are not getting any faster, yet the amount of data businesses process each year grows exponentially (often at a rate of 10-20% per year). To meet the growing processing demands of business, software development is tending toward highly parallel, multithreaded applications that take advantage of multicore processors, distributed systems, and cloud computing.

Problems with Mutable State
Multithreaded programming has a reputation for being notoriously difficult to get right and for having a rather steep learning curve. Why does it have this reputation? To put it simply, mutable shared state makes programs difficult to reason about. When two threads mutate the same variable, it is very easy to put that variable in an invalid state.

Race Conditions

As a demonstration, here's how to increment a global variable using shared state (non-threaded version):
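The original listing is missing; a sketch (the names x and increment are assumptions, as is the iteration count) is:

```fsharp
// A shared, mutable decimal counter.
let x = ref 0m

// Increments the shared counter 100000 times.
let increment () =
    for i in 1 .. 100000 do
        x := !x + 1m

// The two calls run one after the other, so no interleaving is possible.
increment ()
increment ()
printfn "x = %M" !x   // 200000
```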

This works, but some programmers might notice that both calls to the increment function could be computed in parallel, since there's no real reason to wait for one call to finish before starting the other. Using the .NET threading primitives in the System.Threading namespace, a programmer can rewrite this as follows:
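The threaded listing is also missing; a sketch with the same assumed names is below. Note that it contains a deliberate race condition:

```fsharp
open System.Threading

let x = ref 0m

let increment () =
    for i in 1 .. 100000 do
        x := !x + 1m   // read-modify-write: NOT atomic

// Run the two calls on two threads at once.
let t1 = Thread(fun () -> increment ())
let t2 = Thread(fun () -> increment ())
t1.Start()
t2.Start()
t1.Join()
t2.Join()
// Rarely 200000: one thread's updates can overwrite the other's.
printfn "x = %M" !x
```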

This program should do the same thing as the previous program, only it should run in roughly half the time. However, running it several times in fsi tells a different story.

The program is computationally sound, but it produces a different result every time it's run. What happened?

It takes several machine instructions to increment a decimal value. In particular, the .NET IL for incrementing a static decimal looks like this:
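The original IL listing is lost; a hedged sketch of what the compiler emits for the increment (exact tokens and field names vary) is:

```
ldsfld  valuetype [mscorlib]System.Decimal Program::x    // load the shared value onto the evaluation stack
ldsfld  valuetype [mscorlib]System.Decimal Program::one  // load the constant 1m
call    valuetype [mscorlib]System.Decimal [mscorlib]System.Decimal::op_Addition(valuetype [mscorlib]System.Decimal, valuetype [mscorlib]System.Decimal)
stsfld  valuetype [mscorlib]System.Decimal Program::x    // write the sum back to the static field
```

A thread can be pre-empted between any two of these instructions.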

Imagine that we have two threads calling this code (calls made by Thread1 and Thread2 are interleaved):

 Thread1: Loads value "100" onto its evaluation stack.
 Thread1: Calls add with "100" and "1".
 Thread2: Loads value "100" onto its evaluation stack.
 Thread1: Writes "101" back out to the static variable.
 Thread2: Calls add with "100" and "1".
 Thread2: Writes "101" back out to the static variable. (Oops, we've incremented an old value and written it back out.)
 Thread1: Loads value "101" onto its evaluation stack.
 Thread2: Loads value "101" onto its evaluation stack.

 (Now we let Thread1 get a little further ahead of Thread2.)
 Thread1: Calls add with "101" and "1".
 Thread1: Writes "102" back out to the static variable.
 Thread1: Loads value "102" onto its evaluation stack.
 Thread1: Calls add with "102" and "1".
 Thread1: Writes "103" back out to the static variable.
 Thread2: Calls add with "101" and "1".
 Thread2: Writes "102" back out to the static variable. (Oops, now we've completely overwritten work done by Thread1!)

This kind of bug is called a race condition, and it occurs all the time in multithreaded applications. Unlike normal bugs, race conditions are often non-deterministic, making them extremely difficult to track down.

Usually, programmers solve race conditions by introducing locks. When an object is "locked", all other threads are forced to wait until it is "unlocked" before they proceed. We can rewrite the code above using a lock to block access to the shared variable while each thread writes to it:
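The original locked listing is missing; a sketch using F#'s built-in lock function (names are assumptions carried over from the earlier sketches) is:

```fsharp
open System.Threading

let x = ref 0m
let syncRoot = obj ()   // a dedicated object to lock on

let increment () =
    for i in 1 .. 100000 do
        // Only one thread at a time may execute the body of the lock.
        lock syncRoot (fun () -> x := !x + 1m)

let t1 = Thread(fun () -> increment ())
let t2 = Thread(fun () -> increment ())
t1.Start()
t2.Start()
t1.Join()
t2.Join()
printfn "x = %M" !x   // now reliably 200000
```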

The lock guarantees each thread exclusive access to the shared state and forces each thread to wait on the other while the code inside the lock runs to completion. The function now produces the expected result.

Deadlocks

Locks force threads to wait until an object is unlocked. However, locks often lead to a new problem. Let's say we have ThreadA and ThreadB, which operate on two corresponding pieces of shared state, StateA and StateB. ThreadA locks StateA, then StateB; ThreadB locks StateB, then StateA. If the timing is right, when ThreadA needs to access StateB, it waits until ThreadB unlocks StateB; when ThreadB needs to access StateA, it cannot proceed either, since StateA is locked by ThreadA. Both threads mutually block one another, and they are unable to proceed any further. This is called a deadlock.

Here's some simple code which demonstrates a deadlock:
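The original listing is lost; the sketch below takes two locks in opposite orders on two threads. So that it reports the deadlock instead of hanging forever, Monitor.TryEnter with a timeout stands in for a plain lock on the second acquisition (all names are assumptions):

```fsharp
open System.Threading

let stateA = obj ()
let stateB = obj ()
let deadlocked = ref false

// Takes 'first', then tries to take 'second' while still holding 'first'.
let worker (first : obj) (second : obj) =
    lock first (fun () ->
        Thread.Sleep 500   // give the other thread time to grab its first lock
        // A plain 'lock second' here would block forever; TryEnter lets us detect it.
        if Monitor.TryEnter(second, 2000)
        then Monitor.Exit second
        else deadlocked := true)

let t1 = Thread(fun () -> worker stateA stateB)  // locks A, then wants B
let t2 = Thread(fun () -> worker stateB stateA)  // locks B, then wants A
t1.Start()
t2.Start()
t1.Join()
t2.Join()
printfn "deadlocked: %b" !deadlocked
```

Each thread holds its first lock while waiting for the other's, so neither acquisition can succeed; with real locks both threads would wait forever.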

These kinds of bugs occur all the time in multithreaded code, although they usually aren't quite as explicit as the code shown above.

Why Functional Programming Matters
To put it bluntly, mutable state is the enemy of multithreaded code. Functional programming often simplifies multithreading tremendously: since values are immutable by default, programmers don't need to worry about one thread mutating the value of state shared between two threads, so it eliminates a whole class of multithreading bugs related to race conditions. And since there are no race conditions, there's no reason to use locks either, so immutability eliminates another whole class of bugs related to deadlocks as well.