Haskell/FFI

Haskell on its own is fine, but in the real world a large number of useful libraries are written in other languages, especially C. To use these libraries, and to let C code call Haskell functions, Haskell provides the Foreign Function Interface (FFI).

Marshalling (Type Conversion)
When using C functions, it is necessary to convert Haskell types to the appropriate C types. These are available in the  module; some examples are given in the following table.

Converting Haskell values into C values is called marshalling (and the reverse, predictably, unmarshalling). For basic types this is straightforward: for floating-point numbers one uses  (in either direction, since e.g. both   and   are instances of the classes   and  ), for integers  , and so on.
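
A minimal sketch of these two kinds of conversion, using the standard Foreign.C.Types module; the helper names (toCDouble, fromCDouble, toCInt, fromCInt) are our own, chosen only for illustration.

```haskell
import Foreign.C.Types (CDouble, CInt)

-- Floating point: realToFrac converts between Double and CDouble
-- in either direction.
toCDouble :: Double -> CDouble
toCDouble = realToFrac

fromCDouble :: CDouble -> Double
fromCDouble = realToFrac

-- Integers: fromIntegral converts between Int and CInt.
-- CInt is usually 32 bits wide, so out-of-range Int values wrap silently.
toCInt :: Int -> CInt
toCInt = fromIntegral

fromCInt :: CInt -> Int
fromCInt = fromIntegral
```

The wrap-around in the last comment is the main thing to watch for: fromIntegral never reports an overflow, so range checks, if needed, must be written by hand.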

Calling a pure C function
A pure function implemented in C presents no significant trouble in Haskell. The  function of the C standard library is a fine example. First, we enable the GHC extension for the FFI in the first line. We then import the  and   modules, the latter of which defines  , the representation of double-precision floating-point numbers in C.

We then declare that we are importing a foreign function, using the C calling convention. A "safety level" can be specified with the keyword  (the default) or  . In general,  is more efficient, and   is required only for C code that could call back into a Haskell function (it is also advisable for long-running calls, since an unsafe call can hold up other Haskell threads). Since callbacks are a rather special case, it is actually quite safe to use the  keyword in most cases. Finally, we specify the header and the function name, separated by a space.

The Haskell function name is then given; in our case we use a standard  , but it could have been anything. Note that the function signature must be correct: GHC will not check the C header to confirm that the function actually takes a  and returns another, and writing a wrong one could have unpredictable results.

It is then possible to generate a wrapper around the function using  so that it looks exactly like any Haskell function.
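
A minimal, runnable sketch of the steps just described, importing sin from math.h as a pure function; sinViaC is our own name for the wrapper, not necessarily the one used in the original listing.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- C's sin has no side effects, so it is imported with a pure type
-- (no IO). "unsafe" is fine because sin never calls back into Haskell.
-- GHC does not check this type against math.h: it must be right.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Wrapper that hides the C types, so callers just use Double.
sinViaC :: Double -> Double
sinViaC = realToFrac . c_sin . realToFrac
```

The constructor import CDouble (..) matters: GHC only accepts a newtype such as CDouble in a foreign declaration when its constructor is in scope.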

Importing C's  is simple because it is a pure function that takes a plain   as input and returns another as output; things get more complicated with impure functions and pointers, which are ubiquitous in larger C libraries.

Impure C Functions
A classic impure C function is  , which generates pseudo-random numbers. Suppose you do not want to use Haskell's  , for example because you want to reproduce exactly the series of pseudo-random numbers output by some C routine. Then you could import it just like  before:

If you try this naïve implementation in GHCi, you will notice that  always returns the same value:

    > c_rand
    1714636915
    > c_rand
    1714636915

Indeed, we have told GHC that it is a pure function, and GHC sees no point in computing the result of a pure function twice. Note that GHC gives no error or warning message.

To make GHC understand that this is not a pure function, we have to use the IO monad:

Here, we also imported the  function, to be able to seed the C pseudo-random generator:

    > c_rand
    1957747793
    > c_rand
    424238335
    > c_srand 0
    > c_rand
    1804289383
    > c_srand 0
    > c_rand
    1804289383
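
A sketch of the corrected imports, reusing the names c_rand and c_srand from the session above; randomsFromSeed is a hypothetical helper that reseeds the generator and draws several numbers. Because both imports live in IO, every call is actually performed, in the order written.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt (..), CUInt (..))

-- rand() updates hidden state inside the C library, so it must be in IO;
-- a plain CInt type would let GHC evaluate it only once.
foreign import ccall unsafe "stdlib.h rand"
  c_rand :: IO CInt

-- srand(seed) is called only for its side effect, hence IO ().
foreign import ccall unsafe "stdlib.h srand"
  c_srand :: CUInt -> IO ()

-- Reseed, then draw n numbers: the same seed gives the same sequence.
randomsFromSeed :: CUInt -> Int -> IO [CInt]
randomsFromSeed seed n = do
  c_srand seed
  sequence (replicate n c_rand)
```

The exact numbers depend on the C library's implementation of rand, so only the repeatability after reseeding should be relied upon.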

Working with C Pointers
The most useful C functions are often those that perform complicated calculations with several parameters, and as complexity increases, so does the need to return status codes. A typical paradigm of C libraries is therefore to take pointers to allocated memory as "targets" into which the results are written, while the function itself returns an integer value (typically 0 if the computation was successful; otherwise the number identifies the problem). Another possibility is that the function returns a pointer to a structure (possibly defined in the implementation, and therefore unavailable to us).

As a pedagogical example, we consider the  function of the GNU Scientific Library, a freely available library for scientific computation. It is a simple C function with prototype:

The function takes a  x and returns its normalised fraction f and integer exponent e such that:
$$ x = f \times 2^e \qquad e \in \mathbb{Z}, \quad 0.5 \leq f < 1 $$

We interface this C function into Haskell with the following code:
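
A runnable sketch of the whole pattern (import, alloca, read-back, unsafePerformIO), with one deliberate substitution: so that it works without the GSL installed, it imports the C standard library's frexp from math.h, which computes the same fraction/exponent decomposition and likewise writes the exponent through an int pointer. The name frexpC is our own.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..), CInt (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)
import System.IO.Unsafe (unsafePerformIO)

-- C prototype: double frexp(double x, int *exp);
-- The result is in IO because C writes through the pointer, and we must
-- be sure that write has happened before we read the pointer.
foreign import ccall unsafe "math.h frexp"
  c_frexp :: CDouble -> Ptr CInt -> IO CDouble

-- Pure interface: x == f * 2^e, with 0.5 <= |f| < 1 for nonzero x.
-- unsafePerformIO is acceptable only because the computation is pure
-- apart from the temporary memory, which alloca frees for us.
frexpC :: Double -> (Double, Int)
frexpC x = unsafePerformIO $
  alloca $ \expPtr -> do               -- expPtr :: Ptr CInt
    f <- c_frexp (realToFrac x) expPtr -- C stores the exponent in *expPtr
    e <- peek expPtr                   -- read it only after the call
    return (realToFrac f, fromIntegral e)
```

For instance, frexpC 10 gives (0.625, 4), since 10 = 0.625 × 2^4. The Prelude's significand and exponent compute the same decomposition in pure Haskell, which makes them a handy cross-check.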

The new part is  , which can be used with any instance of the   class; this includes all C types, as well as several Haskell types.

Notice how the result of the  function is in the   monad. This is typical when working with pointers, be they used for input or output (as in this case); we will see shortly what would happen had we used a simple  for the function.

The  function is implemented in pure Haskell code as follows:

We know that, memory management details aside, the function is pure: that is why the signature returns a tuple with f and e outside of the  monad. Yet f is provided inside it; to extract it, we use the function unsafePerformIO, which extracts values from the  monad. Obviously, it is legitimate to use it only when we know the function is pure, so that GHC can be allowed to optimise accordingly.

To allocate pointers, we use the  function, which also takes responsibility for freeing the memory. As an argument,  takes a function of type  , and returns the  . In practice, this translates into the following usage pattern with λ functions:

The pattern can easily be nested if several pointers are required:

Back to our  function: in the λ function that is the argument to  , the function is evaluated and the pointer is read immediately afterwards with  . Here we can see why we wanted the imported C function  to return a value in the   monad: if GHC could decide when to compute the quantity f, it would likely postpone it until it is needed, that is, until the last line, where   uses it. By then, e would already have been read from an allocated but still uninitialised memory address, which contains arbitrary data. In short, we want  to return a monadic value because we want to determine the sequence of computations ourselves.

If some other function required a pointer to provide input instead of storing output, one would use the similar  function to set the pointed-to value, obviously before evaluating the function:
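
A small sketch combining nested allocation with an input pointer. It uses the C standard library's memcpy only because it conveniently takes one input pointer and one output pointer; copyViaC is a hypothetical helper.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CSize (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peek, poke, sizeOf)

-- C prototype: void *memcpy(void *dest, const void *src, size_t n);
foreign import ccall unsafe "string.h memcpy"
  c_memcpy :: Ptr a -> Ptr a -> CSize -> IO (Ptr a)

-- src is an input pointer, dest an output pointer.
copyViaC :: Storable a => a -> IO a
copyViaC x =
  alloca $ \src ->              -- first allocation
    alloca $ \dest -> do        -- nested second allocation
      poke src x                -- set the input BEFORE calling C
      _ <- c_memcpy dest src (fromIntegral (sizeOf x))
      peek dest                 -- read the output AFTER the call
```

Both buffers are released automatically when the outer alloca returns, so nothing needs to be freed by hand.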

In the final line, the results are arranged in a tuple and returned, after having been converted from C types.

To test the function, remember to link against the GSL; for GHCi, run:

    $ ghci frexp.hs -lgsl

(Note that most systems do not come with the GSL preinstalled, and you may have to download and install its development packages.)

Working with C Structures
Very often, C functions return data in the form of  s or pointers to them. In some rare cases, these structures are returned directly, but more often they are returned through pointers; the return value is then most often an  that indicates whether execution succeeded.

We will consider another GSL function,  . This function computes the regular cylindrical Bessel function of a given order n and returns the result through a pointer to a  structure. The structure contains two  s, one for the result and one for the error. The integer error code returned by the function can be transformed into a C string by the function  . The Haskell function we are looking for therefore takes two arguments, the order of the cylindrical Bessel function and the function's argument, and returns either an error message or a tuple with the result and its margin of error.

Making a New Instance of the class
In order to allocate and read a pointer to a  structure, it is necessary to make it an instance of the   class.

To do that, it is convenient to use the  program: we first create a Bessel.hsc file, written in a mix of Haskell and C macros, which is then expanded into Haskell by the command

    $ hsc2hs Bessel.hsc

After that, we simply load the resulting Bessel.hs file in GHC.

This is the first part of file Bessel.hsc:

We use the  directive to make sure   knows where to find information about  . We then define a Haskell data structure mirroring the GSL's  , with two  s: this is the type that we turn into an instance of the class. Strictly, we need only  ,   and   for this example;   is added for completeness.


 * is obviously fundamental to the allocation process, and is calculated by  with the   macro.
 * is the size in bytes of the data structure alignment. In general, it should be the largest  of the elements of the structure; in our case, since the two elements have the same type, we simply use  's. The value of the argument to   is inconsequential; what matters is the type of the argument.
 * is implemented using a -block and the   macros, as shown.   and   are the names used for the structure fields in the GSL source code.
 * Similarly,  is implemented with the   macro.
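
hsc2hs computes sizes and offsets from the real C header. Purely to show what the four methods do, here is a hand-written Storable instance for a struct of two doubles, assuming the common layout with no padding (first field at offset 0, second right after it). The type SfResult, its field names and the roundTrip helper are hypothetical; real code should prefer the hsc2hs macros so that the layout always matches the header.

```haskell
import Foreign.Marshal.Utils (with)
import Foreign.Storable (Storable (..))

-- Haskell mirror of a C struct of two doubles, such as
--   struct { double val; double err; };
-- (type and field names here are our own).
data SfResult = SfResult { sfVal :: Double, sfErr :: Double }
  deriving (Eq, Show)

-- ASSUMPTION: no padding; first field at offset 0, second right after it.
-- hsc2hs's #size, #alignment, #peek and #poke derive this from the header.
instance Storable SfResult where
  sizeOf _    = 2 * sizeOf (undefined :: Double)
  alignment _ = alignment (undefined :: Double)
  peek ptr = do
    v <- peekByteOff ptr 0
    e <- peekByteOff ptr (sizeOf v)
    return (SfResult v e)
  poke ptr (SfResult v e) = do
    pokeByteOff ptr 0 v
    pokeByteOff ptr (sizeOf v) e

-- Write a value into temporary C memory and read it back,
-- exercising both poke and peek.
roundTrip :: SfResult -> IO SfResult
roundTrip r = with r peek
```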

Importing the C Functions
We import several functions from the GSL library: first, the Bessel function itself, which does the actual work. Then we need a particular function,  , because the default GSL error handler simply aborts the program, even when called from Haskell; we, instead, plan to deal with errors ourselves. The last function is the GSL-wide interpreter that translates error codes into human-readable C strings.

Implementing the Bessel Function
Finally, we can implement the Haskell version of the GSL cylindrical Bessel function of order n.

Again, we use  because the function is pure, even though its nuts-and-bolts implementation is not. After allocating a pointer to a GSL result structure, we deactivate the GSL error handler to avoid crashes in case something goes wrong, and finally we can call the GSL function. At this point, if the  returned by the function is 0, we unmarshal the result and return it as a tuple. Otherwise, we call the GSL error-string function, and pass the error as a  result instead.
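
The final status check can be sketched independently of the GSL. In this sketch the C library's strerror stands in for the GSL error-string function described above (an assumption made so that the example runs without the GSL), and checkStatus is a hypothetical helper.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, peekCString)
import Foreign.C.Types (CInt (..))

-- libc's strerror stands in for a library-specific error-string function.
foreign import ccall unsafe "string.h strerror"
  c_strerror :: CInt -> IO CString

-- Status 0 means success: wrap the already-unmarshalled result in Right.
-- Any other status is turned into a readable message in Left.
checkStatus :: CInt -> a -> IO (Either String a)
checkStatus 0 result = return (Right result)
checkStatus code _   = do
  msg <- c_strerror code >>= peekCString
  return (Left ("error " ++ show code ++ ": " ++ msg))
```

peekCString copies the C string into a Haskell String, so nothing has to be freed afterwards; that is appropriate here because strerror returns a pointer to memory owned by the C library.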

Examples
Once we are finished writing the  function, we have to convert it to proper Haskell and load the produced file:

    $ hsc2hs Bessel.hsc
    $ ghci Bessel.hs -lgsl

We can then call the Bessel function with several values:

    > besselJn 0 10
    Right (-0.2459357644513483,1.8116861737200453e-16)
    > besselJn 1 0
    Right (0.0,0.0)
    > besselJn 1000 2
    Left "GSL error: underflow"

Advanced Topics
This section contains an advanced example with some more complex features of the FFI. We will import into Haskell one of the more complicated functions of the GSL, the one used to calculate the integral of a function between two given points with an adaptive Gauss-Kronrod algorithm. The GSL function in question is  .

This example will illustrate function pointers, export of Haskell functions to C routines, enumerations, and handling pointers to unknown structures.

Available C Functions and Structures
The GSL has three functions that are needed to integrate a given function with this method. The first two deal with allocation and deallocation of a "workspace" structure of which we know nothing (we just pass a pointer around). The actual work is done by the last function, which requires a pointer to a workspace.

To represent functions, the GSL specifies an appropriate C structure. The reason for the  pointer is that C has no λ functions, so a function cannot be partially applied to general parameters independent of  ; these parameters are therefore passed through a pointer of unknown type. In Haskell, we do not need the  element and will consistently ignore it.

Imports and Inclusions
We start our qag.hsc file with the necessary declarations. We enable the  pragma, which we will use later for the   data type. Since this file will contain a good number of functions that should not be available to the outside world, we also declare it a module and export only the final function  and the  -  flags. We also include the relevant GSL C headers. The import of the C functions for error messages and for deactivating the error handler was described before.

Enumerations
One of the arguments of  is  , an integer that can take values from 1 to 6 and indicates the integration rule. GSL defines a macro for each value, but in Haskell it is more appropriate to define a type, which we call  . Also, to have its values defined automatically by hsc2hs, we can use the  macro: hsc2hs will search the headers for the macros and give our variables the correct values. The enum directive defines a value, with an appropriate type signature, for each of the enum values, with the C macros replaced by their numeric values.

The variables cannot be modified and are essentially constant flags. Since we did not export the  constructor in the module declaration, but only the   flags, it is impossible for a user to even construct an invalid value. One thing less to worry about!
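
For reference, a hand-written sketch of roughly what the enum directive produces. The names IntegrationRule, ruleCode and gauss15 to gauss61 are our own, and the numeric values 1 to 6 are assumptions based on the range stated above; with hsc2hs they would be read from the GSL header instead of typed in.

```haskell
import Foreign.C.Types (CInt)

-- A newtype around the C integer, plus one constant per integration rule.
-- ASSUMPTION: values 1..6 mirror the GSL rule macros.
newtype IntegrationRule = IntegrationRule { ruleCode :: CInt }
  deriving (Eq, Show)

gauss15, gauss21, gauss31, gauss41, gauss51, gauss61 :: IntegrationRule
gauss15 = IntegrationRule 1
gauss21 = IntegrationRule 2
gauss31 = IntegrationRule 3
gauss41 = IntegrationRule 4
gauss51 = IntegrationRule 5
gauss61 = IntegrationRule 6
```

In a real module, the export list would name the type without its constructor, for example IntegrationRule rather than IntegrationRule (..), together with the six constants, so that no other value can be built from outside.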

Haskell Function Target
We can now write down the signature of the function we want. Note that the order of the arguments differs from the C version: since C has no partial application, the criteria for ordering arguments differ from those in Haskell.

As in the previous example, we indicate errors with a  result.

Passing Haskell Functions to the C Algorithm
We define a shorthand type,  , for readability. Note that the  pointer has been translated to a  , since we have no intention of using it. Then comes the  structure: no surprises here. Note that the  pointer is always assumed to be null, both in   and in  , and is never actually read or written.

To make a Haskell  function available to the C algorithm, we take two steps: first, we reorganise the arguments using a λ function in  ; then, in  , we take the function with reordered arguments and produce a function pointer that we can pass on to  , so that we can construct the   data structure.
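
The two steps can be sketched without the GSL. Here CFunction mirrors a C callback of the shape double f(double x, void *params); makeFunPtr uses the FFI's "wrapper" import to build a C-callable pointer from a Haskell closure, and callCFunction uses a "dynamic" import only so that the sketch can call the pointer itself and be tested. All of these names are hypothetical. Using bracket guarantees that the pointer is released with freeHaskellFunPtr even if the action throws.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Exception (bracket)
import Foreign.C.Types (CDouble (..))
import Foreign.Ptr (FunPtr, Ptr, freeHaskellFunPtr, nullPtr)

-- Haskell view of a C callback "double f(double x, void *params)".
-- The params pointer is typed Ptr () because we never look inside it.
type CFunction = CDouble -> Ptr () -> CDouble

-- "wrapper" import: turns a Haskell closure into a C function pointer.
-- Every pointer created this way must eventually be freed.
foreign import ccall "wrapper"
  makeFunPtr :: CFunction -> IO (FunPtr CFunction)

-- "dynamic" import: calls a C function pointer from Haskell. Used here
-- only to exercise the pointer; 'safe' because it re-enters Haskell.
foreign import ccall safe "dynamic"
  callCFunction :: FunPtr CFunction -> CFunction

-- Step 1: adapt an ordinary Haskell function, ignoring params.
toCFunction :: (Double -> Double) -> CFunction
toCFunction f = \x _params -> realToFrac (f (realToFrac x))

-- Step 2: build the function pointer, run an action with it, free it.
withCFunction :: (Double -> Double) -> (FunPtr CFunction -> IO r) -> IO r
withCFunction f = bracket (makeFunPtr (toCFunction f)) freeHaskellFunPtr
```

For example, withCFunction (\x -> x * x + 1) (\fp -> let y = callCFunction fp 3 nullPtr in y `seq` return y) yields 10. The seq is important: since callCFunction has a pure type, the call must be forced before bracket frees the pointer.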

Handling Unknown Structures
The reason we enabled the  pragma is this: we are declaring the data structure   without providing any constructor. This is a way to make sure it will always be handled as a pointer, and never actually instantiated.

Apart from that, we import the allocation and deallocation routines as usual. We can now import the integration function, since we have all the required pieces (  and  ).

The Complete Function
It is now possible to implement a function with the same functionality as the GSL's QAG algorithm.

First and foremost, we deactivate the GSL error handler, which would abort the program instead of letting us report the error.

We then proceed to allocate the workspace; notice that if the returned pointer is null, there was an error (typically, a size that is too large) that has to be reported.

If the workspace was allocated correctly, we convert the given function to a function pointer and allocate the  struct, in which we place the function pointer. Allocating memory for the result and its error margin is the last thing before calling the main routine.

After calling  , we have to do some housekeeping and free the memory allocated for the workspace and the function pointer. Note that it would be possible to skip this bookkeeping using  , but the work required to set that up is more than the effort of remembering one line of cleanup.

We then proceed to check the return value and return the result, as was done for the Bessel function.

Self-Deallocating Pointers
In the previous example, we manually handled the deallocation of the GSL integration workspace, a data structure we know nothing about, by calling its C deallocation function. It happens that the same workspace is used in several integration routines, which we may want to import into Haskell.

Instead of replicating the same allocation/deallocation code each time, which could lead to memory leaks when someone forgets the deallocation part, we can provide a sort of "smart pointer", which will deallocate the memory when it is not needed any more. This is called  (do not confuse with  : this one's qualified name is actually  !). The function handling the deallocation is called the finalizer.

In this section we will write a simple module to allocate GSL workspaces and provide them as appropriately configured s, so that users do not have to worry about deallocation.

The module, written in file GSLWorkspace.hs, is as follows:

We first declare our empty data structure, just like we did in the previous section.

The  and   functions will no longer be needed in any other file. Note that the deallocation function is imported with an ampersand ("&"), because we do not actually want the function itself but a pointer to it, to set as the finalizer.

The workspace creation function returns an IO (Maybe  ) value, because allocation may still fail and return the null pointer. The GSL does not specify what happens if the deallocation function is called on the null pointer, so for safety we do not set a finalizer in that case and return  ; the user code will then have to check for "  -ness" of the returned value.

If the pointer produced by the allocation function is non-null, we build a foreign pointer with the deallocation function, inject it into the  and then into the   monad. That's it, the foreign pointer is ready for use!
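
A self-contained sketch of the same idea, using the C library's malloc and free in place of the GSL workspace routines (an assumption made so the example runs without the GSL); Workspace, c_alloc, p_free and newWorkspace are hypothetical names. Empty data declarations are part of Haskell 2010, so recent GHC versions accept data Workspace without any pragma.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CSize (..))
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Ptr (FunPtr, Ptr, nullPtr)

-- An opaque type: no constructors, only ever handled through a pointer.
data Workspace

-- malloc stands in for the library's allocation routine.
foreign import ccall unsafe "stdlib.h malloc"
  c_alloc :: CSize -> IO (Ptr Workspace)

-- The '&' imports the ADDRESS of free, which is what a finalizer needs.
foreign import ccall "stdlib.h &free"
  p_free :: FunPtr (Ptr Workspace -> IO ())

-- Nothing if allocation failed (no finalizer is attached to a null
-- pointer); otherwise a pointer freed once the garbage collector finds
-- it unreachable.
newWorkspace :: CSize -> IO (Maybe (ForeignPtr Workspace))
newWorkspace size = do
  p <- c_alloc size
  if p == nullPtr
    then return Nothing
    else Just <$> newForeignPtr p_free p
```

Client code then obtains the raw pointer with withForeignPtr, which also keeps the memory alive for the duration of the action, e.g. withForeignPtr fp (\p -> someCFunction p), where someCFunction stands for any imported routine taking a Ptr Workspace.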

The qag.hsc file must now be modified to use the new module; the parts that change are:

Obviously, we do not need the  extension here any more; instead we import the   module, and also a couple of nice-to-have functions from. We also remove the foreign declarations of the workspace allocation and deallocation functions.

The most important difference is in the main function, where we (try to) allocate a workspace, test for its  ness, and if everything is fine we use the   function to extract the workspace pointer. Everything else is the same.

Calling Haskell from C
Sometimes it is also convenient to call Haskell from C, in order to take advantage of some of Haskell's features which are tedious to implement in C, such as lazy evaluation.

We will consider a typical Haskell example, Fibonacci numbers. These are produced by an elegant, Haskellian one-liner:
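
The classic one-liner is most likely along the following lines (our reconstruction, not necessarily the original listing): the list is defined in terms of itself, and laziness ensures each element is computed only once, when first needed.

```haskell
-- Infinite, lazily evaluated list of Fibonacci numbers (unbounded Integer):
-- each element is the sum of the two before it.
fibonacci :: [Integer]
fibonacci = 0 : 1 : zipWith (+) fibonacci (tail fibonacci)
```

With this indexing, fibonacci !! 42 is 267914296, the value printed by ./fib 42 further down.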

Our task is to export the ability to calculate Fibonacci numbers from Haskell to C. However, in Haskell we typically use the  type, which is unbounded: this cannot be exported to C, since C has no corresponding type. To provide a larger range of outputs, we specify that whenever the result is beyond the bounds of its integer type, the C function outputs a floating-point approximation instead. If the result is also beyond the floating-point range, the computation fails. The status of the result (whether it can be represented as a C integer, as a floating-point value, or not at all) is signalled by the status integer returned by the function. Its desired signature is therefore:

Haskell Source
The Haskell source code for the file fibonacci.hs is:

When exporting, we need to wrap our functions in a module (it is a good habit anyway). We have already seen the Fibonacci infinite list, so let's focus on the exported function: it takes an argument and two pointers to the target  and  , and returns the status in the   monad (since writing through pointers is a side effect).

The function is implemented with guards, whose conditions are defined in the  clause at the bottom. A successful computation returns 0, a partially successful one 1 (in which case we can still use the floating-point value as an approximation), and a completely unsuccessful one returns 2.

Note that the function does not call  , since the pointers are assumed to have already been allocated by the calling C function.
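
A sketch of what such an exported function could look like. The exported name fibonacci_c, and the use of long long (CLLong) for the integer output and double (CDouble) for the approximation, are our assumptions, chosen to be consistent with the ./fib runs shown below. The helper callFromHaskell only mimics the C caller, so that the function can be tried without writing any C.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..), CInt (..), CLLong (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

fibonacci :: [Integer]
fibonacci = 0 : 1 : zipWith (+) fibonacci (tail fibonacci)

-- Made visible to C through the stub header GHC generates.
foreign export ccall fibonacci_c :: CInt -> Ptr CLLong -> Ptr CDouble -> IO CInt

-- Status 0: result fits in a long long, written to *intOut.
-- Status 1: too large for long long; double approximation in *dblOut.
-- Status 2: too large even for a double (the approximation is infinite).
-- A negative n makes (!!) throw an exception, as in the ./fib -1 run.
fibonacci_c :: CInt -> Ptr CLLong -> Ptr CDouble -> IO CInt
fibonacci_c n intOut dblOut
  | fitsInt   = poke intOut (fromInteger r) >> return 0
  | finite    = poke dblOut d >> return 1
  | otherwise = return 2
  where
    r       = fibonacci !! fromIntegral n
    d       = fromInteger r :: CDouble
    fitsInt = r <= toInteger (maxBound :: CLLong)
    finite  = not (isInfinite d)

-- Mimics the C caller: allocate both outputs, call, then read them back.
callFromHaskell :: CInt -> IO (CInt, CLLong, CDouble)
callFromHaskell n =
  alloca $ \ip ->
    alloca $ \dp -> do
      poke ip 0                      -- defined contents for unused outputs
      poke dp 0
      status <- fibonacci_c n ip dp
      (,,) status <$> peek ip <*> peek dp
```

With these types, F_92 is the largest Fibonacci number that still fits in a 64-bit long long, so callFromHaskell 93 should already report status 1.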

The Haskell code can then be compiled with GHC:

    $ ghc -c fibonacci.hs

C Source
The compilation of fibonacci.hs has produced several files, among them fibonacci_stub.h, which we include in our C code in the file fib.c:

The notable thing is that we need to initialise the Haskell environment with  , passing it the command-line arguments of main; we also have to shut Haskell down with   when we are done. The rest is fairly standard C code for allocation and error handling.

Note that you have to compile and link the C code with GHC rather than with your C compiler directly, so that the Haskell runtime and libraries are linked in:

    $ ghc -no-hs-main fib.c fibonacci.o -o fib

You can then test the program:

    $ ./fib 42
    F_42: 267914296
    $ ./fib 666
    Error: result is out of bounds
    Floating-point approximation: 6.859357e+138
    $ ./fib 1492
    Error: result is out of bounds
    Floating-point approximation is infinite
    $ ./fib -1
    fib: Prelude.(!!): negative index