OpenMP/Reductions

Summing floating point numbers
For our first parallel program, we turn to an age-old problem: summing an array of floating point numbers. The basic algorithm to solve this problem is so simple that it allows us to focus on OpenMP features rather than algorithmic details, but we'll see in a bit that the problem is actually less trivial than it appears at first.

Without further ado, here's a sequential algorithm to sum a list of floating point numbers:
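The original listing is not reproduced here; a minimal sketch of such a sequential sum might look like this (the function name sum and the use of float are assumptions, not necessarily the book's originals):

```c
#include <stddef.h>

/* Sequentially sum an array of n floats.
 * The name "sum" and the float element type are assumptions. */
float sum(const float *a, size_t n)
{
    float result = 0.0f;
    for (size_t i = 0; i < n; i++)
        result += a[i];
    return result;
}
```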

As far as algorithms go, this one is as simple as it gets. Put the definition above in its own source file (iterative sum).


 * If you have experience dealing with floating point numbers, you might be tempted to make the accumulator a double instead of a float for added precision. Don't do that just yet, as we'll solve the precision issue in a different way in a bit.

To test our algorithm, we need a driver program, which we'll put in a separate file:

And finally we need some way to build this program. On Linux/Unix/OS X, the following build recipe should do the job. It assumes you're using GCC.
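The build file itself did not survive extraction; a minimal Makefile along these lines should work (the file names sum.c and main.c are assumptions, so adjust them to match yours):

```makefile
CC = gcc
CFLAGS = -O2 -fopenmp

sum: main.c sum.c
	$(CC) $(CFLAGS) -o sum main.c sum.c
```

The -fopenmp flag tells GCC to honor OpenMP pragmas; without it they are silently ignored and the program stays sequential.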

Now compile the program, run it, and see how fast it is with a tool such as time. If it's too fast to measure, consider summing a larger array, or run the summation in a loop instead of just once.

A parallel way of summing
We had to go through a bit of setup, but now we're ready to make a parallel sum algorithm for floating point numbers. Here it is:
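The parallel version differs from the sequential one in only a single pragma; a sketch, with the same assumed names as before:

```c
#include <stddef.h>

/* Sum an array of n floats in parallel using an OpenMP reduction. */
float sum(const float *a, size_t n)
{
    float result = 0.0f;
    /* Split the iterations across threads; each thread keeps a private
     * partial sum, and the partial sums are combined into result at the end. */
    #pragma omp parallel for reduction(+:result)
    for (long i = 0; i < (long)n; i++)
        result += a[i];
    return result;
}
```

The loop counter is a signed long here because older OpenMP versions (before 3.0) require a signed loop variable.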

The #pragma omp parallel for turns the loop into a parallel loop. If you have two cores, OpenMP will (probably) use two threads that each run half of the loop. The reduction clause declares that we're reducing the input array by summing into the accumulator variable named in the clause, so after the partial loops are done, their results must be summed into this variable.

Put this in place of the sequential version and recompile. Now run the program. Do you get the same output as before?


 * Exercise: run the program with various settings for the environment variable OMP_NUM_THREADS, which controls the size of the thread pool that OpenMP uses. Try 1, 2, 4 and 8. Do you see the same results for each setting? Now try an absurdly large number of threads, e.g. 16000. How does this affect performance?


 * Exercise: the dot product of two vectors is the sum of products of their respective entries, $$\sum_{i=1}^n a_i b_i$$. Adapt the parallel sum function into a function that computes dot products in parallel.
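If you get stuck, one possible shape for the answer is sketched below; the function name dot is an assumption, and the structure simply reuses the reduction idiom from the sum above:

```c
#include <stddef.h>

/* Parallel dot product: the sum of a[i] * b[i] over n entries,
 * using the same OpenMP reduction idiom as the parallel sum. */
float dot(const float *a, const float *b, size_t n)
{
    float result = 0.0f;
    #pragma omp parallel for reduction(+:result)
    for (long i = 0; i < (long)n; i++)
        result += a[i] * b[i];
    return result;
}
```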