Cg Programming/Unity/Computing Color Histograms



This tutorial shows how to compute a color histogram of an image with the help of compute shaders in Unity. In particular, it shows how to use an atomic function such that multiple threads (i.e., multiple calls to a compute shader function) can safely access the same memory location. It also shows how to use compute buffers. If you are not familiar with compute shaders in Unity, you should first read an introduction to them. Note that compute shaders are not supported on every platform and graphics API; on macOS, for example, they require Metal.

Computing Color Histograms in General
An RGB color histogram of an image is a bar chart that shows, for each value of the red, green, and blue channels, how many pixels of the image have that value: for example, how many pixels have a red value of 0, how many pixels have a green value of 0, etc. For a color resolution of 8 bits, there are 256 possible values (0 to 255) for each of the red, green, and blue channels; thus, an RGB color histogram specifies 3 × 256 = 768 numbers. If an alpha channel is also included, the RGBA color histogram consists of 4 × 256 = 1024 numbers.

To compute such an RGBA color histogram, a program first initializes the 1024 numbers of the histogram to 0. Then it looks at each pixel of the image and increments (by 1) the four numbers in the histogram for the specific red, green, blue, and alpha values of the pixel. Since the same operations are performed for each pixel, this problem is easy to parallelize, except that two different threads for two different pixels might try to increment the same number of the histogram at the same time, which can lead to problems that are called race conditions. These problems can be avoided if the operation to increment one of the numbers of the histogram is an atomic operation, i.e., if it cannot be interrupted by other threads. This is what we use in the compute shader of this tutorial.
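The sequential version of this procedure can be sketched in a few lines. The following is only a CPU-side illustration in Python, not the Unity code of this tutorial; the function name, the tiny example image, and the flat counter layout (index = 4 × value + channel) are assumptions made for the sketch.

```python
def rgba_histogram(pixels):
    """Return a flat list of 1024 counters: 256 values x 4 channels.

    `pixels` is an iterable of (r, g, b, a) tuples with integer values
    in 0..255. The counter for a channel value is stored at index
    4 * value + channel, where channel 0..3 stands for red, green,
    blue, and alpha (an assumed layout for this sketch).
    """
    histogram = [0] * 1024            # step 1: initialize all counters to 0
    for pixel in pixels:              # step 2: visit every pixel once
        for channel, value in enumerate(pixel):
            histogram[4 * value + channel] += 1
    return histogram


# A hypothetical 3-pixel "image".
image = [(255, 0, 0, 255), (255, 128, 0, 255), (0, 0, 0, 255)]
hist = rgba_histogram(image)
print(hist[4 * 255 + 0])  # number of pixels with a red value of 255
```

Every pixel increments exactly four counters, so the sum of all counters is four times the number of pixels. The parallel version has to perform the same increments, but without a guaranteed order between pixels.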

The Big Picture: Calling the Compute Shader
In this tutorial, we start with the C# script that calls the compute shader because it provides the bigger picture. Note that we compute color histograms for any texture image, not only for camera views. Thus, you can attach this script to any game object.

The script defines three public variables: one that has to be set to the compute shader of this tutorial; one that has to be set to the texture for which the histogram should be computed; and one that the script sets to an array of 1024 unsigned ints containing the computed histogram.

The three private variables are: a compute buffer, which contains the same data as the public histogram array but can be accessed by the compute shader; and two handles, which are the indices of the two compute shader functions for the main processing of all pixels and for the initialization of the 1024 numbers of the histogram.

The initialization function of the script looks up the two handles and creates the compute buffer and the histogram array. While the compute buffer is created as an array of 256 elements that each contain 4 unsigned ints, the histogram array is created as an array of 1024 unsigned ints. This difference does not matter since the memory layout is the same for both. Of course, the histogram array could also be defined as an array of 256 structs that each contain 4 unsigned ints. The rest of the function does error checking and binds the texture and the compute buffer to the corresponding uniform variables of each compute shader function such that both functions have access to them.
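The claim that both layouts occupy the same memory can be checked with a small CPU-side sketch. It uses Python's standard struct module and assumes tightly packed, little-endian 32-bit unsigned ints; the counter values and the helper name counter are made up for the illustration.

```python
import struct

ELEMENTS = 256   # one element per possible channel value
CHANNELS = 4     # red, green, blue, and alpha counters per element

# Hypothetical counters: element i holds (i, 10 * i, 100 * i, 1000 * i).
structured = [(i, 10 * i, 100 * i, 1000 * i) for i in range(ELEMENTS)]

# Pack the data as 256 consecutive structs of four 32-bit unsigned ints.
as_structs = b"".join(struct.pack("<4I", *element) for element in structured)

# Reinterpret the very same bytes as one flat array of 1024 unsigned ints.
flat = struct.unpack("<%dI" % (ELEMENTS * CHANNELS), as_structs)


def counter(value, channel):
    """Look up a counter in the flat array (channel 0..3 = R, G, B, A)."""
    return flat[CHANNELS * value + channel]


print(len(as_structs), len(flat))  # 4096 bytes, 1024 numbers
print(counter(7, 2))               # blue counter of element 7
```

In the flat array, the four counters of element i sit at indices 4 × i to 4 × i + 3, which is why a copy between the two representations needs no conversion.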

The cleanup function of the script simply releases the compute buffer since the hardware resources attached to it are not automatically released by the garbage collector.

The update function of the script does some error checking and then calls the compute shader function for the initialization of the compute buffer and the compute shader function for processing all the pixels. For the initialization, we use 4 (= 256 / 64) thread groups of 64 × 1 × 1 threads to initialize the 256 elements of the compute buffer. For the main processing of the pixels, we use thread groups of 8 × 8 × 1 threads and compute the number of thread groups by adding 7 to each dimension of the texture image and dividing the result by 8 (with integer division). The addition of 7 is necessary to make sure that we are not short by one thread group if the dimensions are not divisible by 8. Lastly, the function copies the data from the compute buffer to the Unity histogram array; note that the two data structures have to have the same memory layout for this copy to work.
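The effect of adding 7 before the integer division can be seen in a short CPU-side sketch. The helper name thread_group_count and the example widths are assumptions chosen for the illustration.

```python
def thread_group_count(size, group_size=8):
    """Smallest number of groups of `group_size` threads that covers `size`."""
    return (size + group_size - 1) // group_size  # adds 7 for group_size == 8


for width in (512, 513, 520, 521):
    naive = width // 8                    # plain division rounds down
    rounded_up = thread_group_count(width)
    # Columns: width, threads with plain division, threads with rounding up.
    print(width, naive * 8, rounded_up * 8)
```

For a width of 513, plain division yields 64 groups, which cover only 512 columns, while rounding up yields 65 groups, which cover 520. The surplus threads then have coordinates outside the image, so a kernel may need to take them into account.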

At the end of each frame, the computed color histogram is available in the public histogram array; thus, you can look at it in the Inspector Window while running the program.

The Nitty-Gritty Details of the Compute Shader
In this case, the compute shader contains two compute shader functions, one for the initialization and the other one for the main processing of the texels of the texture. Therefore, it also includes two kernel declarations and two declarations of thread group dimensions.

As always, you create a compute shader by clicking on Create in the Project Window and choosing Shader > Compute Shader. You should then copy and paste the code into the new file.

The first two lines specify the two compute shader functions (“kernels”) that can be called from a script.

The texture declaration specifies a uniform variable for a read-only 2D RGBA texture.

The structure definition declares a small structure with only one member: a 4D unsigned int vector. One component of this vector is used to count the pixels with a certain red value (according to the position in the array); analogously, the other three components count the pixels for the green, blue, and alpha channels.

This structure is then used to define a read/write structured buffer that represents the compute buffer of the C# script. The memory layout matches because each element of the structured buffer is of this structure type, which consists of 4 uints.

The initialization function uses thread groups of dimensions 64 × 1 × 1, which means that its thread index argument runs from 0 to 255 since we use 4 thread groups. Therefore, the function can use this index directly to address the 256 elements of the structured buffer when initializing all elements to 0.
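How 4 thread groups of 64 threads cover the indices 0 to 255 can be mimicked on the CPU. In this sketch, the two nested loops stand in for the GPU's parallel threads, and the names global_index and buffer are assumptions made for the illustration.

```python
GROUP_COUNT = 4          # number of thread groups that are dispatched
THREADS_PER_GROUP = 64   # threads per group along the x dimension


def global_index(group, local):
    """Index of a thread across all groups, as seen by the kernel."""
    return group * THREADS_PER_GROUP + local


buffer = [None] * 256    # stands in for the 256-element structured buffer
for group in range(GROUP_COUNT):
    for local in range(THREADS_PER_GROUP):
        # Each simulated thread clears exactly one element (four counters).
        buffer[global_index(group, local)] = (0, 0, 0, 0)

print(global_index(0, 0), global_index(3, 63))  # first and last index
```

Since every index is produced exactly once, no two threads write to the same element and the initialization needs no atomic operations.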

The main function uses thread groups of dimensions 8 × 8 × 1. Since we base the number of thread groups on the texture size, the function can use its thread index argument to access the texels of the texture. Since the RGBA values are read as floating-point values between 0.0 and 1.0, they are multiplied by 255.0 and rounded down by converting them to unsigned ints in a local variable. The resulting RGBA values are then used to index the structured buffer in order to increment the counter variables in the buffer, i.e., the red counter of the element selected by the red value, the green counter of the element selected by the green value, etc.
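The conversion from floating-point channel values to histogram indices can be sketched on the CPU as follows. The function names and the example texel are assumptions for the illustration, and Python's int() plays the role of the truncating conversion to an unsigned int.

```python
def texel_to_bins(texel):
    """Map (r, g, b, a) floats in [0.0, 1.0] to four indices in 0..255."""
    return tuple(int(channel * 255.0) for channel in texel)  # int() truncates


# 256 elements with four counters each: [red, green, blue, alpha].
histogram = [[0, 0, 0, 0] for _ in range(256)]


def count_texel(texel):
    """Increment the four counters that belong to one texel."""
    r, g, b, a = texel_to_bins(texel)
    histogram[r][0] += 1   # red counter of the element selected by red
    histogram[g][1] += 1   # green counter of the element selected by green
    histogram[b][2] += 1   # blue counter of the element selected by blue
    histogram[a][3] += 1   # alpha counter of the element selected by alpha


count_texel((1.0, 0.5, 0.0, 0.25))          # a hypothetical texel
print(texel_to_bins((1.0, 0.5, 0.0, 0.25)))
```

Note that 0.5 × 255.0 = 127.5 is rounded down to 127, and that the four increments of one texel usually go to four different elements of the buffer.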

To increment the counter variables, the code uses an atomic add function, which takes a variable as first argument and an integer as second argument. In our case, the latter is 1 because we increment by 1. This function is one of the atomic functions of HLSL compute shaders; i.e., the GPU makes sure that any race conditions due to multiple threads trying to increment the same variable at the same time are avoided. HLSL offers several atomic functions; note that all of them work only with integers or unsigned integers.
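The guarantee of an atomic add can be imitated on the CPU with a lock. This is only an analogy: GPUs implement atomic operations in hardware rather than with locks, and the class AtomicCounter as well as the thread and increment counts are assumptions made for this sketch.

```python
import threading


class AtomicCounter:
    """A counter whose add() cannot be interleaved with another add()."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        # Read, add, and write happen as one uninterruptible unit.
        with self._lock:
            self._value += amount

    @property
    def value(self):
        return self._value


def worker(counter, increments):
    for _ in range(increments):
        counter.add(1)


counter = AtomicCounter()
threads = [threading.Thread(target=worker, args=(counter, 10000))
           for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.value)  # 8 threads x 10000 increments = 80000
```

Because each increment completes before the next one starts, no increment is lost, regardless of how the threads are scheduled.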

If you want to observe the effect of the race conditions, you can replace the calls to the atomic function by ordinary, non-atomic increments of the same counter variables. On most GPUs, such an increment is not an atomic operation; therefore, there will usually be race conditions when you run this code, which lead to undefined results. You might be able to observe in the Inspector Window that the values in the histogram array change somewhat randomly due to these race conditions.
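Why a non-atomic increment loses counts can be shown with a deterministic simulation. The sketch splits each increment into a separate read step and write step and replays two hand-written schedules for two simulated threads; the function name run and the schedules are assumptions for the illustration, not a model of any specific GPU.

```python
def run(schedule):
    """Replay the read and write steps of simulated threads in order.

    Each step is a pair (thread_name, action). A "read" stores the current
    counter value in the thread's private register; a "write" stores that
    register plus 1 back into the shared counter.
    """
    counter = 0
    registers = {}
    for thread, action in schedule:
        if action == "read":
            registers[thread] = counter
        else:
            counter = registers[thread] + 1
    return counter


# Thread A finishes its increment before thread B starts.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# Both threads read the old value before either of them writes.
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run(serial))       # 2: both increments are counted
print(run(interleaved))  # 1: the increment of thread A is overwritten
```

On a GPU the actual interleaving differs from run to run, which is why the resulting histogram values appear to change randomly.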

Summary
You have reached the end of this tutorial! A few of the things that you have learned are:
 * What color histograms are and how to compute them.
 * How to create and use Unity's compute buffers in a C# script and how to define a corresponding read/write structured buffer in a compute shader.
 * How to define and use multiple compute shader functions in one compute shader.
 * How to use an atomic function in a compute shader.