Cg Programming/Unity/Computing Image Effects

This tutorial covers the basic steps to create a minimal compute shader in Unity for image post-processing of camera views. If you are not familiar with image effects in Unity, you should first read an introduction to image effects with fragment shaders. Note that compute shaders are not supported on macOS.

Compute Shaders in Unity
Compute shaders are in some respects similar to fragment shaders and it is sometimes helpful to think of them as “improved” fragment shaders because compute shaders address some problems of fragment shaders:


 * Fragment shaders are part of the graphics pipeline, which makes it cumbersome to use them for anything else, in particular GPGPU programming (General-Purpose computing on Graphics Processing Units).
 * Fragment shaders are designed for the embarrassingly parallel problem of rasterizing the fragments of triangles (and other geometric primitives). Thus, they are not well suited for problems that are not embarrassingly parallel, e.g., when shaders have to share or communicate data between themselves or need to write to an arbitrary location in memory.
 * The graphics hardware that runs fragment shaders offers features for more advanced parallel programming but it was not considered wise to offer those features in fragment shaders; thus, a different application programming interface (API) was considered necessary.

Historically, the first approach to solving these shortcomings of fragment shaders was to introduce completely new APIs, e.g., CUDA, OpenCL, etc. While some of these APIs are still very popular for GPGPU programming, they are less popular for graphics tasks (e.g., image processing) for several reasons: one is the overhead of using two APIs (compute and graphics) for the same hardware; another is the difficulty of communicating data between the compute API and the graphics API.

Due to the problems of separate compute APIs, compute shaders were introduced in graphics APIs (in particular Direct3D 11, OpenGL 4.3, and OpenGL ES 3.1) as another class of shaders. This is also what Unity supports.

In this tutorial, we look at how to use compute shaders in Unity for image processing in order to introduce the basic concepts of compute shaders as well as specific issues of using compute shaders for image processing, which is an important application area. Further tutorials discuss more advanced features of compute shaders and applications apart from image processing.

Creating a Compute Shader
Creating a compute shader in Unity is not complicated and is very similar to creating any other shader: In the Project Window, click on Create and choose Shader > Compute Shader. A new file named “NewComputeShader” should appear in the Project Window. Double-click it to open it (or right-click and choose Open). A text editor with the default shader in DirectX 11 HLSL should appear. (DirectX 11 HLSL is different from Cg but it shares many common syntax features.)

The following compute shader is useful to tint an image with a user-specified color. You can copy&paste it into the shader file:
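A minimal sketch of such a tint shader could look as follows; note that the kernel name TintMain and the variable names Color, Source, and Destination are choices made here for illustration, any consistent names work:

```hlsl
#pragma kernel TintMain  // defines TintMain as a compute shader function

float4 Color;  // uniform variable that is set by a script to tint the image

Texture2D<float4> Source;         // read-only input texture
RWTexture2D<float4> Destination;  // read/write output texture (a render texture)

[numthreads(8,8,1)]  // each thread group consists of 8 x 8 x 1 = 64 threads
void TintMain(uint3 id : SV_DispatchThreadID)
{
   // each thread writes only to its own texel; thus, no write conflicts occur
   Destination[id.xy] = Color * Source[id.xy];
}
```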

Let's go through this shader line by line: The (Unity-specific) line “#pragma kernel TintMain” defines TintMain as the compute shader function; this is very similar to “#pragma fragment frag” for fragment shaders.

The line “float4 Color;” defines a uniform variable that is set in a script as described below. This is just like a uniform variable in a fragment shader. In this case, Color is used to tint the images.

The line “Texture2D<float4> Source;” defines a 2D texture with four floating-point components such that the compute shader can read it (without interpolation). In a fragment shader, you would use tex2D to sample a 2D texture (with interpolation). (Note that HLSL uses separate texture objects and sampler objects; see Unity's manual for how to define a sampler object for a given texture object, which you would need if you want to sample a 2D texture with interpolation in a compute shader using the function SampleLevel.)
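As an illustration, sampling with interpolation in a compute shader could be sketched as follows, assuming Unity's documented convention that a sampler named “sampler” plus the texture name is automatically bound to that texture (the kernel name and texture size here are hypothetical):

```hlsl
Texture2D<float4> Source;
SamplerState samplerSource;  // Unity binds this sampler state to the texture "Source"

[numthreads(8,8,1)]
void SampleMain(uint3 id : SV_DispatchThreadID)
{
   // map the thread ID to texture coordinates (assumed texture size 1024 x 1024)
   float2 uv = (id.xy + 0.5) / 1024.0;
   // SampleLevel is used instead of Sample because compute shaders have no
   // implicit derivatives to select a mipmap level; here level 0 is chosen
   float4 color = Source.SampleLevel(samplerSource, uv, 0);
}
```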

The line “RWTexture2D<float4> Destination;” specifies a read/write 2D texture, which the compute shader can read from and write to. This corresponds to a render texture in Unity. A compute shader can write to any position in a RWTexture2D, while a fragment shader can usually write only to the position of its fragment. Note, however, that multiple threads of the compute shader (i.e., calls of the compute shader function) might write to the same location in an undefined order, which results in undefined results unless special care is taken to avoid these problems. In this tutorial we avoid any of these problems by letting each thread write only to its own, unique location in the RWTexture2D.

The next line is “[numthreads(8,8,1)]”. This is a special line for compute shaders, which defines the dimensions of a thread group. A thread group is a group of calls to the compute shader function that are executed in parallel; therefore, their execution can be synchronized, i.e., one can specify barriers (with functions like GroupMemoryBarrierWithGroupSync) that all threads in the thread group have to reach before any of the threads may be executed further. Another feature of a thread group is that all threads within one thread group may share some particularly fast (“groupshared”) memory, while the memory that may be shared by threads in different groups is usually slower.

The threads are organized in a 3D array of thread groups and each thread group is itself a 3D array with the three dimensions specified by the three arguments of numthreads. For image processing tasks, the third (z) dimension is usually 1 as in our example. The dimensions (8,8,1) specify that each thread group consists of 8 × 8 × 1 = 64 threads. (For an illustration see Microsoft's documentation of numthreads.) There are certain platform-specific limitations on these numbers; e.g., for Direct3D 11 the x and y dimensions must be less than or equal to 1024, the z dimension must be less than or equal to 64, and the product of the three dimensions (i.e., the size of the thread group) must be less than or equal to 1024. On the other hand, thread groups should have a minimum size of about 32 (depending on the hardware) for best efficiency.

As described below, the compute shader is called in a script with the function Dispatch, whose first argument specifies the compute shader function and whose other arguments specify the dimensions of the 3D array of thread groups. For our example of numthreads(8,8,1), there are 64 threads in each group; thus, the total number of threads is 64 times the number of thread groups.
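For example, a hypothetical call that covers a 1024 × 768 image with the 8 × 8 thread groups of our shader could look like this (shader is assumed to be a ComputeShader reference and handle an index previously returned by FindKernel):

```csharp
// assumed: shader is a ComputeShader, handle was returned by shader.FindKernel(...)
int width = 1024;   // hypothetical image dimensions
int height = 768;
// (width + 7) / 8 rounds up so that no pixels are missed
// if the dimensions are not divisible by 8
shader.Dispatch(handle, (width + 7) / 8, (height + 7) / 8, 1);
// here: 128 x 96 x 1 = 12288 thread groups of 64 threads each,
// i.e. 786432 threads in total, one per pixel
```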

The rest of the code specifies the compute shader function. Usually, it is important for the compute shader function to know for which position in the 3D array of threads it was called. It might also be important to know the position of the thread group in the 3D array of thread groups, as well as the position of the thread within the thread group. HLSL offers the following semantics for this information:
 * SV_GroupID: a uint3 vector that specifies the 3D ID of the thread group; each coordinate of the ID starts at 0 and goes up to (but excluding) the dimension specified in the Dispatch call.
 * SV_GroupThreadID: a uint3 vector that specifies the 3D ID of a thread within a thread group; each coordinate of the ID starts at 0 and goes up to (but excluding) the dimension specified in the numthreads line.
 * SV_GroupIndex: a uint that specifies the flattened/linearized SV_GroupThreadID between 0 and the number of threads per group minus 1 (63 in our example).
 * SV_DispatchThreadID: a uint3 vector that specifies the 3D ID of the thread in the whole array of all thread groups. It is equal to SV_GroupID times the thread group dimensions plus SV_GroupThreadID.
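The relations between these semantics can be illustrated with a sketch of a kernel that receives all of them (the function and parameter names here are arbitrary):

```hlsl
[numthreads(8,8,1)]
void ExampleMain(uint3 groupID : SV_GroupID,              // ID of the thread group
                 uint3 groupThreadID : SV_GroupThreadID,  // ID within the group
                 uint groupIndex : SV_GroupIndex,         // flattened ID within the group
                 uint3 dispatchThreadID : SV_DispatchThreadID)  // global thread ID
{
   // for numthreads(8,8,1) the semantics are related as follows:
   // dispatchThreadID == groupID * uint3(8,8,1) + groupThreadID
   // groupIndex == groupThreadID.z * 8 * 8 + groupThreadID.y * 8 + groupThreadID.x
}
```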

The compute shader function can receive any of these values as arguments, as in the example: void TintMain(uint3 id : SV_DispatchThreadID).

The particular function TintMain actually uses only the variable id with the semantic SV_DispatchThreadID. The function calls are organized in a 2D array of (at least) the dimensions of the Source and Destination textures; thus, id.x and id.y can be used to access the texels Source[id.xy] and Destination[id.xy]. The basic operation is just to multiply the color of the Source texture with Color and write it to the Destination render texture: Destination[id.xy] = Color * Source[id.xy];

Applying the Compute Shader to the Camera View
In order to apply the compute shader to all pixels of the camera view, we have to define the function OnRenderImage(RenderTexture source, RenderTexture destination) in a script and use these render textures in the compute shader. There are, however, some problems; specifically, in newer Unity versions we have to copy the source pixels to a temporary texture before we can use them in a compute shader. Furthermore, if Unity renders directly to the frame buffer, destination is set to null and we have no render texture to use for our compute shader. Also, we need to enable random write access for the render texture before we create it, which we cannot do with the render textures that we get in OnRenderImage. We can handle these cases (and cases where the source and destination render textures have different dimensions) by creating a temporary render texture of the same dimensions as the source render texture and letting the compute shader write to that temporary render texture. The result can then be copied to the destination render texture, which might be null, in which case the result is copied to the frame buffer.

The following C# script implements this idea with the temporary render textures tempSource and tempDestination: The script should be saved as "tintComputeScript.cs". To use it, it has to be attached to a camera and the public variable shader has to be set to a compute shader, for example, the one we have defined above.
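A sketch of such a script, assuming the shader above with the kernel name TintMain and the uniforms Color, Source, and Destination (these names, as well as tempSource, tempDestination, and handleTintMain, are choices made here for illustration):

```csharp
using UnityEngine;

[RequireComponent(typeof(Camera))]
public class tintComputeScript : MonoBehaviour
{
   public ComputeShader shader;  // set this to the compute shader in the inspector
   public Color color = new Color(1.0f, 0.8f, 0.8f, 1.0f);  // tint color

   private RenderTexture tempSource = null;       // copy of the source pixels
   private RenderTexture tempDestination = null;  // result with random write access
   private int handleTintMain;  // index of the compute shader function

   void Start()
   {
      if (null == shader)
      {
         Debug.Log("Shader missing.");
         enabled = false;
         return;
      }
      handleTintMain = shader.FindKernel("TintMain");
      if (handleTintMain < 0)
      {
         Debug.Log("Initialization failed.");
         enabled = false;
      }
   }

   void OnDestroy()
   {
      // release the hardware resources of the temporary render textures
      if (null != tempSource)
      {
         tempSource.Release();
         tempSource = null;
      }
      if (null != tempDestination)
      {
         tempDestination.Release();
         tempDestination = null;
      }
   }

   void OnRenderImage(RenderTexture source, RenderTexture destination)
   {
      if (null == shader || handleTintMain < 0 || null == source)
      {
         Graphics.Blit(source, destination);  // just copy
         return;
      }

      // recreate the temporary source texture if the dimensions changed
      if (null == tempSource || source.width != tempSource.width
         || source.height != tempSource.height)
      {
         if (null != tempSource)
         {
            tempSource.Release();
         }
         tempSource = new RenderTexture(source.width, source.height,
            source.depth);
         tempSource.Create();
      }
      Graphics.Blit(source, tempSource);  // copy the source pixels

      // recreate the temporary destination texture if the dimensions changed
      if (null == tempDestination || source.width != tempDestination.width
         || source.height != tempDestination.height)
      {
         if (null != tempDestination)
         {
            tempDestination.Release();
         }
         tempDestination = new RenderTexture(source.width, source.height,
            source.depth);
         tempDestination.enableRandomWrite = true;  // must be set before Create()
         tempDestination.Create();
      }

      // set the uniforms and call the compute shader function
      shader.SetVector("Color", (Vector4)color);
      shader.SetTexture(handleTintMain, "Source", tempSource);
      shader.SetTexture(handleTintMain, "Destination", tempDestination);
      shader.Dispatch(handleTintMain, (tempDestination.width + 7) / 8,
         (tempDestination.height + 7) / 8, 1);

      // copy the result to the actual destination (or the frame buffer)
      Graphics.Blit(tempDestination, destination);
   }
}
```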

The Start function of the script does only some error checking, gets the index of the compute shader function with shader.FindKernel("TintMain"), and writes it to handleTintMain for use in the OnRenderImage function.

The OnDestroy function releases the temporary render textures because the garbage collector does not automatically release the hardware resources that are necessary for render textures.

The OnRenderImage function does some error checking; then, if necessary, it creates new render textures in tempSource and tempDestination and copies the source pixels to tempSource, and after that it sets all the uniform variables of the compute shader with the functions SetVector and SetTexture before it calls the compute shader function with a call to Dispatch. In this case we use (tempDestination.width + 7) / 8 times (tempDestination.height + 7) / 8 thread groups (both numbers implicitly rounded down). We divide by 8 in both dimensions because we specify the number of thread groups and each thread group has the size 8 × 8 as specified by numthreads(8,8,1) in the compute shader. The addition of 7 is required to make sure that we are not short by one group if the dimensions of the render texture are not divisible by 8. After dispatching the compute shader, the result is copied from tempDestination to the actual destination of OnRenderImage with the help of a call to Graphics.Blit.

Comparison with Fragment Shaders for Image Effects
This compute shader and C# script implement the same tint effect that can be achieved with a fragment shader for image effects. Apparently, more code is necessary for an image effect with a compute shader than for an image effect with a fragment shader. However, you should remember two things: 1) the reason for the additional code is mainly that Unity's OnRenderImage function and Graphics.Blit function were designed to work smoothly with fragment shaders, while compute shaders were not considered when these functions were defined, and 2) the compute shader is able to do things that fragment shaders cannot do, e.g., writing to arbitrary positions in the destination render texture, sharing data between threads, synchronizing the execution of threads, etc. Some of these features are discussed in other tutorials.

Summary
Congratulations, you have learned the basics about compute shaders in Unity and how to use them for image effects. A few of the things you have seen are:
 * How to create a compute shader for an image effect.
 * How to set the uniform variables of a compute shader in a C# script.
 * How to call the compute shader function with the Dispatch function.