Cg Programming/Programmable Graphics Pipeline

The programmable graphics pipeline presented here is very similar to the OpenGL (ES) 2.0 pipeline, the WebGL pipeline, and the Direct3D 8.0 pipeline. As such, it represents the lowest common denominator of the programmable graphics pipelines of the majority of today's desktop PCs and mobile devices.

Parallelism in Graphics Pipelines
GPUs are highly parallel processors. This is the main reason for their performance. In fact, they implement two kinds of parallelism, vertical and horizontal:
 * Vertical parallelism describes parallel processing at different stages of a pipeline. This concept was also crucial in the development of the assembly line at Ford Motor Company: many workers can work in parallel on rather simple tasks. This made mass production (and therefore mass consumption) possible. In the context of processing units in GPUs, the simple tasks correspond to less complex processing units, which save costs and power consumption.
 * Horizontal parallelism describes the possibility to process work in multiple pipelines. This allows for even more parallelism than the vertical parallelism in a single pipeline. Again, the concept was also employed at Ford Motor Company and in many other industries. In the context of GPUs, horizontal parallelism of the graphics pipeline was an important feature to achieve the performance of modern GPUs.

The following diagram shows an illustration of vertical parallelism (processing in stages represented by boxes) and horizontal parallelism (multiple processing units for each stage represented by multiple arrows between boxes).

In the following diagrams, there is only one arrow between any two stages. However, it should be understood that GPUs usually implement the graphics pipeline with massive horizontal parallelism. Only software implementations of the graphics pipeline, e.g. Mesa 3D (see the Wikipedia entry), usually implement a single pipeline.

Programmable and Fixed-Function Stages
The pipelines of OpenGL ES 1.x, core OpenGL 1.x, and Direct3D 7.x are configurable fixed-function pipelines, i.e. there is no possibility to include programs in these pipelines. In OpenGL (ES) 2.0, WebGL, and Direct3D 8.0, two stages (the vertex shader and the fragment shader stage) of the pipeline are programmable, i.e. small programs (shaders) written in Cg (or another shading language) are applied in these stages. In the following diagram, programmable stages are represented by green boxes, fixed-function stages are represented by gray boxes, and data is represented by blue boxes.

The vertex shader and fragment shader stages are discussed in more detail in the platform-specific tutorials. The rasterization stage and the per-fragment operations are discussed in their own sections.

The primitive assembly stage mainly consists of clipping primitives to the view frustum (the part of space that is visible on the screen) and optional culling of front-facing and/or back-facing primitives. These possibilities are discussed in more detail in the platform-specific tutorials.

Data Flow
In order to program Cg vertex and fragment shaders, it is important to understand the input and output of each shader. To this end, it is also useful to understand how data is communicated between all stages of the pipeline. This is illustrated in the next diagram:

Vertex input parameters are defined based on the vertex data. For each vertex input parameter, a semantic has to be defined, which specifies how the parameter relates to data in the fixed-function pipeline. Examples of semantics are POSITION, NORMAL, COLOR, TEXCOORD0, TEXCOORD1, etc. This makes it possible to use Cg programs even with APIs that were originally designed for a fixed-function pipeline. For example, the vertex input parameter for vertex positions should use the POSITION semantic such that all APIs can provide the appropriate data for this input parameter. Note that the vertex position is in object coordinates, i.e. this is the position as specified in a 3D modeling tool.
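As a minimal sketch, vertex input parameters in Cg might be declared as follows. The parameter names are arbitrary; only the semantics bind them to the vertex data:

```cg
// Hypothetical vertex shader entry point: each input parameter is
// bound to vertex data via its semantic, not via its name.
void main(
   float4 position : POSITION,   // vertex position in object coordinates
   float3 normal   : NORMAL,     // vertex normal in object coordinates
   float2 texCoord : TEXCOORD0,  // first set of texture coordinates
   out float4 clipPosition : POSITION)
{
   // A real shader would transform the position here; this sketch
   // just passes the object-space position through.
   clipPosition = position;
}
```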

Uniform parameters (or uniforms) have the same value for all vertex shaders and all fragment shaders that are executed when rendering a specific primitive (e.g. a triangle). However, they can be changed for other primitives. Usually, they have the same value for a large set of primitives that make up a mesh. Typically, vertex transformations, specifications of light sources and materials, etc. are specified as uniforms.
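For instance, a vertex transformation is typically passed as a uniform matrix. This is a sketch; the uniform name modelViewProj is an arbitrary choice:

```cg
void main(
   float4 position : POSITION,
   uniform float4x4 modelViewProj,  // same value for all vertices of a primitive
   out float4 clipPosition : POSITION)
{
   // mul() is Cg's matrix-vector multiplication
   clipPosition = mul(modelViewProj, position);
}
```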

Vertex output parameters are computed by the vertex shader, i.e. there is one set of values of these parameters for each vertex. A semantic has to be specified for each parameter, e.g. POSITION, COLOR0, TEXCOORD0, etc. Usually, there has to be an output parameter with the semantic POSITION or SV_POSITION, which determines where a primitive is rendered on the screen (“SV” stands for “system value” and can have a special meaning). The size of point primitives can be specified by an output parameter with the semantic PSIZE. Other parameters are interpolated for each pixel covered by a primitive.
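A set of vertex output parameters with semantics might be collected in a struct like this (a sketch; the member names are arbitrary, only the semantics matter):

```cg
struct VertexOutput {
   float4 position  : POSITION;   // required: determines the screen position
   float4 color     : COLOR0;     // interpolated across the primitive
   float2 texCoord  : TEXCOORD0;  // interpolated texture coordinates
   float  pointSize : PSIZE;      // size of point primitives
};
```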

Fragment input parameters are interpolated from the vertex output parameters for each pixel covered by a primitive. Similar to vertex output parameters, a semantic has to be specified for each fragment input parameter. These semantics are used to match vertex output parameters with fragment input parameters. Therefore, the names of corresponding parameters in the vertex shader and fragment shader can be different as long as the semantics are the same.

Fragment output parameters are computed by fragment shaders. A semantic has to be specified, which determines how the value is used in the subsequent fixed-function pipeline. Most fragment shaders specify an output parameter with the semantic COLOR. The fragment depth is computed implicitly even if no output parameter with the semantic DEPTH is specified.
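A minimal fragment shader might look like this (a sketch; the input name deliberately differs from typical vertex output names to emphasize that parameters are matched by semantic, not by name):

```cg
void main(
   float4 interpolatedColor : COLOR0,  // matched to a vertex output by semantic
   out float4 fragColor : COLOR)       // handed to the per-fragment operations
{
   fragColor = interpolatedColor;
}
```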

Texture data includes a uniform sampler, which specifies the texture sampling unit, which in turn determines the texture image from which colors are fetched.
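A fragment shader that fetches colors from a texture might be sketched as follows (the uniform name textureImage is arbitrary):

```cg
void main(
   float2 texCoord : TEXCOORD0,     // interpolated texture coordinates
   uniform sampler2D textureImage,  // bound to a texture sampling unit
   out float4 fragColor : COLOR)
{
   // tex2D fetches a (filtered) color from the texture image
   fragColor = tex2D(textureImage, texCoord);
}
```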

Other data is described in the tutorials for specific platforms.