VPP 0.7: A high-level modern C++ API for Vulkan
The section How do C++ shaders work explains how C++ shaders work internally. This page lists typical elements and constructs found in these shaders. Consult the individual pages for these items for more information.
The first thing usually found in each VPP shader is the using directive, for easy access to VPP types:
Some binding points require accessors - objects declared in the shader code. But not all, as there exist binding points accessed directly.
Vertex and instance data are supplied via vpp::inVertexData binding points. They are read-only and accessed directly. Example:
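Uniform buffers are bound via vpp::inUniformBuffer or vpp::inUniformBufferDyn binding points. They are untyped, read-only and require an accessor which provides the type. The accessor can be one of the following: vpp::UniformVar (a single structure), vpp::UniformArray (an array of structures) or vpp::UniformSimpleArray (an array of simple data).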
The example below shows vpp::UniformVar. The remaining ones have identical syntax, but require one more level of indirection (the [] indexing operator) to access an individual array element.
Storage buffers are similar to uniform buffers, but they are read-write. Their binding point types are vpp::ioBuffer and vpp::ioBufferDyn. Other details are the same. An example (showing the simple array variant this time):
Texel buffers are a hybrid of images and buffers. They are one-dimensional, can hold only arrays of simple data and are accessed via image functions (e.g. ImageStore or TexelFetch). VPP also provides the TexelArray accessor to allow using these buffers like regular buffers, with the indexing operator instead of function calls. This is similar to the vpp::UniformSimpleArray accessor. The corresponding binding points are vpp::inTextureBuffer and vpp::ioImageBuffer.
Caution: unlike uniform buffers, texel buffers require vpp::TexelBufferView objects to be explicitly constructed and stored along with the corresponding buffers. Forgetting to do so results in undefined behavior (the texel buffer sometimes works, sometimes not), and the validation layer currently does not detect this. To make this error harder to commit, the constructors of vpp::TexelBufferView are explicit.
Push constants are accessed in shaders in the same way as single-structure uniform buffers. They are actually small read-only uniform buffers with implicitly allocated memory.
Push constants are written on the CPU side directly. In the example above, use the data() method to obtain a reference to the CConstantData structure, and write the values directly (as in the setField1 method). To transfer the structure value, use the cmdPush method on the vpp::inPushConstant object.
This should be done from a command sequence (lambda function). cmdPush makes a local copy of the entire structure and schedules a command for the GPU to update its own copy of the structure using the local one as the source. These values will be visible in shaders launched by subsequent draw or computation commands.
This way you can schedule multiple structure updates from a single command sequence. Each cmdPush memorizes whatever values were set in the vpp::inPushConstant::data() object at the time of the call and generates a command, to be executed later, that sets exactly these values. So the following pattern is possible:
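The snapshot behaviour described above can be modelled in plain C++ without VPP. The sketch below is only an illustration under stated assumptions: PushConstantModel, ConstantData and recordAndRun are hypothetical names, and the vector of callbacks stands in for a recorded command sequence. It does not use or reproduce the VPP API.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the CConstantData structure mentioned in the text.
struct ConstantData {
    int field1 = 0;
};

using Command = std::function<void(ConstantData&)>;

// Toy model of a push-constant binding point: data() exposes the CPU-side
// structure, cmdPush() records a command that carries a copy of it.
class PushConstantModel {
public:
    ConstantData& data() { return d_current; }

    // Copy the current values now; apply them to the "GPU" copy later.
    void cmdPush(std::vector<Command>& commands) const {
        const ConstantData snapshot = d_current;
        commands.push_back([snapshot](ConstantData& gpuCopy) { gpuCopy = snapshot; });
    }

private:
    ConstantData d_current;
};

// Records push, draw, push, draw, then executes the commands in order and
// returns the field1 value each "draw" observed.
std::vector<int> recordAndRun() {
    PushConstantModel pc;
    std::vector<Command> commands;
    std::vector<int> seenByDraws;

    auto recordDraw = [&commands, &seenByDraws] {
        commands.push_back([&seenByDraws](ConstantData& gpuCopy) {
            seenByDraws.push_back(gpuCopy.field1);
        });
    };

    pc.data().field1 = 1;
    pc.cmdPush(commands);
    recordDraw();

    pc.data().field1 = 2;  // does not alter the first, already recorded push
    pc.cmdPush(commands);
    recordDraw();

    assert(commands.size() == 4);  // two pushes and two draws were recorded

    ConstantData gpuCopy;
    for (const auto& command : commands) command(gpuCopy);
    return seenByDraws;
}
```

In this model the first draw observes 1 and the second observes 2, because each recorded push carries its own copy. Had the push captured the structure by reference instead, both draws would observe the last value written.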
Inter-shader variables pass data from one shader to the next shader in the pipeline. The target must always be the immediately following shader; skipping shaders is not allowed.
The binding point is vpp::ioVariable or vpp::ioStructure. Provide the data type as well as the source and target shaders as template arguments.
These binding points need an accessor to tell whether we are writing to the variable or reading from it. These accessors are named vpp::Output and vpp::Input respectively.
Passing the data to the fragment shader (from any other shader type) involves automatic interpolation, with the exception of integer data.
Example:
Textures are read-only images working together with a sampler. Binding points vpp::inConstSampledTexture and vpp::inSampledTexture expose textures already associated with a sampler, so there is no need to worry about it in the shader.
In order to read from a texture, use the vpp::Texture function or any other function from the Texture family (there are a lot of them). All these functions require coordinates, and sometimes other arguments. Textures do not need accessors. An example:
Storage images are images accessed without sampling. Individual pixels are read or written directly. The binding point type associated with storage images is vpp::ioImage. There are a number of functions which take this binding point as an argument. No accessors are needed. Example:
The above example copies pixels from one image to another. vpp::ImageSize() can be used to retrieve the size of the image. vpp::ImageLoad() reads a pixel, vpp::ImageStore() writes it. Coordinates are expressed as an integer vector with number of components equal to image dimensionality (plus one, if the image is arrayed).
Input attachments allow receiving data from another vpp::Process node in the rendering graph. An input attachment in the target process is simultaneously an output attachment in the source process. VPP and Vulkan automatically maintain a dependency between the processes.
Input attachments are accessed (read-only) via vpp::inAttachment binding point.
Each input attachment requires allocation of an image and an image view. The image should have the vpp::Img::INPUT usage bit set, as well as one of the output attachment usage bits (vpp::Img::COLOR or vpp::Img::DEPTH). The vpp::inAttachment template should be parameterized with the view type, just like a texture or storage image.
Input attachments may be read only in fragment shaders. No other shader type is allowed. Reading is performed by means of the vpp::SubpassLoad() function. Coordinates are relative to the current pixel position in the particular fragment shader call. Usually you call the overload without arguments, as it supplies a zero offset and simply reads the current pixel.
Reading is fully synchronized by Vulkan. The pixel value you read is guaranteed to be the final pixel value generated by the preceding process. This is true even if the pixel is written multiple times or constructed incrementally (via blending or logical operations).
Example:
Samplers can be associated with images statically or dynamically. Dynamic samplers are pipeline objects just like images themselves. They have binding points and must be bound to shader blocks.
Binding point type for samplers is vpp::inSampler. This is a template which takes either vpp::NormalizedSampler or vpp::UnnormalizedSampler type.
In shaders, the only thing to do with these samplers is to associate them with a texture image represented by a vpp::inTexture binding point. This is done by calling the vpp::MakeSampledTexture() function, as in the example below. This function returns an opaque type that can be used in vpp::Texture and other texture reading functions.
vpp::MakeSampledTexture() must not be called within conditional blocks.
Example:
Another variant of samplers, "semi-dynamic", are represented with vpp::inConstSampler binding points. They use the same syntax as vpp::inSampler, but are not bound to shader data blocks. Instead, these binding points require corresponding samplers to be passed directly to the constructor.
Example:
Output attachments are accessed through vpp::outAttachment binding points. There is only one thing you can do with them: write the current pixel value. This is accomplished just by using the assignment (=) operator. This syntax is different from most other resources, but simpler.
An example:
Arrays of binding points are created by means of the vpp::arrayOf template. Individual points are then referenced by using an extra bracket operator [], as shown in section Arrays of buffers.
Immutable variables can be declared and initialized, but may not be changed later. On the other hand, they are very easily optimized and contribute to very efficient code. In most algorithms, there is a need for only a few variables that are mutable (e.g. loop control), the rest of them may be immutable.
The example below shows several immutable variables. Using the const specifier is optional; they are immutable regardless of whether you declare them const or not.
Mutable variables can be changed at will, but the price for that is high. For each mutable variable, the shader compiler must allocate a permanent register on the GPU, and must do so for each concurrent thread. A typical GPU these days can run at least a thousand threads simultaneously, and its register pool holds several thousand registers (later or more expensive GPUs may have more). The register pool can be quickly exhausted by using more than 8-10 simple mutable variables per thread. When there are no more registers, the compiler allocates regular memory, which can slow down your shader about 10 times.
Therefore use immutable variables by default. Declare mutable ones only if needed. Reuse them throughout your shader. Note that the C++ optimizing compiler will not be able to optimize the usage of these variables, so you must do it yourself.
You can also declare mutable arrays of fixed size. This is done by means of the vpp::VArray template. Specify the item type as the first parameter and the size as the second. All remarks about efficiency apply to arrays as well, so declare only small arrays.
Some trivial examples:
The GPU runs a large number of threads in parallel (1000 or more). Those threads are logically organized into workgroups. A workgroup is a smaller group of threads (typically 32 or 64) that run on a single Computation Unit on the GPU.
As of this writing (2018), contemporary GPU architectures have become very similar to multicore SIMD CPU architectures, like regular Intel or AMD processors. A regular processor has e.g. 8 cores and 8-way SIMD in a single core (AVX and AVX2). Now imagine a processor with 32-way SIMD and 32 cores: this is roughly an entry-level GPU. Each core also has its own data/instruction caches and GPR register pool. A unique feature of GPUs is that they allow explicit control over which variables are shared across an entire core. These variables are called shared variables.
Shared variables have two distinguishing traits:
To declare a variable as shared inside shader code, use the vpp::Shared() function. Place it before the declaration, like this:
Shared arrays are also possible and they are much less performance-sensitive. Shared arrays are a very practical method for data exchange between threads in a single group, or for temporary data storage when you can ensure that different threads do not access the same fields simultaneously (otherwise consider using atomic variables, described in the next section).
Built-in variables are a very important aspect of shader code authoring. These variables are predefined for each shader type and are accessible via the shader object given as the parameter to the shader. For example:
Each shader type has its own object representation: vpp::VertexShader, vpp::GeometryShader, vpp::TessControlShader, vpp::TessEvalShader, vpp::FragmentShader, and vpp::ComputeShader. Each of these objects defines its own set of built-in variables, specific to that shader type. Shader objects also offer some methods (described elsewhere).
Built-in variables have different types. See the docs of each shader object for the list of variables and their types.
Some examples:
Vectors and arrays are an important part of programming, on both the CPU and the GPU. VPP offers overloaded indexing operators ([]) that give access to individual elements of various aggregates: local arrays, buffers, arrayed built-in variables, vectors, matrices, structures, etc. Subsequent sections describe them in more detail.
Local arrays are described in section Mutable variables and arrays. The indexing operator for them accepts any integer expression (variable or constant). Examples:
Immutable or mutable vectors of types like vpp::Vec2, vpp::Vec3, vpp::Vec4, vpp::IVec2, vpp::IVec3, vpp::IVec4, vpp::VVec2, vpp::VVec3, vpp::VVec4, vpp::VIVec2, vpp::VIVec3, vpp::VIVec4, etc. can be indexed much like local arrays. Any integer expression can be used inside the [] operator. The index is zero based, like in arrays. The value must be less than vector size, otherwise the result is undefined. For mutable vectors, assignment to indexed locations is permitted.
Examples:
Swizzles are a special method of indexing vectors that uses component names rather than numeric indices. Entire slices of vectors can be easily specified by these names, for example:
ivec4 [ XYZW ] means a vector formed from the following components of the original vector: 0, 1, 2, 3. It is identical to the source vector.
ivec4 [ WZYX ] means a vector formed from the following components of the original vector: 3, 2, 1, 0. The component order is reversed.
ivec4 [ YYYY ] means a vector formed from the following components of the original vector: 1, 1, 1, 1. This is just the Y component replicated into all four positions.
ivec4 [ XYZ ] is a 3-element vector built from the 3 lower elements of the source vector. This one is very often used to truncate a 4D vector to a 3D vector in shaders.
ivec4 [ W ] is a scalar equal to the last element of the source vector.
These are just a few examples. Any combination of the letters X, Y, Z, W may be used. The length of the combination must be less than or equal to the size of the original vector. Swizzles can be used for any vector length and component type.
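The selection rule behind these swizzles can be written down in plain C++. The sketch below is an illustration only, not VPP code: componentIndex and swizzle are hypothetical helpers, the vector is fixed at four int components, and the pattern is passed as a string rather than as one of the VPP enumeration values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Maps a component letter to its index: X -> 0, Y -> 1, Z -> 2, W -> 3.
std::size_t componentIndex(char c) {
    switch (c) {
        case 'X': return 0;
        case 'Y': return 1;
        case 'Z': return 2;
        default:  return 3;  // 'W'
    }
}

// Builds the swizzled result: output element i is the source component
// named by the i-th letter of the pattern.
std::vector<int> swizzle(const std::array<int, 4>& v, const std::string& pattern) {
    assert(!pattern.empty() && pattern.size() <= v.size());
    std::vector<int> result;
    for (char c : pattern) {
        assert(c == 'X' || c == 'Y' || c == 'Z' || c == 'W');
        result.push_back(v[componentIndex(c)]);
    }
    return result;
}
```

For a source vector (10, 20, 30, 40), the pattern WZYX yields (40, 30, 20, 10), YYYY yields (20, 20, 20, 20), and XYZ yields the truncated vector (10, 20, 30).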
Swizzle names are defined as enumeration types: vpp::ESwizzle1, vpp::ESwizzle2, vpp::ESwizzle3, vpp::ESwizzle4. Because of that, they require either a using namespace vpp; directive, or explicit qualification with the vpp:: prefix.
Swizzles can be combined with other indexing operators, but either the swizzle must be the last one in the chain, or all indices coming after the swizzle must be constants or CPU variables.
Example:
Swizzles can also be used to write components, i.e. on the left side of the assignment operator. Examples:
Matrix variables can also be mutable or immutable. Some examples of immutable matrix types: vpp::Mat2, vpp::Mat3, vpp::Mat4, vpp::Mat4x2, vpp::Mat3x4, vpp::IMat4x2, vpp::UMat2x3, etc. Mutable versions have the prefix V, e.g. vpp::VMat4, vpp::VIMat3x2, etc.
Matrices are indexed similarly to vectors. A matrix is equivalent to a vector of columns (a vector of vectors), therefore the first indexing operator applied to it selects a column, and the second one selects an element within that column.
The value of the index can be any integer expression (variable or constant).
If a matrix is indexed with only one indexing operator, the result is a column vector. For example:
For mutable types like vpp::VMat2 or vpp::VIMat2 indexing operators can be used on the left side of assignments. Examples:
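The column-first indexing order is easy to get backwards, so here is a plain C++ model of it. This is an illustration only, not VPP code: Column2, Matrix2 and the helper functions are hypothetical names, with std::array standing in for the shader matrix types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A 2x2 matrix modelled as a vector of two columns, each with two elements.
using Column2 = std::array<float, 2>;
using Matrix2 = std::array<Column2, 2>;

// A single index selects a whole column.
Column2 column(const Matrix2& m, std::size_t col) {
    assert(col < m.size());
    return m[col];
}

// Two indices select the column first, then the element within it.
float element(const Matrix2& m, std::size_t col, std::size_t row) {
    assert(col < m.size() && row < m[col].size());
    return m[col][row];
}

// Models assignment through both indices on a mutable matrix.
Matrix2 withElement(Matrix2 m, std::size_t col, std::size_t row, float value) {
    assert(col < m.size() && row < m[col].size());
    m[col][row] = value;
    return m;
}

// Sample matrix with columns (1, 2) and (3, 4).
Matrix2 sampleMatrix() {
    Matrix2 m{};
    m[0] = Column2{1.0f, 2.0f};
    m[1] = Column2{3.0f, 4.0f};
    return m;
}
```

With the sample matrix, indexing with [1][0] reads 3, the first element of the second column, whereas a row-first reading would expect 2.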
This section concerns uniform or storage buffers which hold only a single structure. This can be e.g. parameters and matrices for an entire rendering frame.
Buffers like these are accessed by means of vpp::UniformVar accessor object. Use it as in the example below.
The accessor provides indexing operator allowing to access fields defined within vpp::UniformStruct definition. This operator can be used to read fields, as well as to write in case the buffer is vpp::ioBuffer.
The vpp::UniformVar accessor is also used with vpp::inPushConstant binding points, to access push constants. This access is read-only. From this perspective, push constants do not differ from uniform buffers. However, they are written on the CPU side in a different manner and are faster.
If there is an entire array of structures within the buffer, use the vpp::UniformArray accessor instead of vpp::UniformVar. The binding point types are the same. This accessor defines an index operator taking an integer index (any integer expression is allowed, variable or constant), and the result can be indexed with a field name.
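There are two major kinds of simple data buffers: regular uniform buffers and texel buffers.
Regular uniform buffers holding arrays of simple data are accessed with the vpp::UniformSimpleArray accessor. Texel buffers are accessed with the vpp::TexelArray accessor.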
Arrays of buffers (and images) are a different thing from buffers containing arrays. This time we have multiple buffers themselves. These buffer arrays are declared with the help of vpp::arrayOf template, like in the following code (shows examples for all supported arrays):
All of these arrays provide additional level of indexing when being accessed in the shader. Accepted indexes are any integer expressions, variable or constant.
The syntax differs slightly for buffers and images, because buffers use accessors and images do not. For buffers, you apply extra indexing to the accessor. Example:
For images, apply the indexing operator to the binding point name:
There is one thing about images that might be confusing: there are two different methods of arraying images. The one shown above involves an array of separately bound images. This is actually an array of independent binding points. A different vpp::Image or vpp::Img object may be bound to each item in the array. Use the vpp::multi template to make such selective bindings.
The other kind of image arraying is to declare an image itself as arrayed (multilayered) so that it contains multiple layers. In that case it is a single image, bound to a single point. You specify the layer index as an extra coordinate passed to functions like vpp::Texture.
These two methods work completely independently of each other and may in fact be mixed.
Vertex and instance buffers are associated with structures defined by means of vpp::VertexStruct and vpp::InstanceStruct. Although they are physically arrays of structures, the vertex shader has access only to single, "current" element at any given moment. You apply the indexing operator directly to vpp::inVertexData binding point in the pipeline and provide a pointer to structure field, just like for single-element buffers (section Buffers containing single structure). An example:
The VPP shader language offers a lot of constructs to control the flow of execution. All of these have different syntax from the corresponding C++ constructs, as C++ keywords like if or for are not overloadable.
Nevertheless, VPP and C++ statements may be mixed in code, which gives interesting possibilities. Generally, C++ constructs behave as a metalanguage to VPP constructs.
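One way to picture the metalanguage relationship is a toy program builder in plain C++. The sketch below is an assumption-laden illustration, not a description of VPP internals: Program and buildProgram are hypothetical names, and instructions are modelled as strings. It shows that an ordinary C++ loop or condition executes while the program is being built, so it decides which instructions get emitted rather than becoming a loop or branch in the emitted program.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a shader under construction: a list of emitted
// instruction names.
using Program = std::vector<std::string>;

Program buildProgram(int taps, bool normalize) {
    assert(taps >= 0);
    Program program;

    // The C++ loop runs now, at build time, so the result contains `taps`
    // separate instructions and no loop construct at all.
    for (int i = 0; i < taps; ++i)
        program.push_back("accumulate tap " + std::to_string(i));

    // The C++ condition is also resolved at build time: the instruction is
    // either present in the program or absent, with no run-time branch.
    if (normalize)
        program.push_back("normalize");

    return program;
}
```

Calling buildProgram(3, false) produces three accumulate instructions, and buildProgram(2, true) produces two followed by a normalize instruction. A run-time loop or branch in the emitted program would instead require the dedicated constructs described in the following sections.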
The sections below give short descriptions and examples for these control constructs. Refer to the individual docs pages for more information.
vpp::If() and vpp::Else() are counterparts of the C++ if and else statements. Use them as in the following example. Do not forget about vpp::Fi() at the end.
vpp::Select() implements the conditional question mark operator (?:) from C++. This operator is not overloadable, hence the need for a separate construct. The order of arguments is the same as in the conditional operator.
vpp::Do() and vpp::While() form the basic looping construct. Always use them together and close the block with vpp::Od().
vpp::For() implements a simplified for loop. It takes 3 or 4 arguments. The first one is the control variable, which must be an already declared mutable variable of type vpp::Int or vpp::UInt. The second argument is the starting value. The third one is the ending limit value; the loop is repeated as long as the control variable is less than the ending value. The optional fourth argument is the step that is added to the control variable in each loop turn. By default it is 1.
As with other constructs, vpp::For() has corresponding block closing instruction called vpp::Rof().
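The iteration rule described for vpp::For() corresponds to an ordinary C++ for loop with a strict less-than test. The sketch below is plain C++, not VPP code; forValues is a hypothetical helper that returns the values the control variable would take, and it assumes a positive step.

```cpp
#include <cassert>
#include <vector>

// Returns the successive values of the control variable for a loop that
// starts at `begin`, continues while the variable is less than `end`, and
// adds `step` after each turn.
std::vector<int> forValues(int begin, int end, int step = 1) {
    assert(step > 0);  // this sketch covers positive steps only
    std::vector<int> values;
    for (int i = begin; i < end; i += step)
        values.push_back(i);
    return values;
}
```

The ending value itself is never visited: bounds 0 and 4 give 0, 1, 2, 3, and bounds 0 and 10 with step 4 give 0, 4, 8.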
vpp::Switch() construct is similar to the C++ one. Just as in C++ you need to use vpp::Break() to stop the control flow, otherwise fall-through behavior will occur.
vpp::StaticCast() and vpp::ReinterpretCast() are explicit type conversion operators.
vpp::StaticCast() converts data while preserving the value (or its approximation), just like in C++. You must use this operator somewhat more frequently in VPP than in C++, as VPP performs fewer implicit type conversions.
Example:
vpp::ReinterpretCast() converts data while preserving binary representation, similar to C++. It is allowed to be used for numeric types only (no pointers). One particular application is to manipulate bits in IEEE-754 floats and doubles, for some fast approximations.
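The difference between the two kinds of conversion, and the bit-manipulation use case, can be shown with their plain C++ counterparts. The sketch below is CPU-side C++, not VPP code; the helper names are hypothetical, std::memcpy is used as the portable way to reinterpret bits, and the approximation shown is the well-known fast inverse square root, chosen here only as a familiar instance of the technique.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Value-preserving conversion: the fractional part is discarded.
int toIntValue(float x) { return static_cast<int>(x); }

// Bit-preserving conversion: the 32 bits of the float are returned unchanged.
std::uint32_t floatBits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Inverse of floatBits: builds a float from a raw 32-bit pattern.
float bitsToFloat(std::uint32_t bits) {
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Fast approximation of 1/sqrt(x) for positive, normal x: an integer
// operation on the float's bit pattern gives a first guess, which one
// Newton-Raphson step then refines.
float fastInvSqrt(float x) {
    assert(x > 0.0f);
    const float y = bitsToFloat(0x5f3759dfu - (floatBits(x) >> 1));
    return y * (1.5f - 0.5f * x * y * y);
}
```

Here toIntValue(1.75f) gives 1, while floatBits(1.0f) gives the IEEE-754 pattern 0x3f800000, which has no numeric relation to the value 1.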
These constructs allow creating functions in GPU code. These functions may then be called with various arguments. Function definitions should occur outside other constructs, preferably at the beginning of the shader.
To define a function, use the vpp::Function template. The first template argument is the return type. Further optional template arguments specify the function argument types. The runtime string argument is the visible name of the function in SPIR-V dumps (useful for debugging).
Next come the vpp::Par declarations, one for each function argument. They are needed to access arguments in the function code. You can use these names in expressions. Function arguments are read-only. Default arguments and variable numbers of arguments are not supported.
vpp::Function and vpp::Par are class templates.
Next, the function body is located between vpp::Begin() and vpp::End(). Note that vpp::Begin() and vpp::End() do not create a C++ scope, so it is recommended to create one yourself by introducing a pair of curly braces. This is optional, however, and may be omitted if your function is simple and does not create local variables.
Curly braces may also be moved to a higher level, enclosing the vpp::Par declarations as well, so that they do not pollute the main shader scope. This is shown in the second (binomial) function in the example below.
Calling functions is straightforward, syntax is the same as in C++.