Hello!
After playing with Vulkan, I had to admit that it is not as easy to use as I wanted. That done, I preferred to come back to OpenGL. However, Vulkan let me learn a lot about how OpenGL works internally. I am going to write a series of tutorials about OpenGL AZDO. The first one discusses bindless textures!
What is OpenGL AZDO?
OpenGL Approaching Zero Driver Overhead is an idea which comes from Cass Everitt, Tim Foley, John McDonald and Graham Sellers. The idea behind it is to reduce CPU usage by exploiting the newest possibilities offered by modern GPUs.
AZDO gathers several techniques to reach a low overhead:
Perform as few bindings as possible
Use persistent mapping
Use batching
Use GPU for everything (culling, fill structures).
This series of tutorials will cover how to implement such things.
Bindless Texture
Bindless textures solve a problem you may notice when implementing batching: a naive draw loop binds each object's texture just before its draw call.
The main issue here is that we cannot batch efficiently, since each draw call could use different textures.
Now, imagine you could put a texture handle inside a uniform buffer and perform just one big draw call! You would reach a very, very low overhead!
How to do it?
We are lucky: in my opinion, bindless textures are the easiest AZDO feature to implement. However, we will really see them in action in the chapter about batching. To get started with bindless textures, you just have to follow these steps:
Create the texture in the normal way
Get the handle (kind of the address of the texture)
Make the handle resident
Put the handle in a uniform buffer
So here is how to load an image file using SDL, put it into a texture, and enable the bindless feature:
The process is easy: first you load a surface with SDL_image, then you create the texture, compute the number of possible mipmap levels, allocate storage for each level, and upload the pixels to the first level.
After that, you generate the mipmaps, ask the texture for its handle, and make the handle resident.
To use this “bindless” texture, you just have to put its handle (a GLuint64) inside a uniform buffer.
Then you can use it like this:
#version 450 core
#extension GL_ARB_bindless_texture : require

layout(std140, binding = 0) uniform frameBuffer {
    // Here are all the frameBuffer's render targets
    sampler2D gBufferNormal;
    sampler2D gBufferDiffuse;
    sampler2D gBufferDepth;
};

layout(location = 0) in vec2 uv;
layout(location = 0) out vec4 outColor;

void main(void)
{
    outColor = texture(gBufferDiffuse, uv);
}
The next article could be about batching (with multi draw indirect) or persistent mapping.
Hi!
Once again, I am going to present some Vulkan features: pipelines, barriers, memory management, and everything needed to use them. This article will be long, but it is split into several chapters.
Memory Management
In a Vulkan application, it is up to the developer to manage the memory himself. The number of allocations is limited, and making one allocation per buffer or image is really bad design in Vulkan. A good design is to make one big allocation (let’s call it a chunk), manage it yourself, and sub-allocate buffers or images within the chunk.
A Chunk Allocator
We need a simple object which is responsible for allocating chunks. It just has to select a suitable heap and call the allocate and free functions of the Vulkan API.
#include "chunkallocator.hpp"
#include "System/exception.hpp"
ChunkAllocator::ChunkAllocator(Device &device) : mDevice(device)
{
}

std::tuple<VkDeviceMemory, VkMemoryPropertyFlags, VkDeviceSize, char*>
ChunkAllocator::allocate(VkMemoryPropertyFlags flags, VkDeviceSize size) {
    VkPhysicalDeviceMemoryProperties const &property = mDevice.memoryProperties();
    int index = -1;

    // Look for a memory type with the requested flags whose heap is large enough
    for(auto i(0u); i < property.memoryTypeCount; ++i) {
        if((property.memoryTypes[i].propertyFlags & flags) == flags
        && size < property.memoryHeaps[property.memoryTypes[i].heapIndex].size) {
            index = i;
            break;
        }
    }

    if(index == -1)
        throw std::runtime_error("No suitable heap found");

    VkMemoryAllocateInfo info = {};
    info.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    info.pNext = nullptr;
    info.allocationSize = size;
    info.memoryTypeIndex = index;

    // Perform the allocation
    VkDeviceMemory mem;
    vulkanCheckError(vkAllocateMemory(mDevice, &info, nullptr, &mem));
    mDeviceMemories.push_back(mem);

    // Map the memory if it is host visible
    char *ptr = nullptr;
    if(flags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)
        vulkanCheckError(vkMapMemory(mDevice, mem, 0, VK_WHOLE_SIZE, 0, (void**)&ptr));

    return std::tuple<VkDeviceMemory, VkMemoryPropertyFlags, VkDeviceSize, char*>
        (mem, flags, size, ptr);
}

ChunkAllocator::~ChunkAllocator() {
    // Free all memory objects
    for(auto &mem : mDeviceMemories)
        vkFreeMemory(mDevice, mem, nullptr);
}
This piece of code is quite simple and easy to read.
Memory Pool
Memory pools are structures used to optimize the performance of dynamic allocations. In a video game, using a memory pool is not optional. The idea is the same as in the first part: allocate a chunk, and sub-allocate within it yourself. I made a simple generic memory pool.
Here is a little diagram which explains what I wanted to do.
As you can see, video memory is separated into several parts (4 here) and each “Block” in the linked list describes one sub-allocation.
One block is described by:
Size of the block
Offset of the block relative to the DeviceMemory
A pointer to set data from the host (map)
A boolean telling whether the block is free
A sub-allocation within a chunk is performed as follows:
Traverse the linked list until we find a free block that is big enough
Modify its size and set its boolean to false
Create a new block with the remaining size and offset, set its boolean to true, and insert it after the current one.
Freeing is quite simple: you just have to set the boolean back to true.
Another good method could be a “shrink to fit”: if several consecutive blocks have their boolean set to true, we merge them into one.
#include "memorypool.hpp"
#include <cassert>
MemoryPool::MemoryPool(Device &device) :
mDevice(device), mChunkAllocator(device) {}
Allocation MemoryPool::allocate(VkDeviceSize size, VkMemoryPropertyFlags flags) {
if(size % 128 != 0)
size = size + (128 - (size % 128)); // 128 bytes alignment
assert(size % 128 == 0);
for(auto &chunk: mChunks) {
// if flags are okay
if((chunk.flags & flags) == flags) {
int indexBlock = -1;
// We are looking for a good block
for(auto i(0u); i < chunk.blocks.size(); ++i) {
if(chunk.blocks[i].isFree) {
if(chunk.blocks[i].size > size) {
indexBlock = i;
break;
}
}
}
// If a block is find
if(indexBlock != -1) {
Block newBlock;
// Set the new block
newBlock.isFree = true;
newBlock.offset = chunk.blocks[indexBlock].offset + size;
newBlock.size = chunk.blocks[indexBlock].size - size;
newBlock.ptr = chunk.blocks[indexBlock].ptr + size;
// Modify the current block
chunk.blocks[indexBlock].isFree = false;
chunk.blocks[indexBlock].size = size;
// If allocation does not fit perfectly the block
if(newBlock.size != 0)
chunk.blocks.emplace(chunk.blocks.begin() + indexBlock + 1, newBlock);
return Allocation(chunk.memory, chunk.blocks[indexBlock].offset, size, chunk.blocks[indexBlock].ptr);
}
}
}
// if we reach there, we have to allocate a new chunk
addChunk(mChunkAllocator.allocate(flags, 1 << 25));
return allocate(size, flags);
}
void MemoryPool::free(Allocation const &alloc) {
for(auto &chunk: mChunks)
if(chunk.memory == std::get<0>(alloc)) // Search the good memory device
for(auto &block : chunk.blocks)
if(block.offset == std::get<1>(alloc)) // Search the good offset
block.isFree = true; // put it to free
}
void MemoryPool::addChunk(const std::tuple<VkDeviceMemory, VkMemoryPropertyFlags, VkDeviceSize, char *> &ptr) {
Chunk chunk;
Block block;
// Add a block mapped along the whole chunk
block.isFree = true;
block.offset = 0;
block.size = std::get<2>(ptr);
block.ptr = std::get<3>(ptr);
chunk.flags = std::get<1>(ptr);
chunk.memory = std::get<0>(ptr);
chunk.size = std::get<2>(ptr);
chunk.ptr = std::get<3>(ptr);
chunk.blocks.emplace_back(block);
mChunks.emplace_back(chunk);
}
Buffers
Buffers are a well-known part of OpenGL. In Vulkan, they are approximately the same, but you have to manage the memory yourself through a memory pool.
When you create a buffer, you have to give it a size and a usage (uniform buffer, index buffer, vertex buffer, …). You can also ask for a sparse buffer (sparse resources will be the subject of an article one day ^_^). You can also put the buffer in concurrent mode; thanks to that, you can access the same buffer from two different queues.
I chose host visible and host coherent memory, but it is not especially useful. Indeed, to achieve better performance, you may want to use non-coherent memory (but then you will have to flush/invalidate your memory!!).
Host visible memory is not especially useful either; indeed, for indirect rendering, it can be smart to perform the culling on the GPU and let it fill all the structures!
Shaders
Shaders are the programmable parts of your pipelines. That is an approximation, obviously, but for each stage (vertex processing, geometry processing, fragment processing…), the associated shader is invoked. In Vulkan, shaders are written in SPIR-V.
SPIR-V is to Vulkan what “.class” files are to Java. You may compile your GLSL sources to SPIR-V using glslangValidator.
Why is SPIR-V so powerful?
SPIR-V allows developers to ship their application without the shaders’ source code.
SPIR-V is an intermediate representation. Thanks to that, vendor implementations do not have to write a compiler for a specific high-level language. The result is a lower complexity for the driver, which can optimize more and compile faster.
Shaders in Vulkan
Contrary to OpenGL shaders, shaders are really easy to compile in Vulkan.
My implementation keeps all compiled shaders in a hashtable. This prevents any shader recompilation.
Pipelines are objects used for dispatch (compute pipelines) or render something (graphic pipelines).
The beginning of this part is going to be a summary of the Vulkan specs.
Descriptors
Shaders access buffer and image resources through special variables. These variables are organized into sets of bindings, and one set is described by one descriptor set layout.
Descriptor Set Layout
They describe one set. One set is composed of an array of bindings. Each binding is described by:
A binding number
A type: image, uniform buffer, SSBO, …
The number of values (it could be an array of textures)
The shader stages that can access the binding.
Allocation of Descriptor Sets
Descriptor sets are allocated from descriptor pool objects.
A descriptor pool object is described by the number of sets it can allocate and an array of descriptor type / count pairs it can allocate.
Once you have the descriptor pool, you can allocate sets from it (using both the descriptor pool and a descriptor set layout).
When you destroy the pool, the sets are destroyed as well.
Give buffers / images to sets
Now we have descriptors, but we have to tell Vulkan where the shaders can get their data from.
Pipeline Layouts
Pipeline layouts are a kind of bridge between the pipeline and the descriptor sets. They also let you manage push constants (we’ll see them in a future article).
Implementation
Descriptor sets are not coupled with pipeline layouts, so we could separate the pipeline layout from the descriptor pool / sets, but currently I prefer to keep them together. It is a choice, and it may change in the future.
I am going to explain quickly what memory barriers are.
The idea behind a memory barrier is to ensure that writes are actually performed.
When you perform a compute or a render, it is your duty to ensure that the data will be visible when you want to re-use it.
In our main.cpp example, I draw a triangle into a frame buffer and present it.
Image barriers are composed of accesses and layouts, and pipeline barriers of stages.
Since the presentation is a read of a framebuffer, srcAccessMask is VK_ACCESS_MEMORY_READ_BIT.
Now, we want to render inside this image via a framebuffer, so dstAccessMask is VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT.
The image has just been presented, and now we want to render into it, so the layouts are obvious.
When we submit an image memory barrier to the command buffer, we have to tell it which stages are affected. Here, we wait for all commands and resume at the first stage of the pipeline.
The only difference is the order and the stage masks. Here we wait for the color attachment output stage (and not the fragment one!!!!) and resume at the end of the pipeline (it is not really easy to explain… but it is not illogical).
Steps to render something using pipelines are:
Create pipelines
Create command pools and command buffers, and begin them
Create vertex / index buffers
Bind pipelines to their subpass, bind buffers and descriptor sets
Hi there!
A few weeks ago, on February 16th to be precise, Vulkan, the new graphics API from Khronos, was released. It is a new API which gives much more control over the GPU than OpenGL (the API I loved before Vulkan ^_^).
OpenGL’s problems
Driver Overhead
Rendering performance problems can come from the driver: video games do not use the GPU perfectly (maybe 80% instead of 95-100% of use). Driver overhead has a big cost, and the more recent OpenGL versions tend to solve this problem with bindless textures, multi draws, direct state access, etc.
Keep in mind that each GPU call can have a big cost.
Cass Everitt, Tim Foley, John McDonald, Graham Sellers presented Approaching Zero Driver Overhead with OpenGL in 2014.
Multi threading
With OpenGL, it is not possible to have efficient multi threading, because an OpenGL context belongs to one and only one thread; that is why it is not so easy to make a draw call from another thread ^_^.
Vulkan
Vulkan is not really a low level API, but it provides a far better abstraction of modern hardware. Vulkan is more than AZDO; it is, as Graham Sellers said, PDCTZO (Pretty Darn Close To Zero Overhead).
Series of articles about Lava
What is Lava?
Lava is the name I gave to my new graphics (physics?) engine. It will let me learn how Vulkan works, play with it, implement some global illumination algorithms, and probably share my learnings and feelings about Vulkan with you. It is possible that I’ll make some mistakes, so if I do, please let me know!
Why Lava?
Vulkan makes me think about volcanoes, which make me think about lava, so… I chose it 😀 .
Initialization
Now begins what I wanted to discuss, initialization of Vulkan.
First of all, you have to really know and understand what you intend to do. To begin with, we are going to see how to get a simple pink window.
When you are developing with Vulkan, I advise you to keep the specification from Khronos open in another window (or on another screen if you are using multiple screens).
To have an easier way to manage windows, I am using GLFW 3.2, and yes, you have to compile it yourself ^_^, but it is not difficult at all, so it is not a big deal.
Instance
Contrary to OpenGL, in Vulkan there is no global state; an instance is somewhat similar to an OpenGL context. An instance doesn’t know anything about other instances; it is utterly isolated. The creation of an instance is really easy.
From this instance, you can retrieve all the GPUs on your computer.
You can create a connection between your application and the GPU you want using a VkDevice.
When creating this connection, you also have to create queues.
Queues are used to perform tasks: you submit a task to a queue and it will be performed.
The queues are separated into several families.
A good approach could be to use several queues, for example one for the physics and one for the graphics (or even two or three for the latter).
You can also give a priority (between 0 and 1) to a queue. Thanks to that, if you consider a task less important, you just have to give its queue a low priority :).
Images represent a mono- or multi-dimensional array (1D, 2D or 3D).
Images don’t provide any getter or setter for their data. If you want to use them in your application, you must go through ImageViews.
ImageViews are directly tied to an image. The creation of an ImageView is not really complicated.
A window is assigned to a surface (VkSurfaceKHR). To draw something, you have to render into this surface via a swapchain.
Notions of Swapchains
In Vulkan, you have to manage double buffering yourself via the swapchain. When you create a swapchain, you link it to a surface and tell it how many images you need. For double buffering, you need 2 images.
Once the swapchain is created, you should retrieve the images and create framebuffers from them.
The steps to get a correct swapchain are:
Create a Window
Create a Surface assigned to this Window
Create a Swapchain with several images assigned to this Surface
void SurfaceWindow::begin() {
    // No error checking because the swapchain could be in a lost state after a resolution change.
    // Note: the spec requires at least a semaphore or a fence here; passing
    // VK_NULL_HANDLE for both is kept only to simplify this example.
    vkAcquireNextImageKHR(mDevice, mSwapchain, UINT64_MAX, VK_NULL_HANDLE, VK_NULL_HANDLE, &mCurrentSwapImage);
}

void SurfaceWindow::end(Queue &queue) {
    VkPresentInfoKHR info = {}; // zero-initialize to avoid garbage in unused fields
    info.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    info.pNext = nullptr;
    info.waitSemaphoreCount = 0;
    info.pWaitSemaphores = nullptr;
    info.swapchainCount = 1;
    info.pSwapchains = &mSwapchain;
    info.pImageIndices = &mCurrentSwapImage;
    info.pResults = nullptr;
    vkQueuePresentKHR(queue, &info);
}
Notions of Render Passes
Right now, Vulkan should be initialized. To render something, we have to use render passes and command buffers.
Command Buffers
A command buffer is quite similar to a vertex array object (VAO) or a display list (old old old OpenGL 😀 ).
You begin the recording state, you record some “information”, and you end the recording state.
Command buffers are allocated from a command pool.
Vulkan provides two types of Command Buffer.
Primary level: they can be submitted to a queue.
Secondary level: they can be executed by a primary level command buffer.
One render pass is executed on one framebuffer. Its creation is not easy at all. One render pass is composed of one or several subpasses.
Remember that framebuffers can have several attachments.
Each attachment does not have to be used by every subpass.
The piece of code that creates one render pass is not definitive at all and will be changed as soon as possible ^^. But for our example, it is correct.