Category: Vulkan

  • Vulkan Memory Management : How to write your own allocator

    Hi! This article deals with memory management in Vulkan. But first, a word about what has been happening in my life.

    State of my life

    Once again, it has been more than a month since I last wrote anything. So, where am I? I am in my final year at Télécom SudParis, following the High Tech Imaging courses, which is the imaging specialization at my school. The funny part is that, in parallel, I am a lecturer in a video game specialization. I taught OpenGL (3.3, because I cannot give an OpenGL 4 course: not everyone has hardware good enough for it). I got an internship at Dassault Systemes (France). It will begin on the first of February, and I will work on the soft shadow engine (OpenGL 4.5).

    Vulkan

    To begin, some of the articles I wrote before this one may contain mistakes, explain things poorly, or show code that is not well optimized.

    Why come back to Vulkan?

    I came back to Vulkan because I wanted to make one of the first “amateur” renderers using Vulkan. I also wanted a better handling of memory management, memory barriers and other joys like that. Moreover, I made a repository with “a lot” of Vulkan examples: Vulkan example repository.
    I do not mean to replace the Sascha Willems ones; I simply propose my own way of doing it, in C++, using Vulkan HPP.

    Memory Management with Vulkan

    Different kinds of memory

    Heap

    A graphics card can read memory from different heaps: its own heap, or the system heap (RAM).

    Type

    There are several memory types. For example, some memory is host cached, some is host coherent, some is device local, and so on.

    Host and device
    Host

    This memory resides in RAM. This heap generally exposes one (or several) memory types with the “HOST_VISIBLE” bit set. It tells Vulkan that the memory can be mapped, and you may keep it mapped persistently. That way, you get a pointer and can write to the memory from the CPU.

    Device Local

    This memory resides on the graphics card. It is freaking fast and is generally not host visible. That means you have to use a staging resource, or the GPU itself, to write something to it.
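
    To make the idea of memory types more concrete, here is a minimal sketch of how a type could be selected from the flags you need. It does not call Vulkan: MemoryType is a hypothetical, simplified mirror of VkMemoryType, the constants reuse the numeric values of the VK_MEMORY_PROPERTY_* bits, and findMemoryType is a name chosen for this example. The typeBits mask plays the role of the memoryTypeBits field that Vulkan reports for a buffer or an image.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins with the same numeric values as the VK_MEMORY_PROPERTY_* bits.
constexpr std::uint32_t DeviceLocal  = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr std::uint32_t HostVisible  = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr std::uint32_t HostCoherent = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
constexpr std::uint32_t HostCached   = 0x8; // VK_MEMORY_PROPERTY_HOST_CACHED_BIT

// Hypothetical, simplified mirror of VkMemoryType.
struct MemoryType {
    std::uint32_t propertyFlags;
    std::uint32_t heapIndex;
};

// Returns the index of the first memory type that is allowed by typeBits
// and owns every required flag, or -1 when no type matches.
int findMemoryType(std::vector<MemoryType> const &types,
                   std::uint32_t typeBits, std::uint32_t required) {
    assert(types.size() <= 32); // Vulkan exposes at most 32 memory types
    for(std::uint32_t i = 0; i < types.size(); ++i) {
        bool allowed = (typeBits & (1u << i)) != 0;
        bool hasFlags = (types[i].propertyFlags & required) == required;
        if(allowed && hasFlags)
            return static_cast<int>(i);
    }
    return -1;
}
```

    With a real device, the vector would be filled from the memory properties of the physical device, and the returned index is what an allocation expects as its memory type index.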

    Allocation in Vulkan

    In Vulkan, the number of allocations that can exist at the same time is limited by the driver (see maxMemoryAllocationCount in the device limits). That means you cannot make a lot of allocations: do not use one allocation per buffer or image, but one allocation for several buffers and images.
    In this article, I will not deal with the CPU cache or anything like that; I will only focus on how to get the best from the GPU side.
    [Figure: memory management, good and bad]

    How will we do it?

    [Figure: memory management, device allocator]
    As you can see, we have a block, that could represent the memory for one buffer, or for one image, we have a chunk that represents one allocation (via vkAllocateMemory) and we have a DeviceAllocator that manages all chunks.

    Block

    I defined a block as follows:

    struct Block {
        vk::DeviceMemory memory;
        vk::DeviceSize offset;
        vk::DeviceSize size;
        bool free;
        void *ptr = nullptr; // Useless if it is a GPU allocation
    
        // const: comparing two blocks must not modify them (std::find needs it)
        bool operator==(Block const &block) const;
    };

    // Two blocks are equal when every field matches
    bool Block::operator==(Block const &block) const {
        return memory == block.memory &&
               offset == block.offset &&
               size == block.size &&
               free == block.free &&
               ptr == block.ptr;
    }

    A block, as its name suggests, defines a small region within one allocation.
    So, it has an offset, a size, and a boolean to know whether it is free or not.
    It may own a ptr if it is a host-visible (mapped) allocation.

    Chunk

    A chunk is a memory region that contains a list of blocks. It represents a single allocation.
    What should a chunk let us do?

    1. Allocate a block
    2. Deallocate a block
    3. Tell us if the block is inside the chunk

    That gives us:

    #pragma once
    #include "block.hpp"
    #include <vector>
    
    class Chunk : private NotCopyable {
    public:
        Chunk(Device &device, vk::DeviceSize size, int memoryTypeIndex);
    
        // Same signature as the definition: the alignment is required
        bool allocate(vk::DeviceSize size, vk::DeviceSize alignment, Block &block);
        bool isIn(Block const &block) const;
        void deallocate(Block const &block);
        int memoryTypeIndex() const;
    
        ~Chunk();
    
    protected:
        Device mDevice;
        vk::DeviceMemory mMemory = VK_NULL_HANDLE;
        vk::DeviceSize mSize;
        int mMemoryTypeIndex;
        std::vector<Block> mBlocks;
        void *mPtr = nullptr;
    };

    A chunk allocates its memory inside its constructor.

    Chunk::Chunk(Device &device, vk::DeviceSize size, int memoryTypeIndex) :
        mDevice(device),
        mSize(size),
        mMemoryTypeIndex(memoryTypeIndex) {
        vk::MemoryAllocateInfo allocateInfo(size, memoryTypeIndex);
    
        Block block;
        block.free = true;
        block.offset = 0;
        block.size = size;
        mMemory = block.memory = device.allocateMemory(allocateInfo);
    
        if((device.getPhysicalDevice().getMemoryProperties().memoryTypes[memoryTypeIndex].propertyFlags & vk::MemoryPropertyFlagBits::eHostVisible) == vk::MemoryPropertyFlagBits::eHostVisible)
            mPtr = device.mapMemory(mMemory, 0, VK_WHOLE_SIZE);
    
        mBlocks.emplace_back(block);
    }

    While a deallocation is really easy (just mark the block as free), an allocation requires a bit of attention. You need to check whether the block is free and, if it is, whether it is large enough. If the allocation is smaller than the available size, you create another block with the remainder. You also need to take care of memory alignment!

    void Chunk::deallocate(const Block &block) {
        auto blockIt(std::find(mBlocks.begin(), mBlocks.end(), block));
        assert(blockIt != mBlocks.end());
        // Just put the block to free
        blockIt->free = true;
    }
    
    bool Chunk::allocate(vk::DeviceSize size, vk::DeviceSize alignment, Block &block) {
        // if chunk is too small
        if(size > mSize)
            return false;
    
        for(uint32_t i = 0; i < mBlocks.size(); ++i) {
            if(mBlocks[i].free) {
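                // Compute virtual size after taking care about offsetAlignment
                // (vk::DeviceSize is 64 bits wide: uint32_t would truncate big blocks)
                vk::DeviceSize newSize = mBlocks[i].size;
                vk::DeviceSize padding = 0;
    
                if(mBlocks[i].offset % alignment != 0)
                    padding = alignment - mBlocks[i].offset % alignment;
    
                // The padding alone may not fit: skip the block to avoid an unsigned underflow
                if(padding > newSize)
                    continue;
                newSize -= padding;
    
                // If match
                if(newSize >= size) {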
    
                    // We compute offset and size that care about alignment (for this Block)
                    mBlocks[i].size = newSize;
                    if(mBlocks[i].offset % alignment != 0)
                        mBlocks[i].offset += alignment - mBlocks[i].offset % alignment;
    
                    // Compute the ptr address
                    if(mPtr != nullptr)
                        mBlocks[i].ptr = (char*)mPtr + mBlocks[i].offset;
    
                    // if perfect match
                    if(mBlocks[i].size == size) {
                        mBlocks[i].free = false;
                        block = mBlocks[i];
                        return true;
                    }
    
                    Block nextBlock;
                    nextBlock.free = true;
                    nextBlock.offset = mBlocks[i].offset + size;
                    nextBlock.memory = mMemory;
                    nextBlock.size = mBlocks[i].size - size;
                    mBlocks.emplace_back(nextBlock); // We add the newBlock
    
                    mBlocks[i].size = size;
                    mBlocks[i].free = false;
    
                    block = mBlocks[i];
                    return true;
                }
            }
        }
    
        return false;
    }
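
    The alignment arithmetic used in the allocation above can be isolated into two small helpers, which makes it easier to check on its own. This is only a sketch: DeviceSize is a stand-in for vk::DeviceSize, and alignUp and usableSize are names chosen for this example, not functions from the renderer.

```cpp
#include <cassert>
#include <cstdint>

using DeviceSize = std::uint64_t; // stand-in for vk::DeviceSize

// Rounds offset up to the next multiple of alignment (alignment must be > 0).
DeviceSize alignUp(DeviceSize offset, DeviceSize alignment) {
    assert(alignment > 0);
    DeviceSize remainder = offset % alignment;
    return remainder == 0 ? offset : offset + (alignment - remainder);
}

// Size that remains usable in a free block [offset, offset + size) once its
// start has been aligned. Returns 0 when the padding consumes the whole block.
DeviceSize usableSize(DeviceSize offset, DeviceSize size, DeviceSize alignment) {
    DeviceSize padding = alignUp(offset, alignment) - offset;
    return padding >= size ? 0 : size - padding;
}
```

    For example, a free block starting at offset 100 with a size of 1000 and an alignment of 256 really starts at offset 256 and keeps 844 usable bytes.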

    Chunk Allocator

    The name may be badly chosen, but the chunk allocator lets us separate the creation of a chunk from the chunk itself. We give it a size and it performs all the checks we need.

    class ChunkAllocator : private NotCopyable
    {
    public:
        ChunkAllocator(Device &device, vk::DeviceSize size);
    
        // if size > mSize, allocate to the next power of 2
        std::unique_ptr<Chunk> allocate(vk::DeviceSize size, int memoryTypeIndex);
    
    private:
        Device mDevice;
        vk::DeviceSize mSize;
    };
    
    // Smallest power of two strictly greater than size (size must be lower than 2^63).
    // Integer arithmetic avoids the rounding issues of log2 on 64 bits values.
    vk::DeviceSize nextPowerOfTwo(vk::DeviceSize size) {
        vk::DeviceSize power = 1;
        while(power <= size)
            power <<= 1;
        return power;
    }
    
    // A power of two has exactly one bit set
    bool isPowerOfTwo(vk::DeviceSize size) {
        return size != 0 && (size & (size - 1)) == 0;
    }
    
    ChunkAllocator::ChunkAllocator(Device &device, vk::DeviceSize size) :
        mDevice(device),
        mSize(size) {
        assert(isPowerOfTwo(size));
    }
    
    std::unique_ptr<Chunk> ChunkAllocator::allocate(vk::DeviceSize size,
                                                    int memoryTypeIndex) {
        size = (size > mSize) ? nextPowerOfTwo(size) : mSize;
    
        return std::make_unique<Chunk>(mDevice, size, memoryTypeIndex);
    }

    Device Allocator

    I started by writing an abstract class for Vulkan allocation:

    /**
     * @brief The AbstractAllocator lets the user allocate or deallocate blocks
     */
    class AbstractAllocator : private NotCopyable
    {
    public:
        AbstractAllocator(Device const &device) :
            mDevice(std::make_shared<Device>(device)) {
    
        }
    
        virtual Block allocate(vk::DeviceSize size, vk::DeviceSize alignment, int memoryTypeIndex) = 0;
        virtual void deallocate(Block &block) = 0;
    
        Device getDevice() const {
            return *mDevice;
        }
    
        virtual ~AbstractAllocator() = 0;
    
    protected:
        std::shared_ptr<Device> mDevice;
    };
    
    inline AbstractAllocator::~AbstractAllocator() {
    
    }
    

    As you can see, it is really simple: you can allocate or deallocate from this allocator. Next, I created a DeviceAllocator that inherits from AbstractAllocator.

    class DeviceAllocator : public AbstractAllocator
    {
    public:
        DeviceAllocator(Device device, vk::DeviceSize size);
    
        Block allocate(vk::DeviceSize size, vk::DeviceSize alignment, int memoryTypeIndex) override;
        void deallocate(Block &block) override;
    
    
    private:
        ChunkAllocator mChunkAllocator;
        std::vector<std::shared_ptr<Chunk>> mChunks;
    };
    

    This allocator holds a list of chunks and one ChunkAllocator to create them.
    The allocation is really easy. We check whether a “good chunk” exists and whether we can allocate from it. Otherwise, we create another chunk, and that is it!

    DeviceAllocator::DeviceAllocator(Device device, vk::DeviceSize size) :
        AbstractAllocator(device),
        mChunkAllocator(device, size) {
    
    }
    
    Block DeviceAllocator::allocate(vk::DeviceSize size, vk::DeviceSize alignment, int memoryTypeIndex) {
        Block block;
        // We search a "good" chunk
        for(auto &chunk : mChunks)
            if(chunk->memoryTypeIndex() == memoryTypeIndex)
                if(chunk->allocate(size, alignment, block))
                    return block;
    
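        mChunks.emplace_back(mChunkAllocator.allocate(size, memoryTypeIndex));
        // Keep the call outside of assert: the whole expression vanishes when NDEBUG is defined
        bool allocated = mChunks.back()->allocate(size, alignment, block);
        assert(allocated);
        (void)allocated;
        return block;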
    }
    
    void DeviceAllocator::deallocate(Block &block) {
        for(auto &chunk : mChunks) {
            if(chunk->isIn(block)) {
                chunk->deallocate(block);
                return ;
            }
        }
        assert(!"unable to deallocate the block");
    }
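
    To experiment with this logic without a GPU, the chunk and the device allocator can be modelled on the CPU. The sketch below is an assumption-laden simplification, not the code of the renderer: SimBlock, SimChunk and SimAllocator are hypothetical names, no Vulkan call is made, the memory type is ignored, and the padding created by the alignment stays attached to the allocated block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using DeviceSize = std::uint64_t; // stand-in for vk::DeviceSize

struct SimBlock {
    DeviceSize offset;
    DeviceSize size;
    bool free;
};

// CPU-only model of one chunk: a list of blocks covering [0, size).
class SimChunk {
public:
    explicit SimChunk(DeviceSize size) {
        mBlocks.push_back(SimBlock{0, size, true});
    }

    // First fit: returns true and writes the aligned offset on success.
    bool allocate(DeviceSize size, DeviceSize alignment, DeviceSize &offset) {
        assert(alignment > 0);
        for(std::size_t i = 0; i < mBlocks.size(); ++i) {
            if(!mBlocks[i].free)
                continue;
            DeviceSize aligned = (mBlocks[i].offset + alignment - 1) / alignment * alignment;
            DeviceSize padding = aligned - mBlocks[i].offset;
            if(padding + size > mBlocks[i].size)
                continue; // too small once aligned
            DeviceSize remaining = mBlocks[i].size - padding - size;
            mBlocks[i].size = padding + size;
            mBlocks[i].free = false;
            if(remaining > 0) // the tail becomes a new free block
                mBlocks.insert(mBlocks.begin() + static_cast<std::ptrdiff_t>(i) + 1,
                               SimBlock{aligned + size, remaining, true});
            offset = aligned;
            return true;
        }
        return false;
    }

private:
    std::vector<SimBlock> mBlocks;
};

// Model of the device allocator: try every chunk, then create a new one.
class SimAllocator {
public:
    explicit SimAllocator(DeviceSize chunkSize) : mChunkSize(chunkSize) {}

    // Returns the index of the chunk used and writes the offset inside it.
    std::size_t allocate(DeviceSize size, DeviceSize alignment, DeviceSize &offset) {
        for(std::size_t i = 0; i < mChunks.size(); ++i)
            if(mChunks[i].allocate(size, alignment, offset))
                return i;
        mChunks.emplace_back(size > mChunkSize ? size : mChunkSize);
        bool allocated = mChunks.back().allocate(size, alignment, offset);
        assert(allocated);
        (void)allocated;
        return mChunks.size() - 1;
    }

    std::size_t chunkCount() const { return mChunks.size(); }

private:
    DeviceSize mChunkSize;
    std::vector<SimChunk> mChunks;
};
```

    With chunks of 1024 bytes and an alignment of 256, two requests of 100 bytes land in the first chunk at offsets 0 and 256, while a following request of 600 bytes no longer fits and triggers the creation of a second chunk.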
    

    Conclusion

    Since I came back to Vulkan, I have gained a much better understanding of this new API. I can now write better articles than I did in March.
    I hope you enjoyed this remake of memory management.
    My next article will be about buffers and staging resources. It will be a short one. I will also write an article that explains how to load textures and their mipmaps.

    References

    Vulkan Memory Management

    Kisses and see you soon (probably this week !)

  • Vulkan Pipelines, Barrier, memory management

    Hi!
    Once again, I am going to present some Vulkan features: pipelines, barriers, memory management, and everything they depend on. This article is long, but it is split into several chapters.

    Memory Management

    In a Vulkan application, it is up to the developer to manage memory. The number of allocations is limited. Making one allocation per buffer or per image is really bad design in Vulkan. A good design is to make one big allocation (let’s call it a chunk), manage it yourself, and sub-allocate buffers and images within the chunk.

    A Chunk Allocator

    We need a simple object responsible for allocating chunks. It just has to select the right heap and call the allocate and free functions of the Vulkan API.

    #pragma once
    
    #include "System/Vulkan/Hardware/device.hpp"
    #include <tuple>
    #include <vector>
    
    class ChunkAllocator
    {
    public:
        ChunkAllocator(Device &device);
    
        // Memory, flags, size, ptr
        std::tuple<VkDeviceMemory, VkMemoryPropertyFlags, VkDeviceSize, char *>
        allocate(VkMemoryPropertyFlags flags, VkDeviceSize size);
    
        ~ChunkAllocator();
    
    private:
        Device &mDevice;
    
        std::vector<VkDeviceMemory> mDeviceMemories; //!< Each chunk
    };
    #include "chunkallocator.hpp"
    #include "System/exception.hpp"
    
    ChunkAllocator::ChunkAllocator(Device &device) : mDevice(device)
    {
    
    }
    
    std::tuple<VkDeviceMemory, VkMemoryPropertyFlags, VkDeviceSize, char*>
    ChunkAllocator::allocate(VkMemoryPropertyFlags flags, VkDeviceSize size) {
        VkPhysicalDeviceMemoryProperties const &property = mDevice.memoryProperties();
        int index = -1;
    
        // Looking for the first memory type with good flags whose heap is large enough
        for(auto i(0u); i < property.memoryTypeCount; ++i) {
            if((property.memoryTypes[i].propertyFlags & flags) == flags &&
               size < property.memoryHeaps[property.memoryTypes[i].heapIndex].size) {
                index = i;
                break; // memory types are ordered so that a first-match search works
            }
        }
    
        if(index == -1)
            throw std::runtime_error("No good heap found");
    
        VkMemoryAllocateInfo info = {};
        info.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
        info.pNext = nullptr;
        info.allocationSize = size;
        info.memoryTypeIndex = index;
    
        // Perform the allocation
        VkDeviceMemory mem;
        vulkanCheckError(vkAllocateMemory(mDevice, &info, nullptr, &mem));
        mDeviceMemories.push_back(mem);
    
        char *ptr = nullptr; // stays null when the memory is not host visible
         // We map the memory if it is host visible
        if(flags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)
            vulkanCheckError(vkMapMemory(mDevice, mem, 0, VK_WHOLE_SIZE, 0, (void**)&ptr));
    
    
        return std::tuple<VkDeviceMemory, VkMemoryPropertyFlags, VkDeviceSize, char*>
                (mem, flags, size, ptr);
    }
    
    ChunkAllocator::~ChunkAllocator() {
        // We free all memory objects
        for(auto &mem : mDeviceMemories)
            vkFreeMemory(mDevice, mem, nullptr);
    }
    

    This piece of code is quite simple and easy to read.

    Memory Pool

    Memory pools are structures used to speed up dynamic allocation. In video games, using a memory pool is not optional. The idea is the same as in the first part: allocate a chunk, and sub-allocate within it yourself. I made a simple generic memory pool.
    Here is a small diagram that explains what I wanted to do.

    [Figure: Memory Pool]

    As you can see, video memory is split into several parts (4 here) and each “Block” in the linked list describes one sub-allocation.
    A block is described by:

    1. The size of the block
    2. The offset of the block relative to the DeviceMemory
    3. A pointer to set data from the host (map)
    4. A boolean telling whether the block is free

    A sub-allocation within a chunk is performed as follows :

    1. Traverse the linked list until we find a well-sized free block
    2. Modify the size and set the boolean to false
    3. Create a new block, set its size and offset, set its boolean to true, and insert it after the current one.

    Freeing is quite simple: you just set the boolean back to true.
    Another good method would be a “shrink to fit”: when several consecutive blocks have their boolean set to true, we merge them into one.
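
    The merge of consecutive free blocks is not part of the pool shown in this article. A minimal sketch of it could look like the following, assuming the blocks are sorted by offset and contiguous; PoolBlock and mergeFreeBlocks are hypothetical names, and the host pointer is left out for brevity (a merged block would simply keep the pointer of its first block).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced version of a pool block (no host pointer).
struct PoolBlock {
    std::uint64_t offset;
    std::uint64_t size;
    bool isFree;
};

// Merges every run of adjacent free blocks into a single free block.
// Assumes the blocks are sorted by offset and cover the chunk without holes.
void mergeFreeBlocks(std::vector<PoolBlock> &blocks) {
    std::vector<PoolBlock> merged;
    for(PoolBlock const &block : blocks) {
        if(!merged.empty()) // check the "no hole" assumption
            assert(merged.back().offset + merged.back().size == block.offset);
        if(!merged.empty() && merged.back().isFree && block.isFree)
            merged.back().size += block.size; // grow the previous free block
        else
            merged.push_back(block);
    }
    blocks.swap(merged);
}
```

    Calling it after each free (or only when an allocation fails) limits fragmentation: two neighbouring free blocks of 128 and 256 bytes become one block of 384 bytes, which can then serve a larger request.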

    #pragma once
    
    #include "chunkallocator.hpp"
    
    // Memory, Offset, Size, ptr
    using Allocation = std::tuple<VkDeviceMemory, VkDeviceSize, VkDeviceSize, char*>;
    
    class MemoryPool {
        // Describes one user allocation
        struct Block {
            VkDeviceSize offset;
            VkDeviceSize size;
            char *ptr;
            bool isFree;
        };
    
        struct Chunk {
            VkDeviceMemory memory;
            VkMemoryPropertyFlags flags;
            VkDeviceSize size;
            char *ptr;
            std::vector<Block> blocks;
        };
    
    public:
        MemoryPool(Device &device);
    
        Allocation allocate(VkDeviceSize size, VkMemoryPropertyFlags flags);
    
        void free(Allocation const &alloc);
    
    private:
        Device &mDevice;
        ChunkAllocator mChunkAllocator;
        std::vector<Chunk> mChunks;
    
        void addChunk(std::tuple<VkDeviceMemory, VkMemoryPropertyFlags,
                      VkDeviceSize, char*> const &ptr);
    };
    #include "memorypool.hpp"
    #include <cassert>
    
    MemoryPool::MemoryPool(Device &device) :
        mDevice(device), mChunkAllocator(device) {}
    
    Allocation MemoryPool::allocate(VkDeviceSize size, VkMemoryPropertyFlags flags) {
        if(size % 128 != 0)
            size = size + (128 - (size % 128)); // 128 bytes alignment
        assert(size % 128 == 0);
    
        for(auto &chunk: mChunks) {
            // if flags are okay
            if((chunk.flags & flags) == flags) {
                int indexBlock = -1;
                // We are looking for a good block
                for(auto i(0u); i < chunk.blocks.size(); ++i) {
                    // >= and not >: a block that fits perfectly is a good block too
                    if(chunk.blocks[i].isFree && chunk.blocks[i].size >= size) {
                        indexBlock = i;
                        break;
                    }
                }
    
                // If a block is found
                if(indexBlock != -1) {
                    Block newBlock;
                    // Set the new block
                    newBlock.isFree = true;
                    newBlock.offset = chunk.blocks[indexBlock].offset + size;
                    newBlock.size = chunk.blocks[indexBlock].size - size;
                    // ptr stays null when the chunk is not mapped (not host visible)
                    newBlock.ptr = chunk.blocks[indexBlock].ptr != nullptr ?
                                   chunk.blocks[indexBlock].ptr + size : nullptr;
    
                    // Modify the current block
                    chunk.blocks[indexBlock].isFree = false;
                    chunk.blocks[indexBlock].size = size;
    
                    // If allocation does not fit perfectly the block
                    if(newBlock.size != 0)
                        chunk.blocks.emplace(chunk.blocks.begin() + indexBlock + 1, newBlock);
    
                    return Allocation(chunk.memory, chunk.blocks[indexBlock].offset, size, chunk.blocks[indexBlock].ptr);
                }
            }
        }
    
        // if we reach there, we have to allocate a new chunk: 32 MiB, or more when
        // the request is bigger (otherwise the recursion below would never end)
        VkDeviceSize const defaultChunkSize = VkDeviceSize(1) << 25;
        addChunk(mChunkAllocator.allocate(flags, size > defaultChunkSize ? size : defaultChunkSize));
    
        return allocate(size, flags);
    }
    
    void MemoryPool::free(Allocation const &alloc) {
        for(auto &chunk: mChunks)
            if(chunk.memory == std::get<0>(alloc)) // Search the good memory device
                for(auto &block : chunk.blocks)
                    if(block.offset == std::get<1>(alloc)) // Search the good offset
                        block.isFree = true; // put it to free
    }
    
    void MemoryPool::addChunk(const std::tuple<VkDeviceMemory, VkMemoryPropertyFlags, VkDeviceSize, char *> &ptr) {
        Chunk chunk;
        Block block;
    
        // Add a block mapped along the whole chunk
        block.isFree = true;
        block.offset = 0;
        block.size = std::get<2>(ptr);
        block.ptr = std::get<3>(ptr);
    
        chunk.flags = std::get<1>(ptr);
        chunk.memory = std::get<0>(ptr);
        chunk.size = std::get<2>(ptr);
        chunk.ptr = std::get<3>(ptr);
        chunk.blocks.emplace_back(block);
        mChunks.emplace_back(chunk);
    }

    Buffers

    Buffers are a well-known part of OpenGL. In Vulkan, they are roughly the same, but you have to manage the memory yourself through a memory pool.

    When you create a buffer, you have to give it a size and a usage (uniform buffer, index buffer, vertex buffer, …). You can also ask for a sparse buffer (sparse resources will be the subject of an article one day ^_^). You can also put it in concurrent sharing mode. Thanks to that, you can access the same buffer from queues of different families.

    #pragma once
    
    #include "memorypool.hpp"
    
    class Buffer
    {
    public:
        Buffer(Device &device, MemoryPool &memoryPool,
               VkBufferUsageFlags usage, VkDeviceSize size,
               VkSharingMode sharing = VK_SHARING_MODE_EXCLUSIVE,
               uint32_t nFamilyIndex = 0, uint32_t *pQueueFamilyIndices = nullptr);
    
        Buffer(Buffer &&buf);
    
        template<typename T>
        T *map() {
            return (T*)std::get<3>(mAllocation);
        }
    
        VkDeviceSize size();
    
        operator VkBuffer();
    
        ~Buffer();
    
    private:
        Device &mDevice;
        MemoryPool &mMemoryPool;
        Allocation mAllocation;
        VkBuffer mBuffer;
    };
    Buffer::Buffer(Device &device, MemoryPool &memoryPool,
                   VkBufferUsageFlags usage, VkDeviceSize size, VkSharingMode sharing,
                   uint32_t nFamilyIndex, uint32_t *pQueueFamilyIndices) :
        mDevice(device), mMemoryPool(memoryPool) {
        VkBufferCreateInfo info = {};
    
        info.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.size = size;
        info.usage = usage;
        info.sharingMode = sharing;
        info.queueFamilyIndexCount = nFamilyIndex;
        info.pQueueFamilyIndices = pQueueFamilyIndices;
    
        vulkanCheckError(vkCreateBuffer(mDevice, &info, nullptr, &mBuffer));
    
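        // Ask Vulkan how much memory this buffer really needs: it can be more than 'size'
        VkMemoryRequirements requirements;
        vkGetBufferMemoryRequirements(mDevice, mBuffer, &requirements);
    
        // Note: the pool aligns on 128 bytes and does not check requirements.alignment nor memoryTypeBits
        mAllocation = memoryPool.allocate(requirements.size, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
        vulkanCheckError(vkBindBufferMemory(mDevice, mBuffer, std::get<0>(mAllocation), std::get<1>(mAllocation)));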
    }
    
    Buffer::~Buffer() {
        if(mBuffer != VK_NULL_HANDLE)
            mMemoryPool.free(mAllocation);
        vkDestroyBuffer(mDevice, mBuffer, nullptr);
    }

    I chose host visible and host coherent memory, but that is not always what you want. To achieve better performance, you may want to use non-coherent memory (but then you have to flush/invalidate your memory yourself!).
    Host visible memory is not always needed either: for indirect rendering, for instance, it can be smart to perform culling on the GPU and let it fill all the structures!
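
    If you go for non-coherent memory, the ranges you flush or invalidate have to respect the nonCoherentAtomSize limit of the device: the offset must be a multiple of it, and the size too, unless the range ends exactly at the end of the allocation. Here is a small sketch of that arithmetic, without any Vulkan call; MappedRange and alignedFlushRange are hypothetical names, and the result is what you would put in the offset and size fields of a VkMappedMemoryRange.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair holding the offset and size of a range to flush.
struct MappedRange {
    std::uint64_t offset;
    std::uint64_t size;
};

// Grows [offset, offset + size) so that it starts and ends on a multiple of
// atomSize, clamped to the end of the allocation.
MappedRange alignedFlushRange(std::uint64_t offset, std::uint64_t size,
                              std::uint64_t atomSize, std::uint64_t allocationSize) {
    assert(atomSize > 0);
    assert(offset + size <= allocationSize);
    std::uint64_t begin = offset / atomSize * atomSize;                        // round down
    std::uint64_t end = (offset + size + atomSize - 1) / atomSize * atomSize;  // round up
    if(end > allocationSize)
        end = allocationSize; // ending at the end of the allocation is allowed
    return MappedRange{begin, end - begin};
}
```

    With an atom size of 64 bytes, writing 50 bytes at offset 100 means flushing the range that starts at 64 and is 128 bytes long.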

    Shaders

    Shaders are the different parts of your pipelines. That is an approximation, obviously. For each stage (vertex processing, geometry processing, fragment processing…), the associated shader is invoked. In Vulkan, shaders are provided as SPIR-V.
    SPIR-V is to Vulkan what “.class” files are to Java. You can compile your GLSL sources to SPIR-V using glslangValidator.

    Why is SPIR-V so powerful ?

    SPIR-V allows developers to ship their application without the shaders’ source.
    SPIR-V is an intermediate representation. Thanks to that, vendor implementations do not have to provide a compiler for a high-level language. It lowers the complexity of the driver, which can then optimize and compile shaders faster.

    Shaders in Vulkan

    Contrary to OpenGL shaders, shaders are really easy to load in Vulkan.
    My implementation keeps all shaders in a hash table, which prevents loading the same shader twice.

    #pragma once
    
    #include "System/Vulkan/Hardware/device.hpp"
    #include <unordered_map>
    #include <string>
    
    class Shaders
    {
    public:
        Shaders(Device &device);
    
        VkShaderModule get(std::string const &path);
    
        ~Shaders();
    private:
        Device &mDevice;
        std::unordered_map<std::string, VkShaderModule> mShaders;
    };
    
    #include "shaders.hpp"
    #include "System/exception.hpp"
    #include <fstream>
    #include <stdexcept>
    #include <vector>
    
    auto readBinaryFile(std::string const &path) {
        std::ifstream is(path, std::ios::binary);
    
        if(!is.is_open())
            throw std::runtime_error("Shader : " + path + " not found");
    
        is.seekg(0, std::ios::end);
        auto l = is.tellg();
        is.seekg(0, std::ios::beg);
    
        std::vector<char> values(l);
        is.read(values.data(), l); // data() is valid even when the file is empty
    
        return values;
    }
    
    Shaders::Shaders(Device &device) :
        mDevice(device)
    {
    
    }
    
    VkShaderModule Shaders::get(const std::string &path) {
        if(mShaders.find(path) == mShaders.end()) {
            auto file = readBinaryFile(path);
            VkShaderModuleCreateInfo info;
            VkShaderModule module;
    
            info.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
            info.pNext = nullptr;
            info.flags = 0;
            info.codeSize = file.size();
            info.pCode = (uint32_t*)&file[0];
    
            vulkanCheckError(vkCreateShaderModule(mDevice, &info, nullptr, &module));
            mShaders[path] = module;
        }
    
        return mShaders[path];
    }
    
    Shaders::~Shaders() {
        for(auto &shader: mShaders)
            vkDestroyShaderModule(mDevice, shader.second, nullptr);
    }
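
    Before handing a file to vkCreateShaderModule, a cheap sanity check can catch a wrong path or a GLSL source given by mistake. A SPIR-V module is a stream of 32-bit words whose first word is the magic number 0x07230203, so its size must be a multiple of 4. The function below is a sketch under these assumptions: looksLikeSpirv is a hypothetical name, and it assumes the file was written with the endianness of the host.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Returns true when the bytes look like a SPIR-V module: at least one word,
// a size multiple of 4, and the SPIR-V magic number as first word.
bool looksLikeSpirv(std::vector<char> const &bytes) {
    constexpr std::uint32_t magic = 0x07230203u;
    if(bytes.size() < sizeof(std::uint32_t) || bytes.size() % sizeof(std::uint32_t) != 0)
        return false;
    std::uint32_t firstWord = 0;
    // memcpy avoids reading a uint32_t through a misaligned char pointer
    std::memcpy(&firstWord, bytes.data(), sizeof(firstWord));
    return firstWord == magic;
}
```

    It could be called on the result of readBinaryFile, throwing an exception when it returns false. It does not replace real validation of the module, it only rejects obviously wrong files.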

    Pipelines

    Pipelines are objects used to dispatch work (compute pipelines) or to render something (graphics pipelines).

    The beginning of this part is a summary of the Vulkan specification.

    Descriptors

    Shaders access buffer and image resources through special variables. These variables are organized into sets of bindings. Each set is represented by one descriptor set.

    Descriptor Set Layout

    They describe one set. A set is made of an array of bindings. Each binding is described by:

    1. A binding number
    2. A type: image, uniform buffer, SSBO, …
    3. The number of values (it could be an array of textures)
    4. The stages from which shaders can access the binding

    Allocation of Descriptor Sets

    They are allocated from descriptor pool objects.
    A descriptor pool is described by the maximum number of sets it can allocate, and an array of descriptor type / count pairs it can allocate.

    Once you have the descriptor pool, you can allocate sets from it (using both the descriptor pool and the descriptor set layouts).
    When you destroy the pool, its sets are destroyed as well.
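
    The type / count pairs given to the pool can be derived from the bindings of the layouts. The sketch below shows that bookkeeping with hypothetical stand-ins: DescriptorKind plays the role of VkDescriptorType, Binding is a reduced VkDescriptorSetLayoutBinding, and poolSizes is a name chosen for this example. It assumes every set uses the same layout.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for VkDescriptorType.
enum class DescriptorKind { UniformBuffer, StorageBuffer, CombinedImageSampler };

// Hypothetical, reduced version of VkDescriptorSetLayoutBinding.
struct Binding {
    std::uint32_t binding;  // binding number in the shader
    DescriptorKind kind;    // type of the descriptor
    std::uint32_t count;    // number of values (more than 1 for arrays)
};

// Sums, for each descriptor type, how many descriptors the pool must hold
// to allocate setCount sets that all use these bindings.
std::map<DescriptorKind, std::uint32_t> poolSizes(std::vector<Binding> const &bindings,
                                                  std::uint32_t setCount) {
    assert(setCount > 0);
    std::map<DescriptorKind, std::uint32_t> sizes;
    for(Binding const &b : bindings)
        sizes[b.kind] += b.count * setCount;
    return sizes;
}
```

    Each entry of the resulting map corresponds to one VkDescriptorPoolSize. For a layout with two uniform buffers and an array of four textures, three sets require six uniform buffer descriptors and twelve image descriptors.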

    Give buffer / image to sets

    Now, we have descriptors, but we have to tell Vulkan where shaders can get data from.

    Pipeline Layouts

    Pipeline layouts are a kind of bridge between the pipeline and the descriptor sets. They let you manage push constants as well (we’ll see them in a future article).

    Implementation

    Descriptor sets are not coupled with pipeline layouts, so we could separate the pipeline layout from the descriptor pool / sets. Currently, though, I prefer to keep them together. It is a choice, and it may change in the future.

    #pragma once
    #include "System/Vulkan/Hardware/device.hpp"
    
    class PipelineLayout : Loggable, NonCopyable
    {
    public:
        PipelineLayout(Device &device);
    
        void setDescriptorSetLayouts(std::vector<VkDescriptorSetLayoutCreateInfo> &&infos);
        void setDescriptorPoolCreateInfo(VkDescriptorPoolCreateInfo const &info);
        void create();
    
        std::vector<VkDescriptorSet> const &descriptorSets() const;
    
        operator VkPipelineLayout();
    
        ~PipelineLayout();
    
    private:
        Device &mDevice;
    
        std::vector<VkDescriptorSetLayoutCreateInfo> mSetLayoutCreateInfos;
        std::vector<VkDescriptorSetLayout> mDescriptorSetLayouts;
        std::vector<VkDescriptorSet> mDescriptorSets;
        VkDescriptorPoolCreateInfo mDescriptorPoolCreateInfo;
        VkDescriptorPool mDescriptorPool = VK_NULL_HANDLE;
    
        VkPipelineLayout mLayout = VK_NULL_HANDLE;
    };
    void PipelineLayout::create() {
        VkPipelineLayoutCreateInfo info = {};
    
        // Create all set layouts (the loop variable no longer shadows 'info')
        for(auto &setLayoutInfo : mSetLayoutCreateInfos) {
            VkDescriptorSetLayout layout;
            vulkanCheckError(vkCreateDescriptorSetLayout(mDevice, &setLayoutInfo, nullptr, &layout));
            mDescriptorSetLayouts.emplace_back(layout);
        }
    
        // Create the descriptor pool
        if(mSetLayoutCreateInfos.size() > 0)
            vulkanCheckError(vkCreateDescriptorPool(mDevice, &mDescriptorPoolCreateInfo, nullptr, &mDescriptorPool));
    
        info.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.setLayoutCount = mDescriptorSetLayouts.size();
        info.pushConstantRangeCount = 0;
    
        if(mDescriptorSetLayouts.size() > 0)
            info.pSetLayouts = &mDescriptorSetLayouts[0];
    
        // Create the pipeline layout
        vulkanCheckError(vkCreatePipelineLayout(mDevice, &info, nullptr, &mLayout));
    
        if(mDescriptorSetLayouts.size()) {
            VkDescriptorSetAllocateInfo alloc = {};
    
            alloc.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
            alloc.pNext = nullptr;
            alloc.descriptorPool = mDescriptorPool;
            alloc.descriptorSetCount = mDescriptorSetLayouts.size();
            alloc.pSetLayouts = &mDescriptorSetLayouts[0];
            mDescriptorSets.resize(mDescriptorSetLayouts.size());
            vulkanCheckError(vkAllocateDescriptorSets(mDevice, &alloc, &mDescriptorSets[0]));
        }
    }
    
    std::vector<VkDescriptorSet> const &PipelineLayout::descriptorSets() const {
        return mDescriptorSets;
    }
    
    PipelineLayout::~PipelineLayout() {
        for(auto &layout : mDescriptorSetLayouts)
            vkDestroyDescriptorSetLayout(mDevice, layout, nullptr);
    
        vkDestroyDescriptorPool(mDevice, mDescriptorPool, nullptr);
        vkDestroyPipelineLayout(mDevice, mLayout, nullptr);
    }

    The idea is quite simple: you create all your descriptor set layouts, then you allocate the sets from a pool.

    Graphics Pipelines in a nutshell

    Graphics pipelines describe exactly what will happen during rendering.
    They describe:

    1. Shader stages
    2. Which kind of data you want to deal with (Position, normal,…)
    3. Which kind of primitive you want to draw (triangle, lines, points)
    4. Which operations you want to use for stencil and depth
    5. Multi sampling, color blending,…

    Creating a graphics pipeline is really easy; the main difficulty is the configuration.

    void Pipeline::create() {
        VkGraphicsPipelineCreateInfo info = {};
    
        info.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = mFlags;
    
        info.stageCount = mStages.size();
        info.pStages = &mStages[0];
        info.pVertexInputState = &mVertexInputState;
        info.pInputAssemblyState = &mInputAssemblyState;
        info.pTessellationState = mTesselationState.get();
        info.pViewportState = mViewportState.get();
        info.pRasterizationState = &mRasterizationState;
        info.pMultisampleState = mMultisampleState.get();
        info.pDepthStencilState = mDepthStencilState.get();
        info.pColorBlendState = mColorBlendState.get();
        info.pDynamicState = mDynamicState.get();
    
        if(mLayout != nullptr)
            info.layout = *mLayout;
    
        info.renderPass = mRenderPass;
        info.subpass = mSubpass;
    
        vulkanCheckError(vkCreateGraphicsPipelines(mDevice, VK_NULL_HANDLE, 1, &info, nullptr, &mPipeline));
    }

    I used a kind of builder design pattern to configure pipelines.
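
    To make the idea concrete, here is a stripped-down sketch of that pattern. `MiniPipeline`, `ShaderStage`, `DepthStencilState` and `FlatColorPipelineBuilder` are hypothetical names invented for the illustration. Optional state sits behind a `std::unique_ptr`, so an unset state naturally becomes the null pointer that the create-info structure expects.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the Vulkan create-info structures.
struct DepthStencilState { bool depthTest = false; };
struct ShaderStage { std::string entryPoint; };

// Simplified pipeline: mandatory state is stored by value, optional state
// behind a unique_ptr so that "not set" maps to a null pointer.
class MiniPipeline {
public:
    void setStages(std::vector<ShaderStage> stages) { mStages = std::move(stages); }
    void setDepthStencilState(std::unique_ptr<DepthStencilState> state) {
        mDepthStencil = std::move(state);
    }
    std::size_t stageCount() const { return mStages.size(); }
    // Mirrors what would be passed as pDepthStencilState: nullptr when unset.
    const DepthStencilState *depthStencilState() const { return mDepthStencil.get(); }

private:
    std::vector<ShaderStage> mStages;
    std::unique_ptr<DepthStencilState> mDepthStencil;
};

// One builder per kind of pipeline; build() holds the whole configuration.
struct FlatColorPipelineBuilder {
    std::unique_ptr<MiniPipeline> build() const {
        auto pipeline = std::make_unique<MiniPipeline>();
        pipeline->setStages({ShaderStage{"main"}, ShaderStage{"main"}});
        pipeline->setDepthStencilState(std::make_unique<DepthStencilState>());
        return pipeline;
    }
};
```

    The benefit is that user code only calls build(), while each builder keeps the long configuration in one place.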

    For the example, I configure my pipeline as follows:

    1. 2 stages: vertex shader and fragment shader
    2. Position 4D (x, y, z, w)
    3. No depth / stencil test
    4. A uniform buffer for one color

    This code is a bit long, but it gives all the steps you have to follow to create simple pipelines.

    std::unique_ptr<Pipeline> GBufferPipelineBuilder::build(Context &context,
                                                            RenderPass &renderpass, uint32_t subpass) {
        VkRect2D scissor;
        scissor.offset.x = scissor.offset.y = 0;
        scissor.extent.height = context.surfaceWindow().height();
        scissor.extent.width = context.surfaceWindow().width();
    
        VkViewport vp;
        vp.height = context.surfaceWindow().height();
        vp.width = context.surfaceWindow().width();
        vp.minDepth = 0.0f;
        vp.maxDepth = 1.0f;
        vp.x = vp.y = 0;
    
        VkPipelineViewportStateCreateInfo viewPort;
        viewPort.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
        viewPort.flags = 0;
        viewPort.pNext = nullptr;
        viewPort.scissorCount = viewPort.viewportCount = 1;
        viewPort.pViewports = &vp;
        viewPort.pScissors = &scissor;
    
        // 2 stages, vertex and fragment
        std::vector<VkPipelineShaderStageCreateInfo> stages(2);
        for(auto &stage : stages) {
            stage.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
            stage.pNext = nullptr;
            stage.flags = 0;
            stage.pSpecializationInfo = nullptr;
        }
    
        stages[0].stage = VK_SHADER_STAGE_VERTEX_BIT;
        stages[0].module = context.shader("../Shader/vert.spv");
        stages[0].pName = "main";
    
        stages[1].stage = VK_SHADER_STAGE_FRAGMENT_BIT;
        stages[1].module = context.shader("../Shader/frag.spv");
        stages[1].pName = "main";
    
        // Values are float4
        VkVertexInputAttributeDescription attribute[1];
        attribute[0].location = 0;
        attribute[0].binding = 0;
        attribute[0].offset = 0;
        attribute[0].format = VK_FORMAT_R32G32B32A32_SFLOAT;
    
        VkVertexInputBindingDescription binding[1];
        binding[0].binding = 0;
        binding[0].stride = 4 * sizeof(float);
        binding[0].inputRate = VK_VERTEX_INPUT_RATE_VERTEX;
    
        VkPipelineVertexInputStateCreateInfo vertexInput = {};
        vertexInput.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO;
        vertexInput.pNext = nullptr;
        vertexInput.flags = 0;
        vertexInput.vertexAttributeDescriptionCount = 1;
        vertexInput.vertexBindingDescriptionCount = 1;
        vertexInput.pVertexAttributeDescriptions = attribute;
        vertexInput.pVertexBindingDescriptions = binding;
    
        // No real MSAA: one sample per pixel
        VkPipelineMultisampleStateCreateInfo multisample = {};
        multisample.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO;
        multisample.pNext = nullptr;
        multisample.flags = 0;
        multisample.pSampleMask = nullptr;
        multisample.rasterizationSamples = VK_SAMPLE_COUNT_1_BIT;
        multisample.sampleShadingEnable = VK_FALSE;
        multisample.alphaToCoverageEnable = VK_FALSE;
        multisample.alphaToOneEnable = VK_FALSE;
    
        // DepthStencil tests disabled
        VkPipelineDepthStencilStateCreateInfo depthStencil = {};
        depthStencil.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
        depthStencil.pNext = nullptr;
        depthStencil.flags = 0;
        depthStencil.depthTestEnable = VK_FALSE;
        depthStencil.depthWriteEnable = VK_FALSE;
        depthStencil.depthBoundsTestEnable = VK_FALSE;
        depthStencil.depthCompareOp = VK_COMPARE_OP_ALWAYS;
        depthStencil.stencilTestEnable = VK_FALSE;
    
        // We write all r, g, b, a values
        VkPipelineColorBlendStateCreateInfo colorBlend = {};
        VkPipelineColorBlendAttachmentState cbstate[1] = {};
        cbstate[0].colorWriteMask = VK_COLOR_COMPONENT_A_BIT | VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_G_BIT | VK_COLOR_COMPONENT_R_BIT;
        colorBlend.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO;
        colorBlend.pNext = nullptr;
        colorBlend.flags = 0;
        colorBlend.logicOpEnable = VK_FALSE;
        colorBlend.attachmentCount = 1;
        colorBlend.pAttachments = cbstate;
    
        std::unique_ptr<PipelineLayout> layout = std::make_unique<PipelineLayout>(context.device());
    
        // 1 set
        VkDescriptorSetLayoutCreateInfo setLayout = {};
        setLayout.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
        setLayout.pNext = nullptr;
        setLayout.flags = 0;
        setLayout.bindingCount = 1;
    
        // 1 binding for uniform buffer
        VkDescriptorSetLayoutBinding descriptorBinding = {};
        descriptorBinding.binding = 0;
        descriptorBinding.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
        descriptorBinding.descriptorCount = 1;
        descriptorBinding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;
        descriptorBinding.pImmutableSamplers = nullptr;
        setLayout.pBindings = &descriptorBinding;
    
        // Pool for one and only one set
        VkDescriptorPoolCreateInfo descriptorPoolInfo;
        descriptorPoolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
        descriptorPoolInfo.flags = 0;
        descriptorPoolInfo.pNext = nullptr;
        descriptorPoolInfo.maxSets = 1;
        descriptorPoolInfo.poolSizeCount = 1;
        VkDescriptorPoolSize poolSize;
        poolSize.descriptorCount = 1;
        poolSize.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
        descriptorPoolInfo.pPoolSizes = &poolSize;
    
        layout->setDescriptorSetLayouts({setLayout});
        layout->setDescriptorPoolCreateInfo(descriptorPoolInfo);
    
        layout->create();
    
        std::unique_ptr<Pipeline> pipeline;
    
        pipeline = std::make_unique<Pipeline>(context.device(), renderpass, subpass);
        pipeline->setVertexInputState(vertexInput);
        pipeline->setStages(std::move(stages));
        pipeline->setViewportState(std::make_unique<VkPipelineViewportStateCreateInfo>(viewPort));
        pipeline->setMultiSampleState(std::make_unique<VkPipelineMultisampleStateCreateInfo>(multisample));
        pipeline->setDepthStencilState(std::make_unique<VkPipelineDepthStencilStateCreateInfo>(depthStencil));
        pipeline->setColorBlendState(std::make_unique<VkPipelineColorBlendStateCreateInfo>(colorBlend));
        pipeline->setLayout(std::move(layout));
    
        pipeline->create();
    
        // Create a uniform buffer
        std::unique_ptr<Buffer> bufUniform = std::make_unique<Buffer>
                (context.device(), context.memoryPool(),VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT,
                 4 * sizeof(float));
    
        VkDescriptorBufferInfo infoBuffer;
        infoBuffer.buffer = *bufUniform;
        infoBuffer.offset = 0;
        infoBuffer.range = VK_WHOLE_SIZE;
    
        pipeline->addBuffer(std::move(bufUniform));
    
        // "Give" the buffer to the set.
        pipeline->updateBufferDescriptorSets(0, 0,
                                             VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
                                             infoBuffer);
    
        return pipeline;
    }
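
    The builder above uses a single float4 attribute, so the stride is simply 4 * sizeof(float) and the offset is 0. With more than one attribute, it is safer to let the compiler compute these values. The following is only a sketch: the `Vertex` structure with a color and the `Attribute` stand-in are assumptions, not part of the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vertex layout: a 4D position followed by an RGBA color.
struct Vertex {
    float position[4];
    float color[4];
};

// Stand-in for the location/offset pair of VkVertexInputAttributeDescription.
struct Attribute {
    std::uint32_t location;
    std::uint32_t offset;
};

// Stride of the binding: distance in bytes between two consecutive vertices.
constexpr std::uint32_t vertexStride() {
    return static_cast<std::uint32_t>(sizeof(Vertex));
}

// Byte offset of each attribute inside one vertex.
inline Attribute positionAttribute() {
    return {0, static_cast<std::uint32_t>(offsetof(Vertex, position))};
}

inline Attribute colorAttribute() {
    return {1, static_cast<std::uint32_t>(offsetof(Vertex, color))};
}
```

    The stride would go into VkVertexInputBindingDescription::stride and each offset into the matching VkVertexInputAttributeDescription.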

    Pipelines and descriptor sets give you a great deal of flexibility.

    Here is main.cpp:

    #include "Engine/context.hpp"
    #include "System/exception.hpp"
    #include <cstring>
    #include <iostream>
    #include "System/Vulkan/Pipeline/commandpool.hpp"
    #include "System/Vulkan/Pipeline/builder/gbufferpipelinebuilder.hpp"
    #include "System/Vulkan/Synchronisation/fence.hpp"
    #include "System/Vulkan/Memory/buffer.hpp"
    
    void init(Context &context, CommandPool &commandPool, std::unique_ptr<Pipeline> &pipeline, Buffer &buf) {
        GBufferPipelineBuilder builder;
        // Build the pipeline
        pipeline = builder.build(context, context.surfaceWindow().renderPass(), 0);
        commandPool.reset();
    
        VkClearValue value;
        value.color.float32[0] = 0.;
        value.color.float32[1] = 0.;
        value.color.float32[2] = 0.;
        value.color.float32[3] = 1.;
    
        std::vector<VkMemoryBarrier> memoryBarrier;
        std::vector<VkBufferMemoryBarrier> bufferBarrier;
        std::vector<VkImageMemoryBarrier> imageBarrier(1);
    
        VkImageSubresourceRange range;
    
        range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
        range.baseArrayLayer = 0;
        range.baseMipLevel = 0;
        range.layerCount = 1;
        range.levelCount = 1;
    
        // obvious value for imageBarrier
        imageBarrier[0].sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
        imageBarrier[0].pNext = nullptr;
        imageBarrier[0].srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        imageBarrier[0].dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        imageBarrier[0].subresourceRange = range;
    
        float color[] = {1.0, 0.0, 1.0, 1.0};
        memcpy(pipeline->buffer(0).map<float>(), color, sizeof color);
    
        for(int i = 0; i < 4; ++i) {
            commandPool.allocateCommandBuffer();
            commandPool.beginCommandBuffer(i);
    
            imageBarrier[0].srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
            imageBarrier[0].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
            imageBarrier[0].oldLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
            imageBarrier[0].newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
            imageBarrier[0].image = context.surfaceWindow().image(i);
            commandPool.commandBarrier(i,
                                       VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                                       VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                                       VK_FALSE, memoryBarrier, bufferBarrier, imageBarrier);
    
            commandPool.beginRenderPass(i, context.surfaceWindow().frameBuffer(i), context.surfaceWindow().renderPass(), {value});
    
            VkBuffer bufs[] = {buf};
            VkDeviceSize sizes[] = {0};
    
            pipeline->bind(*commandPool.commandBuffer(i));
            vkCmdBindVertexBuffers(*commandPool.commandBuffer(i), 0, 1, bufs, sizes);
            pipeline->bindDescriptorSets(*commandPool.commandBuffer(i));
            vkCmdDraw(*commandPool.commandBuffer(i), 3, 1, 0, 0);
    
            commandPool.endRenderPass(i);
    
            imageBarrier[0].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
            imageBarrier[0].dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
            imageBarrier[0].oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
            imageBarrier[0].newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
            imageBarrier[0].image = context.surfaceWindow().image(i);
    
            commandPool.commandBarrier(i,
                                       VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                                       VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                                       VK_FALSE, memoryBarrier, bufferBarrier, imageBarrier);
    
            commandPool.endCommandBuffer(i);
        }
    }
    
    void mainLoop(Context &context) {
        Fence fence(context.device(), 1);
        CommandPool commandPool(context.device(), 0);
        std::unique_ptr<Pipeline> pipeline;
    
        // A triangle
        float vertices[] = {-0.5, -0.5, 1, 1,
                            0.5, -0.5, 1, 1,
                            0.0, 0.5, 1, 1};
    
        // Triangle to buffer
        Buffer buf(context.device(), context.memoryPool(), VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, sizeof(vertices));
        memcpy(buf.map<float>(), vertices, sizeof(vertices));
    
        while(context.surfaceWindow().isRunning()) {
            context.surfaceWindow().updateEvent();
            if(context.surfaceWindow().neetToInit()) {
                init(context, commandPool, pipeline, buf);
                std::cout << "Initialisation" << std::endl;
                context.surfaceWindow().initDone();
            }
            context.surfaceWindow().begin();
            fence.reset(0);
    
            context.queue().submit(commandPool.commandBuffer(context.surfaceWindow().currentSwapImage()), 1, *fence.fence(0));
            fence.wait();
            context.surfaceWindow().end(context.queue());
        }
    }
    
    int main()
    {
        Context c(true);
    
        mainLoop(c);
    
        glfwTerminate();
    
        return 0;
    }

    And now, we have our perfect triangle !!!!

    [Figure: Triangle using pipelines, shaders]

    Barrier and explanations for the main

    I am going to explain quickly what memory barriers are.
    The idea behind a memory barrier is to ensure that writes have been performed before the data is used again.
    After a compute dispatch or a render, it is your duty to ensure that the data will be visible when you want to reuse it.

    In our main.cpp example, I draw a triangle into a framebuffer and present it.

    The first barrier is:

            imageBarrier[0].srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
            imageBarrier[0].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
            imageBarrier[0].oldLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
            imageBarrier[0].newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
            imageBarrier[0].image = context.surfaceWindow().image(i);
            commandPool.commandBarrier(i,
                                       VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                                       VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                                       VK_FALSE, memoryBarrier, bufferBarrier, imageBarrier);

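    An image barrier combines access masks and layouts, while the pipeline barrier that carries it adds the stage masks.
    Since presentation is a read of the image, srcAccessMask is VK_ACCESS_MEMORY_READ_BIT.
    Now we want to render into this image via a framebuffer, so dstAccessMask is VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT.

    The image was being presented, and now we want to render into it, so the layouts are obvious: from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
    When we record an image memory barrier into the command buffer, we have to tell it which stages are affected. Here, we wait for all previous commands, and the first stage of the pipeline is the one that has to wait. Be aware that the specification only allows access flags that are supported by the given stages: a stricter choice for the destination stage here is VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, the stage that actually performs the color writes.

    The second image memory barrier is: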

            imageBarrier[0].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
            imageBarrier[0].dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
            imageBarrier[0].oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
            imageBarrier[0].newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
            imageBarrier[0].image = context.surfaceWindow().image(i);
    
            commandPool.commandBarrier(i,
                                       VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                                       VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                                       VK_FALSE, memoryBarrier, bufferBarrier, imageBarrier);

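    The only differences are the order of the masks and layouts, and the stage masks. Here we wait for the color attachment output stage (and not the fragment shader one !!!!) and the destination is the very end of the pipeline: no later stage in this command buffer needs the image, it just has to be ready for presentation (it is not really easy to explain… but it is logical).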
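
    Since the two barriers are mirror images of each other, their parameters can come from one small helper. This is a sketch under the assumption that a swapchain image only alternates between "being presented" and "being rendered into". `Layout`, `Access`, `Transition` and `transitionTo` are hypothetical stand-ins for the Vulkan enumerants, so that the example builds without the Vulkan headers.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins mirroring the Vulkan enumerants used in main.cpp.
enum class Layout { PresentSrc, ColorAttachmentOptimal };
enum class Access { MemoryRead, ColorAttachmentWrite };

struct Transition {
    Access srcAccess;
    Access dstAccess;
    Layout oldLayout;
    Layout newLayout;
};

// Describe the transition *to* the requested layout, assuming the image only
// ever alternates between the two layouts above.
inline Transition transitionTo(Layout target) {
    switch (target) {
    case Layout::ColorAttachmentOptimal:
        // The presentation engine was reading; we are about to write colors.
        return {Access::MemoryRead, Access::ColorAttachmentWrite,
                Layout::PresentSrc, Layout::ColorAttachmentOptimal};
    case Layout::PresentSrc:
        // Color writes must be finished before the image is read for display.
        return {Access::ColorAttachmentWrite, Access::MemoryRead,
                Layout::ColorAttachmentOptimal, Layout::PresentSrc};
    }
    throw std::invalid_argument("unsupported layout");
}
```

    Each field of the returned structure maps directly onto the member of the same name in VkImageMemoryBarrier.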

    Steps to render something using pipelines are:

    1. Create pipelines
    2. Create command pools, allocate command buffers and begin them
    3. Create vertex / index buffers
    4. Bind pipelines to their subpass, bind buffers and descriptor sets
    5. vkCmdDraw

    References

    Specification

    It was a long article; I hope it was not unclear and that I didn’t make too many mistakes ^^.

    Kiss !!!!

  • Lava erupting from Vulkan : Initialization or Hello World

    Hi there!
    A few weeks ago, on February 16th to be precise, Vulkan, the new graphics API from Khronos, was released. It is a new API which gives much more control over GPUs than OpenGL (an API I loved before Vulkan ^_^).

    OpenGL’s problems

    Driver Overhead

    Slow rendering can come from the driver: video games do not use the GPU perfectly (maybe 80% instead of 95-100% utilization). Driver overhead is costly, and more recent OpenGL versions tend to address this problem with bindless textures, multi-draw, direct state access, etc.
    Keep in mind that each GPU call can have a significant cost.
    Cass Everitt, Tim Foley, John McDonald and Graham Sellers presented Approaching Zero Driver Overhead with OpenGL in 2014.

    Multi threading

    With OpenGL, efficient multithreading is not possible, because an OpenGL context is bound to one and only one thread at a time; that is why it is not so easy to issue a draw call from another thread ^_^.

    Vulkan

    Vulkan is not really a low-level API, but it provides a far better abstraction of modern hardware. Vulkan is more than AZDO; it is, as Graham Sellers said, PDCTZO (Pretty Darn Close To Zero Overhead).

    Series of articles about Lava

    What is Lava ?

    Lava is the name I gave to my new graphics (physics?) engine. It will let me learn how Vulkan works, play with it, implement some global illumination algorithms, and probably share with you what I learn and how I feel about Vulkan. It is possible that I’ll make some mistakes, so if I do, please let me know!

    Why Lava ?

    Vulkan makes me think of a volcano, which makes me think of lava, so… I chose it 😀 .

    Initialization

    Now comes what I wanted to discuss: the initialization of Vulkan.
    First of all, you really have to know and understand what you intend to do. To begin, we are going to see how to get a simple pink window.

    [Figure: Hello world with Vulkan]

    When you are developing with Vulkan, I advise you to keep the Khronos specification open in another window (or on another screen if you have several).
    To manage windows more easily, I am using GLFW 3.2, and yes, you have to compile it yourself ^_^, but it is not difficult at all, so it is not a big deal.

    Instance

    Contrary to OpenGL, Vulkan has no global state; an instance is roughly similar to an OpenGL context. An instance does not know anything about other instances and is utterly isolated. Creating an instance is really easy.

    Instance::Instance(unsigned int nExtensions, const char * const *extensions) {
        VkInstanceCreateInfo info;
    
        info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.pApplicationInfo = nullptr;
        info.enabledLayerCount = 0;
        info.ppEnabledLayerNames = nullptr;
        info.enabledExtensionCount = nExtensions;
        info.ppEnabledExtensionNames = extensions;
    
        vulkanCheckError(vkCreateInstance(&info, nullptr, &mInstance));
    }

    Physical devices, devices and queues

    From this instance, you can retrieve all the GPUs on your computer.
    You can create a connection between your application and the GPU you want using a VkDevice.
    When creating this connection, you also have to create queues.
    Queues are used to perform tasks: you submit a task to a queue and it will be executed.
    Queues are grouped into several families.
    A good approach is to use several queues, for example one for the physics and one for the graphics (or even two or three for the latter).
    You can also give a priority (between 0 and 1) to a queue. Thanks to that, if you consider a task less important, you just have to give a low priority to the queue it uses :).

    Device::Device(const PhysicalDevices &physicalDevices, unsigned i, std::vector<float> const &priorities, unsigned nQueuePerFamily) {
        VkDeviceCreateInfo info;
        std::vector<VkDeviceQueueCreateInfo> infoQueue;
    
        mPhysicalDevice = physicalDevices[i];
    
        infoQueue.resize(physicalDevices.queueFamilyProperties(i).size());
    
        info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.queueCreateInfoCount = infoQueue.size();
        info.pQueueCreateInfos = &infoQueue[0];
        info.enabledExtensionCount = info.enabledLayerCount = 0;
        info.pEnabledFeatures = &physicalDevices.features(i);
    
        for(auto j(0u); j < infoQueue.size(); ++j) {
            infoQueue[j].sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
            infoQueue[j].pNext = nullptr;
            infoQueue[j].flags = 0;
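            // pQueuePriorities must point to queueCount floats, so priorities
            // has to hold at least that many values starting at index j
            infoQueue[j].pQueuePriorities = &priorities[j];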
            infoQueue[j].queueCount = std::min(nQueuePerFamily, physicalDevices.queueFamilyProperties(i)[j].queueCount);
            infoQueue[j].queueFamilyIndex = j;
        }
    
        vulkanCheckError(vkCreateDevice(physicalDevices[i], &info, nullptr, &mDevice));
    }
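
    The constructor above creates queues in every family. When you only need a family with given capabilities, a search such as the one sketched below can be used. `QueueFamily` and `findQueueFamily` are hypothetical names; the three constants are redeclared with the same numeric values as VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT so that the sketch builds without the Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same numeric values as VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and
// VK_QUEUE_TRANSFER_BIT, redeclared so the sketch needs no Vulkan headers.
constexpr std::uint32_t QueueGraphics = 0x1;
constexpr std::uint32_t QueueCompute = 0x2;
constexpr std::uint32_t QueueTransfer = 0x4;

// Stand-in for the two VkQueueFamilyProperties fields used here.
struct QueueFamily {
    std::uint32_t flags;
    std::uint32_t queueCount;
};

// Index of the first family that supports every required flag and exposes
// at least one queue, or -1 when there is none.
int findQueueFamily(const std::vector<QueueFamily> &families, std::uint32_t required) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const bool supported = (families[i].flags & required) == required;
        if (supported && families[i].queueCount > 0)
            return static_cast<int>(i);
    }
    return -1;
}
```

    The returned index is what would be written into queueFamilyIndex of the VkDeviceQueueCreateInfo.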
    

    Image, ImageViews and FrameBuffers

    Images represent a one- or multi-dimensional array (1D, 2D or 3D).
    Images do not expose any getter or setter for their data. If you want to use them in your application, you must go through ImageViews.

    An ImageView is directly tied to an image. Creating an ImageView is not really complicated.

    ImageView::ImageView(Device &device, Image image, VkFormat format, VkImageViewType viewType, VkImageSubresourceRange const &subResourceRange) :
        mDevice(device), mImage(image) {
        VkImageViewCreateInfo info;
    
        info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.image = image;
        info.viewType = viewType;
        info.format = format;
        info.components.r = VK_COMPONENT_SWIZZLE_R;
        info.components.g = VK_COMPONENT_SWIZZLE_G;
        info.components.b = VK_COMPONENT_SWIZZLE_B;
        info.components.a = VK_COMPONENT_SWIZZLE_A;
        info.subresourceRange = subResourceRange;
    
        vulkanCheckError(vkCreateImageView(device, &info, nullptr, &mImageView));
    }

    You can write into ImageViews via FrameBuffers. A FrameBuffer owns multiple ImageViews (attachments) and is used to write into them.

    FrameBuffer::FrameBuffer(Device &device, RenderPass &renderPass,
                             std::vector<ImageView> &&imageViews,
                             uint32_t width, uint32_t height, uint32_t layers)
        : mDevice(device), mRenderPass(renderPass),
          mImageViews(std::move(imageViews)),
          mWidth(width), mHeight(height), mLayers(layers){
        VkFramebufferCreateInfo info;
    
        std::vector<VkImageView> views(mImageViews.size());
    
        for(auto i(0u); i < views.size(); ++i)
            views[i] = mImageViews[i];
    
        info.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.renderPass = renderPass;
        info.attachmentCount = views.size();
        info.pAttachments = &views[0];
        info.width = width;
        info.height = height;
        info.layers = layers;
    
        vulkanCheckError(vkCreateFramebuffer(mDevice, &info, nullptr, &mFrameBuffer));
    }

    The way to render something

    A window is assigned to a Surface (VkSurfaceKHR). To draw something, you have to render into this surface via swapchains.

    From notions of Swapchains

    In Vulkan, you have to manage double buffering yourself via a swapchain. When you create a swapchain, you link it to a Surface and tell it how many images you need. For double buffering, you need 2 images.

    Once the swapchain has been created, you should retrieve its images and create framebuffers using them.

    The steps to get a correct swapchain are:

    1. Create a Window
    2. Create a Surface assigned to this Window
    3. Create a Swapchain with several images assigned to this Surface
    4. Create FrameBuffers using all of these images.

    vulkanCheckError(glfwCreateWindowSurface(instance, mWindow, nullptr, &mSurface));
    
    void SurfaceWindow::createSwapchain() {
        VkSwapchainCreateInfoKHR info;
    
        uint32_t nFormat;
        vkGetPhysicalDeviceSurfaceFormatsKHR(mDevice, mSurface, &nFormat, nullptr);
        std::vector<VkSurfaceFormatKHR> formats(nFormat);
        vkGetPhysicalDeviceSurfaceFormatsKHR(mDevice, mSurface, &nFormat, &formats[0]);
    
        if(nFormat == 1 && formats[0].format == VK_FORMAT_UNDEFINED)
            formats[0].format = VK_FORMAT_B8G8R8A8_SRGB;
    
        mFormat = formats[0].format;
        mRenderPass = std::make_unique<RenderPass>(mDevice, mFormat);
    
        info.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
        info.pNext = nullptr;
        info.flags = 0;
        info.imageFormat = formats[0].format;
        info.imageColorSpace = formats[0].colorSpace;
        info.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
        info.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
        info.preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
        info.compositeAlpha = VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR;
        info.presentMode = VK_PRESENT_MODE_MAILBOX_KHR;
        info.surface = mSurface;
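        info.minImageCount = 2; // Double buffering...
        info.imageExtent.width = mWidth;
        info.imageExtent.height = mHeight;
        // Fields the local struct would otherwise leave uninitialized
        info.imageArrayLayers = 1; // non-stereoscopic rendering
        info.queueFamilyIndexCount = 0; // ignored with VK_SHARING_MODE_EXCLUSIVE
        info.pQueueFamilyIndices = nullptr;
        info.clipped = VK_TRUE; // assumption: hidden pixels never need to be read back
        info.oldSwapchain = VK_NULL_HANDLE;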
    
        vulkanCheckError(vkCreateSwapchainKHR(mDevice, &info, nullptr, &mSwapchain));
        initFrameBuffers();
    }
    void SurfaceWindow::initFrameBuffers() {
        VkImage images[2];
        uint32_t nImg = 2;
    
        vkGetSwapchainImagesKHR(mDevice, mSwapchain, &nImg, images);
    
        for(auto i(0u); i < nImg; ++i) {
            std::vector<ImageView> allViews;
            allViews.emplace_back(mDevice, images[i], mFormat);
            mFrameBuffers[i] = std::make_unique<FrameBuffer>(mDevice, *mRenderPass, std::move(allViews), mWidth, mHeight, 1);
        }
    }
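
    The code above asks for exactly 2 images, but a surface reports the range it supports in VkSurfaceCapabilitiesKHR (minImageCount and maxImageCount, where a maxImageCount of 0 means there is no upper limit). A minimal sketch of clamping the wish to that range could look like this. `SurfaceLimits` and `chooseImageCount` are hypothetical names, and the limits are passed as plain integers so that the example builds without the Vulkan headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for the two VkSurfaceCapabilitiesKHR fields used here.
struct SurfaceLimits {
    std::uint32_t minImageCount;
    std::uint32_t maxImageCount; // 0 means "no upper limit"
};

// Clamp the number of images we would like to what the surface supports.
std::uint32_t chooseImageCount(std::uint32_t wanted, const SurfaceLimits &limits) {
    std::uint32_t count = std::max(wanted, limits.minImageCount);
    if (limits.maxImageCount != 0)
        count = std::min(count, limits.maxImageCount);
    return count;
}
```

    The result is a candidate for info.minImageCount. The swapchain may still create more images than requested, so the image array should be sized from the count returned by vkGetSwapchainImagesKHR.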

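    Using a swapchain is not difficult:

    1. Acquire the index of the next image
    2. Present it through a queue

    void SurfaceWindow::begin() {
        // No checking because could be in lost state if change res.
        // Caution: the specification requires the semaphore and the fence not to be
        // both VK_NULL_HANDLE; a real application should pass one of them.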
        vkAcquireNextImageKHR(mDevice, mSwapchain, UINT64_MAX, VK_NULL_HANDLE, VK_NULL_HANDLE, &mCurrentSwapImage);
    }
    
    void SurfaceWindow::end(Queue &queue) {
        VkPresentInfoKHR info;
    
        info.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
        info.pNext = nullptr;
        info.waitSemaphoreCount = 0;
        info.pWaitSemaphores = nullptr;
        info.swapchainCount = 1;
        info.pSwapchains = &mSwapchain;
        info.pImageIndices = &mCurrentSwapImage;
        info.pResults = nullptr;
    
        vkQueuePresentKHR(queue, &info);
    }

    To notions of Render Pass

    At this point, Vulkan should be initialized. To render something, we have to use a render pass and command buffers.

    Command Buffers

    A command buffer is quite similar to a vertex array object (VAO) or a display list (old old old OpenGL 😀 ).
    You begin recording, you record some “information” and you end recording.
    Command buffers are allocated from a CommandPool.

    Vulkan provides two levels of command buffer.

    1. Primary level: they are submitted to a queue.
    2. Secondary level: they are executed by a primary level command buffer.

    std::size_t CommandPool::allocateCommandBuffer() {
        VkCommandBuffer cmd;
        VkCommandBufferAllocateInfo info;
    
        info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
        info.pNext = nullptr;
        info.commandPool = mCommandPool;
        info.commandBufferCount = 1;
        info.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    
        vulkanCheckError(vkAllocateCommandBuffers(mDevice, &info, &cmd));
    
        mCommandBuffers.emplace_back(cmd);
        return mCommandBuffers.size() - 1;
    }
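
    To illustrate the "record once, submit many times" idea without any GPU, here is a toy analogy. Nothing in it is part of the Vulkan API: `ToyCommandBuffer` is a hypothetical class that stores callables between begin() and end() and replays them on submit().

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy analogy only: nothing here is part of the Vulkan API.
class ToyCommandBuffer {
public:
    // Start a new recording, discarding what was recorded before.
    void begin() {
        mCommands.clear();
        mRecording = true;
    }

    // Store a command; it is NOT executed now.
    void record(std::function<void()> command) {
        if (!mRecording)
            throw std::logic_error("record() called outside begin()/end()");
        mCommands.push_back(std::move(command));
    }

    void end() { mRecording = false; }

    // "Submitting" replays the recorded commands; it can be done many times.
    void submit() const {
        if (mRecording)
            throw std::logic_error("submit() called while recording");
        for (const auto &command : mCommands)
            command();
    }

private:
    std::vector<std::function<void()>> mCommands;
    bool mRecording = false;
};
```

    This mirrors how the examples in these articles record their command buffers once during initialization and then only submit them in the main loop.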

    Renderpass

    One render pass is executed on one framebuffer. The creation is not easy at all. A render pass is composed of one or several subpasses.
    Remember that framebuffers can have several attachments.
    An attachment does not have to be used by every subpass.

    This piece of code to create a render pass is not definitive at all and will be changed as soon as possible ^^. But for our example, it is correct.

    RenderPass::RenderPass(Device &device, VkFormat format) :
        mDevice(device)
    {
        VkRenderPassCreateInfo info;
        VkAttachmentDescription attachmentDescription;
        VkSubpassDescription subpassDescription;
        VkAttachmentReference attachmentReference;
    
        attachmentReference.attachment = 0;
        attachmentReference.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    
        attachmentDescription.flags = VK_ATTACHMENT_DESCRIPTION_MAY_ALIAS_BIT;
        attachmentDescription.format = format;
        attachmentDescription.samples = VK_SAMPLE_COUNT_1_BIT;
        attachmentDescription.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
        attachmentDescription.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
        attachmentDescription.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
        attachmentDescription.stencilStoreOp = VK_ATTACHMENT_STORE_OP_STORE;
        attachmentDescription.initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
        attachmentDescription.finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    
        subpassDescription.flags = 0;
        subpassDescription.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
        subpassDescription.inputAttachmentCount = 0;
        subpassDescription.colorAttachmentCount = 1;
        subpassDescription.pColorAttachments = &attachmentReference;
        subpassDescription.pResolveAttachments = nullptr;
        subpassDescription.pDepthStencilAttachment = nullptr;
        subpassDescription.preserveAttachmentCount = 0;
        subpassDescription.pPreserveAttachments = nullptr;
    
        info.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.attachmentCount = 1;
        info.pAttachments = &attachmentDescription;
        info.subpassCount = 1;
        info.pSubpasses = &subpassDescription;
        info.dependencyCount = 0;
        info.pDependencies = nullptr;
    
        vulkanCheckError(vkCreateRenderPass(mDevice, &info, nullptr, &mRenderPass));
    }

    Just like a command buffer, a render pass must be begun and ended!

    void CommandPool::beginRenderPass(std::size_t index,
                                      FrameBuffer &frameBuffer,
                                      const std::vector<VkClearValue> &clearValues) {
        assert(index < mCommandBuffers.size());
        VkRenderPassBeginInfo info;
        VkRect2D area;
    
        area.offset = VkOffset2D{0, 0};
        area.extent = VkExtent2D{frameBuffer.width(), frameBuffer.height()};
    
        info.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
        info.pNext = nullptr;
        info.renderPass = frameBuffer.renderPass();
        info.framebuffer = frameBuffer;
        info.renderArea = area;
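        // data() stays valid even when the vector is empty, unlike &clearValues[0]
        info.clearValueCount = static_cast<uint32_t>(clearValues.size());
        info.pClearValues = clearValues.data();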
    
        vkCmdBeginRenderPass(mCommandBuffers[index], &info, VK_SUBPASS_CONTENTS_INLINE);
    }
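
    Only beginRenderPass is shown above; the matching end call (vkCmdEndRenderPass in Vulkan) has to be recorded once for every begin. As a minimal sketch of one way to make that pairing hard to forget, here is an RAII guard. This is not what the engine in this article does: FakeRecorder, ScopedRenderPass and recordOnePass are hypothetical names, and FakeRecorder only logs the calls so that the example runs without a Vulkan device.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a command pool: it only logs the calls it receives,
// so the example can run without any Vulkan object.
struct FakeRecorder {
    std::vector<std::string> log;

    void beginRenderPass(std::size_t index) { log.push_back("begin " + std::to_string(index)); }
    void draw(std::size_t index)            { log.push_back("draw " + std::to_string(index)); }
    void endRenderPass(std::size_t index)   { log.push_back("end " + std::to_string(index)); }
};

// RAII guard: begins the render pass in the constructor and ends it in the
// destructor, so every begin is matched by exactly one end.
class ScopedRenderPass {
public:
    ScopedRenderPass(FakeRecorder &recorder, std::size_t index)
        : mRecorder(recorder), mIndex(index) {
        mRecorder.beginRenderPass(mIndex);
    }

    ~ScopedRenderPass() { mRecorder.endRenderPass(mIndex); }

    // Copying the guard would end the same render pass twice.
    ScopedRenderPass(const ScopedRenderPass &) = delete;
    ScopedRenderPass &operator=(const ScopedRenderPass &) = delete;

private:
    FakeRecorder &mRecorder;
    std::size_t mIndex;
};

// Records one render pass for command buffer `index` and returns the call log.
std::vector<std::string> recordOnePass(std::size_t index) {
    FakeRecorder recorder;
    {
        ScopedRenderPass pass(recorder, index);
        recorder.draw(index);
    } // the guard goes out of scope here and records the end call
    return recorder.log;
}
```

    With a real command pool, the guard's constructor and destructor would forward to its beginRenderPass and endRenderPass instead of writing to a log.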
    

    Our engine in action

    For now, our “engine” is not really usable ^^.
    In the future, though, the command pool and the render pass should no longer appear in the user files!

    #include <iostream> // std::cout, used in mainLoop
    #include "System/contextinitializer.hpp"
    #include "System/Vulkan/instance.hpp"
    #include "System/Vulkan/physicaldevices.hpp"
    #include "System/Vulkan/device.hpp"
    #include "System/Vulkan/queue.hpp"
    #include "System/surfacewindow.hpp"
    #include "System/Vulkan/exception.hpp"
    #include "System/Vulkan/commandpool.hpp"
    #include "System/Vulkan/fence.hpp"
    
    void init(CommandPool &commandPool, SurfaceWindow &window) {
        commandPool.reset();
    
        VkClearValue value;
        value.color.float32[0] = 0.8f;
        value.color.float32[1] = 0.2f;
        value.color.float32[2] = 0.2f;
        value.color.float32[3] = 1.f;
    
        for(int i = 0; i < 2; ++i) {
            commandPool.allocateCommandBuffer();
            commandPool.beginCommandBuffer(i);
            commandPool.beginRenderPass(i, window.frameBuffer(i), {value});
            commandPool.endRenderPass(i);
            commandPool.endCommandBuffer(i);
        }
        commandPool.allocateCommandBuffer();
    }
    
    void mainLoop(SurfaceWindow &window, Device &device, Queue &queue) {
        Fence fence(device, 1);
        CommandPool commandPool(device, 0);
    
        while(window.isRunning()) {
            window.updateEvent();
            if(window.neetToInit()) {
                init(commandPool, window);
                std::cout << "Initialisation" << std::endl;
                window.initDone();
            }
            window.begin();
            queue.submit(commandPool.commandBuffer(window.currentSwapImage()), 1, *fence.fence(0));
            fence.wait();
            window.end(queue);
        }
    }
    
    int main()
    {
        ContextInitializer context;
        Instance instance(context.extensionNumber(), context.extensions());
        PhysicalDevices physicalDevices(instance);
        Device device(physicalDevices, 0, {1.f}, 1);
        Queue queue(device, 0, 0);
    
        SurfaceWindow window(instance, device, 800, 600, "Lava");
    
        mainLoop(window, device, queue);
    
        glfwTerminate();
    
        return 0;
    }

    If you want the whole source code:
    GitHub

    Reference

    Approaching Zero Driver Overhead: Lecture
    Approaching Zero Driver Overhead: Slides
    Vulkan Overview 2015
    Vulkan in 30 minutes
    VkCube
    GLFW with Vulkan