Tag: memory

  • Buffer management with Vulkan : transfer, Staging buffer

    Hi guys !
    I keep my promise and I am coming with explanations and implementation on how to use and manage one (or several) buffer in Vulkan application.

    How I manage my resources ?

    shared_ptr?

    Firstly, I have a way to manage Vulkan resource that is a bit weird. The idea is to “emulate” the behaviour of shared_ptr and enable copy / move.
    So, if you do that :

    Image b1(); // count = 1
    Image b2 = b1; // count = 2

    b1 and b2 are exactly the same Vulkan’s Image.

    A counter?

    To emulate the behaviour of a shared_ptr, I created one class that is simply a Counter.

    class Counter
    {
    public:
        Counter() = default;
        Counter(Counter const &counter);
        Counter(Counter &&counter) = default;
        Counter &operator=(Counter counter);
    
        uint32_t getCount() const;
    
        virtual ~Counter();
    protected:
        std::shared_ptr<uint32_t> mCount = std::make_shared<uint32_t>(1);
    };
    
    Counter::Counter(const Counter &counter) :
        mCount(counter.mCount) {
        ++(*mCount);
    }
    
    Counter &Counter::operator =(Counter counter) {
        using std::swap;
        swap(mCount, counter.mCount);
        return *this;
    }
    
    uint32_t Counter::getCount() const {
        return *mCount;
    }
    
    Counter::~Counter() {
    
    }
    

    A Vulkan Resource

    A Vulkan resource lives through a device. So I wrote this little class that represents a Vulkan Resource :

    class VkResource : public Counter
    {
    public:
        VkResource() = default;
        VkResource(Device const &device);
        VkResource(VkResource &&vkResource) = default;
        VkResource(VkResource const &vkResource) = default;
        VkResource &operator=(VkResource &&vkResource) = default;
        VkResource &operator=(VkResource const &vkResource) = default;
    
        vk::Device getDevice() const;
    
    protected:
        std::shared_ptr<Device> mDevice;
    };
    
    
    VkResource::VkResource(const Device &device) :
        mDevice(std::make_shared<Device>(device)) {
    
    }
    
    vk::Device VkResource::getDevice() const {
        return *mDevice;
    }
    

    Buffer in Vulkan

    Unlike OpenGL, buffers in Vulkan are separated from memory. You must bind the memory to them. Since you can choose if you want the memory on the device_local heap or in the host_visible, you can chose which heap your buffer will use.

    So what are buffer made with ?

    Buffers are made with a size, a usage (vertex? uniform ?), one block of memory, one ptr if the buffer is HOST_VISIBLE etc.
    My buffer class is :

    class Buffer : public VkResource, public vk::Buffer
    {
    public:
        Buffer() = default;
    
        Buffer(Device &device, vk::BufferUsageFlags usage, vk::DeviceSize size,
               std::shared_ptr<AbstractAllocator> allocator, bool shouldBeDeviceLocal);
    
        Buffer(Buffer &&buffer) = default;
        Buffer(Buffer const &buffer) = default;
        Buffer &operator=(Buffer const &buffer);
    
        vk::DeviceSize getSize() const;
        vk::BufferUsageFlags getUsage() const;
        bool isDeviceLocal() const;
        void *getPtr();
        std::shared_ptr<AbstractAllocator> getAllocator();
    
        ~Buffer();
    
    private:
        std::shared_ptr<AbstractAllocator> mAllocator;
        std::shared_ptr<vk::DeviceSize> mSize = std::make_shared<vk::DeviceSize>();
        std::shared_ptr<vk::BufferUsageFlags> mUsage = std::make_shared<vk::BufferUsageFlags>();
        std::shared_ptr<vk::MemoryRequirements> mRequirements = std::make_shared<vk::MemoryRequirements>();
        std::shared_ptr<vk::PhysicalDeviceMemoryProperties> mProperties = std::make_shared<vk::PhysicalDeviceMemoryProperties>();
        std::shared_ptr<Block> mBlock = std::make_shared<Block>();
        std::shared_ptr<bool> mIsDeviceLocal;
        std::shared_ptr<void *> mPtr = std::make_shared<void *>(nullptr);
    
        void createBuffer();
        void allocate(bool shouldBeDeviceLocal);
    };
    

    It may be is a bit complicate, but it is not really that difficult. A buffer will be created with an usage, one size and one boolean to put or not this buffer in device_local memory.
    The creation of the buffer is quite simple. You just have to give the size and the usage :

    vk::BufferCreateInfo createInfo(vk::BufferCreateFlags(),
                                    *mSize,
                                    *mUsage,
                                     vk::SharingMode::eExclusive);
    
    m_buffer = mDevice->createBuffer(createInfo);
    *mRequirements = mDevice->getBufferMemoryRequirements(m_buffer);

    The last line is to get the memory requirements. It will give you the real size you need (padding or other things) and list of memory types that can be used with the buffer.
    To get the memory type index, I developed this function which cares about device local memory or host visible memory :

    int findMemoryType(uint32_t memoryTypeBits,
                       vk::PhysicalDeviceMemoryProperties const &properties,
                       bool shouldBeDeviceLocal) {
    
        auto lambdaGetMemoryType = [&](vk::MemoryPropertyFlags propertyFlags) -> int {
            for(uint32_t i = 0; i < properties.memoryTypeCount; ++i)
                if((memoryTypeBits & (1 << i)) &&
                ((properties.memoryTypes[i].propertyFlags & propertyFlags) == propertyFlags))
                    return i;
            return -1;
        };
    
        if(!shouldBeDeviceLocal) {
            vk::MemoryPropertyFlags optimal = vk::MemoryPropertyFlagBits::eHostCached |
                    vk::MemoryPropertyFlagBits::eHostCoherent |
                    vk::MemoryPropertyFlagBits::eHostVisible;
    
            vk::MemoryPropertyFlags required = vk::MemoryPropertyFlagBits::eHostCoherent |
                    vk::MemoryPropertyFlagBits::eHostVisible;
    
            int type = lambdaGetMemoryType(optimal);
            if(type == -1) {
                int result = lambdaGetMemoryType(required);
                if(result == -1)
                    assert(!"Memory type does not find");
                return result;
            }
            return type;
        }
    
        else
            return lambdaGetMemoryType(vk::MemoryPropertyFlagBits::eDeviceLocal);
    }

    This code was made with the specifications themselves.
    Now we should allocate memory for our buffers :

     int memoryTypeIndex = findMemoryType(mRequirements->memoryTypeBits, *mProperties, shouldBeDeviceLocal);
    
        *mBlock = mAllocator->allocate(mRequirements->size, mRequirements->alignment, memoryTypeIndex);
        mDevice->bindBufferMemory(m_buffer, mBlock->memory, mBlock->offset);
    
        // if host_visible, we can map it
        if(!shouldBeDeviceLocal)
            *mPtr = mDevice->mapMemory(mBlock->memory, mBlock->offset,
                                      *mSize, vk::MemoryMapFlags());

    As you can see, you allocate the memory, and you bind the memory. If the memory is host visible, you can map it.

    Now we have a class to manage our buffers. But it is not finished at all !

    Staging resources

    We cannot write directly to the device_local memory. We must use something that we call a staging resource. Staging resources can be buffers or images. The idea is to bind a host visible memory to a staging resource, and transfer the memory through the staging resource to a resource with memory that resides in device_local memory.

    staging buffer

    Command Buffers submitting

    Before to transfer anything, I wanted to have a class that manages the submitting of command buffers. When the work is done, the command submitter should notify transferer object that use it. I used an observer pattern :

    class ObserverCommandBufferSubmitter {
    public:
        virtual void notify() = 0;
    };
    
    class CommandBufferSubmitter
    {
    public:
        CommandBufferSubmitter(Device &device, uint32_t numberCommandBuffers);
    
        void addObserver(ObserverCommandBufferSubmitter *observer);
    
        vk::CommandBuffer createCommandBuffer();
    
        void submit();
        void wait();
    
    protected:
        std::shared_ptr<Device> mDevice;
        std::shared_ptr<vk::Queue> mQueue;
        std::shared_ptr<CommandPool> mCommandPool;
        std::shared_ptr<std::vector<vk::CommandBuffer>> mCommandBuffers = std::make_shared<std::vector<vk::CommandBuffer>>();
        std::shared_ptr<Fence> mFence;
        std::shared_ptr<uint32_t> mIndex = std::make_shared<uint32_t>(0);
        std::shared_ptr<std::vector<ObserverCommandBufferSubmitter*>> mObservers = std::make_shared<std::vector<ObserverCommandBufferSubmitter*>>();
    };
    
    CommandBufferSubmitter::CommandBufferSubmitter(Device &device, uint32_t numberCommandBuffers) :
        mDevice(std::make_shared<Device>(device)),
        mQueue(std::make_shared<vk::Queue>(device.getTransferQueue())),
        mCommandPool(std::make_shared<CommandPool>(device, true, true, device.getIndexTransferQueue())),
        mFence(std::make_shared<Fence>(device, false)) {
        *mCommandBuffers = mCommandPool->allocate(vk::CommandBufferLevel::ePrimary, numberCommandBuffers);
    }
    
    void CommandBufferSubmitter::addObserver(ObserverCommandBufferSubmitter *observer) {
        mObservers->emplace_back(observer);
    }
    
    vk::CommandBuffer CommandBufferSubmitter::createCommandBuffer() {
        if(*mIndex >= mCommandBuffers->size()) {
            auto buffers = mCommandPool->allocate(vk::CommandBufferLevel::ePrimary, 10);
    
            for(auto &b : buffers)
                mCommandBuffers->emplace_back(b);
        }
    
        return (*mCommandBuffers)[(*mIndex)++];
    }
    
    void CommandBufferSubmitter::submit() {
        vk::SubmitInfo info;
        info.setCommandBufferCount(*mIndex).setPCommandBuffers(mCommandBuffers->data());
        mFence->reset();
        mQueue->submit(info, *mFence);
    }
    
    void CommandBufferSubmitter::wait() {
        *mIndex = 0;
        mFence->wait();
        mFence->reset();
        for(auto &observer : *mObservers)
            observer->notify();
    }
    

    The code is not difficult, it allocates if needed one command buffer and return it and use fencing to know if works are completed.

    Buffer transferer

    You guessed that our Buffer transferer must implement the abstract class :

    class BufferTransferer : public ObserverCommandBufferSubmitter
    {
    public:
        BufferTransferer(Device &device, uint32_t numberBuffers, vk::DeviceSize sizeTransfererBuffers,
                         std::shared_ptr<AbstractAllocator> allocator, CommandBufferSubmitter &commandBufferSubmitter);
    
        void transfer(const Buffer &src, Buffer &dst,
                      vk::DeviceSize offsetSrc,
                      vk::DeviceSize offsetDst,
                      vk::DeviceSize size);
    
        void transfer(Buffer &buffer, vk::DeviceSize offset, vk::DeviceSize size, void *data);
    
        void notify();
    
    private:
        std::shared_ptr<CommandBufferSubmitter> mCommandBufferSubmitter;
        std::shared_ptr<std::vector<Buffer>> mTransfererBuffers = std::make_shared<std::vector<Buffer>>();
        std::shared_ptr<uint32_t> mSizeTransfererBuffers;
        std::shared_ptr<uint32_t> mIndex = std::make_shared<uint32_t>(0);
    };

    The idea is to have several buffers ready to transfer data. Why this idea? Because users may don’t care about the CPU buffer and only want a GPU Buffer ! Thanks to that, if he wants to transfer data like glBufferSubData, he actually can !

    The code to transfer a buffer is not complicated at all. However, you just have to be careful about the memory barrier. Personally, I use one from Transfer to ALL_Commands in this case.

    void BufferTransferer::notify() {
        *mIndex = 0;
    }
    
    void BufferTransferer::transfer(Buffer const &src, Buffer &dst,
                                    vk::DeviceSize offsetSrc, vk::DeviceSize offsetDst,
                                    vk::DeviceSize size) {
        // Check if size and usage are legals
        assert((src.getUsage() & vk::BufferUsageFlagBits::eTransferSrc) ==
                    vk::BufferUsageFlagBits::eTransferSrc);
        assert((dst.getUsage() & vk::BufferUsageFlagBits::eTransferDst) ==
                    vk::BufferUsageFlagBits::eTransferDst);
    
        assert(src.getSize() >= (offsetSrc + size));
        assert(dst.getSize() >= (offsetDst + size));
    
        // Prepare the region copied
        vk::BufferCopy region(offsetSrc, offsetDst, size);
    
        vk::CommandBufferBeginInfo begin(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
    
        vk::CommandBuffer cmd = mCommandBufferSubmitter->createCommandBuffer();
    
        cmd.begin(begin);
        cmd.copyBuffer(src, dst, {region});
        cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTransfer,
                            vk::PipelineStageFlagBits::eAllCommands,
                            vk::DependencyFlags(),
                            nullptr,
                            vk::BufferMemoryBarrier(vk::AccessFlagBits::eTransferWrite,
                                                    vk::AccessFlagBits::eMemoryRead,
                                                    VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
                                                    dst, offsetSrc, size),
                            nullptr);
        cmd.end();
    
    }
    
    void BufferTransferer::transfer(Buffer &buffer, vk::DeviceSize offset, vk::DeviceSize size, void *data) {
        if(*mIndex == mTransfererBuffers->size()) {
            mCommandBufferSubmitter->submit();
            mCommandBufferSubmitter->wait();
        }
        assert(size <= *mSizeTransfererBuffers);
        memcpy((*mTransfererBuffers)[*mIndex].getPtr(), data, size);
        transfer((*mTransfererBuffers)[*mIndex], buffer, 0, offset, size);
        (*mIndex)++;
    }
    

    I do not manage the reallocation if the dst buffer is too small, or loop / recursion the transfer when our staging buffer is too small, but with our architecture, It would not be difficult to manage these cases !

    How to use it??

    Simply like that !

    CommandBufferSubmitter commandBufferSubmitter(device, 1);
    BufferTransferer bufferTransferer(device, 1, 1 << 20, deviceAllocator, commandBufferSubmitter);
    glm::vec2 quad[] = {glm::vec2(-1, -1), glm::vec2(1, -1), glm::vec2(-1, 1), glm::vec2(1, 1)};
    Buffer vbo(device, vk::BufferUsageFlagBits::eTransferDst | vk::BufferUsageFlagBits::eVertexBuffer, sizeof quad, deviceAllocator, true);
    bufferTransferer.transfer(vbo, 0, sizeof quad, quad);
    commandBufferSubmitter.submit();
    commandBufferSubmitter.wait();

    I saw that a high number of my visits comes from twitter. If you want to follow me: it is here.

    Kisses and see you soon to see how to load / manage images !