Buffer management with Vulkan: transfer, staging buffer

Hi guys!
As promised, here are explanations and an implementation showing how to use and manage one (or several) buffers in a Vulkan application.

How do I manage my resources?


First, my way of managing Vulkan resources is a bit unusual. The idea is to “emulate” the behaviour of std::shared_ptr so that resources can be copied and moved.
So, if you write:

Image b1; // count = 1
Image b2 = b1; // count = 2

b1 and b2 refer to exactly the same Vulkan image.

A counter?

To emulate the behaviour of a shared_ptr, I created a class that is simply a counter:

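class Counter {
public:
    Counter() = default;
    Counter(Counter const &counter);
    Counter(Counter &&counter) = default;
    Counter &operator=(Counter counter);

    uint32_t getCount() const;

    virtual ~Counter();

protected:
    std::shared_ptr<uint32_t> mCount = std::make_shared<uint32_t>(1);
};

// Copying shares the counter and increments it
Counter::Counter(const Counter &counter) :
    mCount(counter.mCount) {
    ++(*mCount);
}

// Copy-and-swap: the parameter is already a copy (count incremented),
// and its destructor releases the counter we held before
Counter &Counter::operator =(Counter counter) {
    using std::swap;
    swap(mCount, counter.mCount);
    return *this;
}

uint32_t Counter::getCount() const {
    return *mCount;
}

// A moved-from Counter holds no counter, so check before decrementing
Counter::~Counter() {
    if(mCount)
        --(*mCount);
}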
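To check the counting logic in isolation, here is a minimal, self-contained sketch with the Vulkan parts stripped out. `SharedHandle` and its `destroyed` flag are hypothetical names used only for illustration; I am assuming that a derived resource (such as Image) destroys its Vulkan handle in its destructor only when `getCount() == 1`, i.e. when it is the last copy.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Condensed version of the Counter class above.
class Counter {
public:
    Counter() = default;
    Counter(Counter const &counter) : mCount(counter.mCount) { ++(*mCount); }
    Counter(Counter &&counter) = default;           // leaves counter.mCount null
    Counter &operator=(Counter counter) {           // copy-and-swap
        std::swap(mCount, counter.mCount);
        return *this;                               // old counter released by `counter`'s destructor
    }
    uint32_t getCount() const { return *mCount; }
    virtual ~Counter() { if(mCount) --(*mCount); }  // moved-from objects hold no counter

private:
    std::shared_ptr<uint32_t> mCount = std::make_shared<uint32_t>(1);
};

// Hypothetical resource: "destroys" its handle only when it is the last copy.
class SharedHandle : public Counter {
public:
    explicit SharedHandle(std::shared_ptr<bool> destroyed) : mDestroyed(std::move(destroyed)) {}
    SharedHandle(SharedHandle const &) = default;
    SharedHandle(SharedHandle &&) = default;
    ~SharedHandle() override {
        // Runs before ~Counter, so a count of 1 means no other copy exists
        if(mDestroyed && getCount() == 1)
            *mDestroyed = true;                     // real code: destroy the Vulkan object here
    }

private:
    std::shared_ptr<bool> mDestroyed;
};
```

A moved-from handle holds neither a counter nor a flag, so its destructor does nothing; only the last live copy releases the resource.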


A Vulkan Resource

A Vulkan resource lives through a device, so I wrote this small class to represent a Vulkan resource:

class VkResource : public Counter {
public:
    VkResource() = default;
    VkResource(Device const &device);
    VkResource(VkResource &&vkResource) = default;
    VkResource(VkResource const &vkResource) = default;
    VkResource &operator=(VkResource &&vkResource) = default;
    VkResource &operator=(VkResource const &vkResource) = default;

    vk::Device getDevice() const;

protected:
    std::shared_ptr<Device> mDevice;
};

VkResource::VkResource(const Device &device) :
    mDevice(std::make_shared<Device>(device)) {
}

vk::Device VkResource::getDevice() const {
    return *mDevice;
}

Buffer in Vulkan

Unlike in OpenGL, buffers in Vulkan are separate from their memory: you must allocate the memory yourself and bind it to the buffer. Since you choose whether that memory is DEVICE_LOCAL or HOST_VISIBLE, you also choose which heap your buffer will use.

So what is a buffer made of?

A buffer has a size, a usage (vertex? uniform?), a block of memory, a mapped pointer if the buffer is HOST_VISIBLE, etc.
My buffer class is:

class Buffer : public VkResource, public vk::Buffer {
public:
    Buffer() = default;

    Buffer(Device &device, vk::BufferUsageFlags usage, vk::DeviceSize size,
           std::shared_ptr<AbstractAllocator> allocator, bool shouldBeDeviceLocal);

    Buffer(Buffer &&buffer) = default;
    Buffer(Buffer const &buffer) = default;
    Buffer &operator=(Buffer const &buffer);

    vk::DeviceSize getSize() const;
    vk::BufferUsageFlags getUsage() const;
    bool isDeviceLocal() const;
    void *getPtr();
    std::shared_ptr<AbstractAllocator> getAllocator();

private:
    std::shared_ptr<AbstractAllocator> mAllocator;
    std::shared_ptr<vk::DeviceSize> mSize = std::make_shared<vk::DeviceSize>();
    std::shared_ptr<vk::BufferUsageFlags> mUsage = std::make_shared<vk::BufferUsageFlags>();
    std::shared_ptr<vk::MemoryRequirements> mRequirements = std::make_shared<vk::MemoryRequirements>();
    std::shared_ptr<vk::PhysicalDeviceMemoryProperties> mProperties = std::make_shared<vk::PhysicalDeviceMemoryProperties>();
    std::shared_ptr<Block> mBlock = std::make_shared<Block>();
    std::shared_ptr<bool> mIsDeviceLocal;
    std::shared_ptr<void *> mPtr = std::make_shared<void *>(nullptr);

    void createBuffer();
    void allocate(bool shouldBeDeviceLocal);
};

It may look a bit complicated, but it is not really that difficult. A buffer is created with a usage, a size, and a boolean saying whether it should live in DEVICE_LOCAL memory.
Creating the buffer itself is simple. You just have to give the size and the usage:

vk::BufferCreateInfo createInfo(vk::BufferCreateFlags(),
                                *mSize, *mUsage,
                                vk::SharingMode::eExclusive,
                                0, nullptr);

m_buffer = mDevice->createBuffer(createInfo);
*mRequirements = mDevice->getBufferMemoryRequirements(m_buffer);

The last line queries the memory requirements. They give you the real size you need (which may include padding), the required alignment, and a bitmask of the memory types that can be used with the buffer.
To find the memory type index, I wrote this function, which handles both the device-local and the host-visible case:

int findMemoryType(uint32_t memoryTypeBits,
                   vk::PhysicalDeviceMemoryProperties const &properties,
                   bool shouldBeDeviceLocal) {

    // Returns the first allowed memory type that has all of propertyFlags, or -1
    auto lambdaGetMemoryType = [&](vk::MemoryPropertyFlags propertyFlags) -> int {
        for(uint32_t i = 0; i < properties.memoryTypeCount; ++i)
            if((memoryTypeBits & (1u << i)) &&
            ((properties.memoryTypes[i].propertyFlags & propertyFlags) == propertyFlags))
                return i;
        return -1;
    };

    if(!shouldBeDeviceLocal) {
        // Cached memory makes CPU reads faster, but it is optional
        vk::MemoryPropertyFlags optimal = vk::MemoryPropertyFlagBits::eHostCached |
                vk::MemoryPropertyFlagBits::eHostCoherent |
                vk::MemoryPropertyFlagBits::eHostVisible;

        vk::MemoryPropertyFlags required = vk::MemoryPropertyFlagBits::eHostCoherent |
                vk::MemoryPropertyFlagBits::eHostVisible;

        int type = lambdaGetMemoryType(optimal);
        if(type == -1) {
            int result = lambdaGetMemoryType(required);
            if(result == -1)
                assert(!"No suitable host-visible memory type found");
            return result;
        }
        return type;
    }

    else
        return lambdaGetMemoryType(vk::MemoryPropertyFlagBits::eDeviceLocal);
}

This code is adapted from the memory type selection example given in the Vulkan specification itself.
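To make the "optimal first, then required" fallback easy to test without a GPU, here is a self-contained sketch of the same selection logic. It replaces vk::PhysicalDeviceMemoryProperties with a plain vector of property flags; the constants use the same bit values as VkMemoryPropertyFlagBits, and the names `firstMatchingType` and `pickMemoryType` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same bit values as VkMemoryPropertyFlagBits.
constexpr uint32_t DEVICE_LOCAL  = 0x1;
constexpr uint32_t HOST_VISIBLE  = 0x2;
constexpr uint32_t HOST_COHERENT = 0x4;
constexpr uint32_t HOST_CACHED   = 0x8;

// First memory type allowed by memoryTypeBits whose flags contain all of
// `wanted`, or -1 if there is none.
int firstMatchingType(uint32_t memoryTypeBits, std::vector<uint32_t> const &typeFlags,
                      uint32_t wanted) {
    for(uint32_t i = 0; i < typeFlags.size(); ++i)
        if((memoryTypeBits & (1u << i)) && (typeFlags[i] & wanted) == wanted)
            return static_cast<int>(i);
    return -1;
}

// Same strategy as findMemoryType: try the optimal host flags first,
// then fall back to the required ones.
int pickMemoryType(uint32_t memoryTypeBits, std::vector<uint32_t> const &typeFlags,
                   bool shouldBeDeviceLocal) {
    if(shouldBeDeviceLocal)
        return firstMatchingType(memoryTypeBits, typeFlags, DEVICE_LOCAL);
    int type = firstMatchingType(memoryTypeBits, typeFlags,
                                 HOST_CACHED | HOST_COHERENT | HOST_VISIBLE);
    if(type == -1)
        type = firstMatchingType(memoryTypeBits, typeFlags, HOST_COHERENT | HOST_VISIBLE);
    return type;
}
```

With types {DEVICE_LOCAL, HOST_VISIBLE | HOST_COHERENT, HOST_VISIBLE | HOST_COHERENT | HOST_CACHED}, a host-visible request picks type 2 when the buffer allows it, and falls back to type 1 when memoryTypeBits excludes type 2.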
Now we can allocate memory for our buffer:

int memoryTypeIndex = findMemoryType(mRequirements->memoryTypeBits, *mProperties, shouldBeDeviceLocal);

*mBlock = mAllocator->allocate(mRequirements->size, mRequirements->alignment, memoryTypeIndex);
mDevice->bindBufferMemory(m_buffer, mBlock->memory, mBlock->offset);

// if host_visible, we can map it
// (a VkDeviceMemory must not be mapped twice, so if the allocator
// sub-allocates several buffers from one allocation, map it only once)
if(!shouldBeDeviceLocal)
    *mPtr = mDevice->mapMemory(mBlock->memory, mBlock->offset,
                               *mSize, vk::MemoryMapFlags());

As you can see, you allocate the memory, then bind it to the buffer. If the memory is host visible, you can also map it.
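The allocator itself (AbstractAllocator) is not shown here, but whatever it does, the offset it returns must respect the alignment from the memory requirements: vkBindBufferMemory requires memoryOffset to be an integer multiple of VkMemoryRequirements::alignment. Below is a minimal linear (bump) sub-allocator that shows the arithmetic. It is a hypothetical stand-in, not the allocator used in this article, and `SketchBlock` omits the VkDeviceMemory handle that the real Block carries.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical block: where the sub-allocation lives inside one memory chunk.
struct SketchBlock {
    uint64_t offset;
    uint64_t size;
};

// Rounds value up to the next multiple of alignment (alignment > 0).
constexpr uint64_t alignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Linear sub-allocator over one chunk of device memory of `capacity` bytes.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(uint64_t capacity) : mCapacity(capacity) {}

    std::optional<SketchBlock> allocate(uint64_t size, uint64_t alignment) {
        uint64_t offset = alignUp(mHead, alignment);  // respect requirements.alignment
        if(offset + size > mCapacity)
            return std::nullopt;                      // chunk exhausted: a real allocator would grab a new chunk
        mHead = offset + size;
        return SketchBlock{offset, size};
    }

private:
    uint64_t mCapacity;
    uint64_t mHead = 0;                               // first free byte in the chunk
};
```

A linear allocator never frees individual blocks; it is only meant to show why the offset passed to bindBufferMemory is rounded up rather than simply taken from the end of the previous buffer.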

Now we have a class to manage our buffers. But we are not finished yet!

Staging resources

In general, the CPU cannot write directly to DEVICE_LOCAL memory: device-local memory types are usually not HOST_VISIBLE (this depends on the GPU), so they cannot be mapped. Instead, we use what is called a staging resource, which can be a buffer or an image. The idea is to bind host-visible memory to the staging resource, write the data into it from the CPU, and then let the GPU copy it into a resource whose memory resides in DEVICE_LOCAL memory.

(Figure: staging buffer)

Submitting command buffers

Before transferring anything, I wanted a class that manages the submission of command buffers. When the work is done, the submitter should notify the transferer objects that use it, so I used the observer pattern:

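class ObserverCommandBufferSubmitter {
public:
    virtual void notify() = 0;
    virtual ~ObserverCommandBufferSubmitter() = default;
};

class CommandBufferSubmitter {
public:
    CommandBufferSubmitter(Device &device, uint32_t numberCommandBuffers);

    void addObserver(ObserverCommandBufferSubmitter *observer);

    vk::CommandBuffer createCommandBuffer();

    void submit();
    void wait();

private:
    std::shared_ptr<Device> mDevice;
    std::shared_ptr<vk::Queue> mQueue;
    std::shared_ptr<CommandPool> mCommandPool;
    std::shared_ptr<std::vector<vk::CommandBuffer>> mCommandBuffers = std::make_shared<std::vector<vk::CommandBuffer>>();
    std::shared_ptr<Fence> mFence;
    std::shared_ptr<uint32_t> mIndex = std::make_shared<uint32_t>(0);
    std::shared_ptr<std::vector<ObserverCommandBufferSubmitter*>> mObservers = std::make_shared<std::vector<ObserverCommandBufferSubmitter*>>();
};

CommandBufferSubmitter::CommandBufferSubmitter(Device &device, uint32_t numberCommandBuffers) :
    mDevice(std::make_shared<Device>(device)),
    // assumption: Device derives from vk::Device, so vk::Device::getQueue is available
    mQueue(std::make_shared<vk::Queue>(device.getQueue(device.getIndexTransferQueue(), 0))),
    mCommandPool(std::make_shared<CommandPool>(device, true, true, device.getIndexTransferQueue())),
    mFence(std::make_shared<Fence>(device, false)) {
    *mCommandBuffers = mCommandPool->allocate(vk::CommandBufferLevel::ePrimary, numberCommandBuffers);
}

void CommandBufferSubmitter::addObserver(ObserverCommandBufferSubmitter *observer) {
    mObservers->emplace_back(observer);
}

vk::CommandBuffer CommandBufferSubmitter::createCommandBuffer() {
    // Every command buffer is in use: allocate 10 more
    if(*mIndex >= mCommandBuffers->size()) {
        auto buffers = mCommandPool->allocate(vk::CommandBufferLevel::ePrimary, 10);

        for(auto &b : buffers)
            mCommandBuffers->emplace_back(b);
    }

    return (*mCommandBuffers)[(*mIndex)++];
}

void CommandBufferSubmitter::submit() {
    // Submit every command buffer handed out since the last wait()
    vk::SubmitInfo info;
    info.setCommandBufferCount(*mIndex);
    info.setPCommandBuffers(mCommandBuffers->data());
    mQueue->submit(info, *mFence);
}

void CommandBufferSubmitter::wait() {
    // Block until the GPU signals the fence, then make the fence reusable
    vk::Fence fence = *mFence;
    mDevice->waitForFences(fence, VK_TRUE, UINT64_MAX);
    mDevice->resetFences(fence);

    // The command buffers, and the observers' resources, can be reused
    *mIndex = 0;
    for(auto &observer : *mObservers)
        observer->notify();
}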

The code is not difficult: it allocates a command buffer if needed and returns it, and it uses a fence to know when the submitted work has completed, at which point every observer is notified.
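The observer wiring can be checked without Vulkan. In this sketch, `FakeSubmitter` and `FakeTransferer` are hypothetical stand-ins: command buffers are reduced to indices, and the fence wait is replaced by a comment, so only the "recycle, then notify" flow of wait() remains.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Interface implemented by objects that reuse resources once the GPU is done.
class Observer {
public:
    virtual void notify() = 0;
    virtual ~Observer() = default;
};

// Stand-in for CommandBufferSubmitter: hands out "command buffers" by index
// and, in wait(), recycles them and notifies every observer.
class FakeSubmitter {
public:
    void addObserver(Observer *observer) { mObservers.push_back(observer); }
    uint32_t createCommandBuffer() { return mIndex++; }
    void wait() {
        // real code: waitForFences + resetFences before recycling anything
        mIndex = 0;
        for(Observer *observer : mObservers)
            observer->notify();
    }

private:
    uint32_t mIndex = 0;
    std::vector<Observer *> mObservers;
};

// Like BufferTransferer: resets its own staging index when notified.
class FakeTransferer : public Observer {
public:
    explicit FakeTransferer(FakeSubmitter &submitter) { submitter.addObserver(this); }
    void notify() override { stagingIndex = 0; }
    uint32_t stagingIndex = 0;
};
```

One consequence of storing raw observer pointers: an observer must outlive the submitter (or unregister itself), and a copy of an observer is not registered, since only the original `this` was added.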

Buffer transferer

As you may have guessed, our buffer transferer must implement the observer interface:

class BufferTransferer : public ObserverCommandBufferSubmitter {
public:
    BufferTransferer(Device &device, uint32_t numberBuffers, vk::DeviceSize sizeTransfererBuffers,
                     std::shared_ptr<AbstractAllocator> allocator, CommandBufferSubmitter &commandBufferSubmitter);

    void transfer(const Buffer &src, Buffer &dst,
                  vk::DeviceSize offsetSrc,
                  vk::DeviceSize offsetDst,
                  vk::DeviceSize size);

    void transfer(Buffer &buffer, vk::DeviceSize offset, vk::DeviceSize size, void *data);

    void notify() override;

private:
    std::shared_ptr<CommandBufferSubmitter> mCommandBufferSubmitter;
    std::shared_ptr<std::vector<Buffer>> mTransfererBuffers = std::make_shared<std::vector<Buffer>>();
    std::shared_ptr<vk::DeviceSize> mSizeTransfererBuffers;
    std::shared_ptr<uint32_t> mIndex = std::make_shared<uint32_t>(0);
};

The idea is to keep several host-visible staging buffers ready to transfer data. Why? Because users may not care about the CPU-side buffer at all and only want a GPU buffer! Thanks to that, users can upload data just as they would with glBufferSubData.
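Here is a CPU-only simulation of that staging pool, to show when the data actually reaches the destination. `StagingPoolSim` is a hypothetical class: its slots play the role of the host-visible staging buffers, `device` plays the DEVICE_LOCAL buffer, and `flush()` plays submit() followed by wait(). The point it illustrates is that an upload only records a copy; nothing lands in the destination until the recorded copies are executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

class StagingPoolSim {
public:
    StagingPoolSim(std::size_t slotCount, std::size_t slotSize, std::vector<unsigned char> &device)
        : mSlots(slotCount, std::vector<unsigned char>(slotSize)), mDevice(device) {}

    // glBufferSubData-like upload of `size` bytes at `offset` in the device buffer.
    void upload(std::size_t offset, const void *data, std::size_t size) {
        if(mPending.size() == mSlots.size())
            flush();                                   // every slot in use: wait for the "GPU"
        assert(size <= mSlots[0].size());
        assert(offset + size <= mDevice.size());
        std::size_t slot = mPending.size();            // slots are used in order, freed together
        std::memcpy(mSlots[slot].data(), data, size);  // CPU writes into host-visible memory
        mPending.push_back({slot, offset, size});      // "record" the GPU copy
    }

    // Executes the recorded copies, then recycles every slot (like notify()).
    void flush() {
        for(Copy const &c : mPending)
            std::memcpy(mDevice.data() + c.dstOffset, mSlots[c.slot].data(), c.size);
        mPending.clear();
        ++flushCount;
    }

    int flushCount = 0;

private:
    struct Copy { std::size_t slot, dstOffset, size; };
    std::vector<std::vector<unsigned char>> mSlots;
    std::vector<unsigned char> &mDevice;
    std::vector<Copy> mPending;
};
```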

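The code to transfer a buffer is not complicated at all. You just have to be careful about the memory barrier. Personally, I use a buffer memory barrier from the TRANSFER stage to the ALL_COMMANDS stage in this case, so that any later command reading the destination buffer waits for the copy.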

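void BufferTransferer::notify() {
    *mIndex = 0;
}

void BufferTransferer::transfer(Buffer const &src, Buffer &dst,
                                vk::DeviceSize offsetSrc, vk::DeviceSize offsetDst,
                                vk::DeviceSize size) {
    // Check that sizes and usages are legal
    assert((src.getUsage() & vk::BufferUsageFlagBits::eTransferSrc) ==
           vk::BufferUsageFlagBits::eTransferSrc);
    assert((dst.getUsage() & vk::BufferUsageFlagBits::eTransferDst) ==
           vk::BufferUsageFlagBits::eTransferDst);

    assert(src.getSize() >= (offsetSrc + size));
    assert(dst.getSize() >= (offsetDst + size));

    // Prepare the region copied
    vk::BufferCopy region(offsetSrc, offsetDst, size);

    vk::CommandBufferBeginInfo begin(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);

    vk::CommandBuffer cmd = mCommandBufferSubmitter->createCommandBuffer();

    cmd.begin(begin);
    cmd.copyBuffer(src, dst, {region});

    // The barrier covers the written range of dst, hence offsetDst
    cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTransfer,
                        vk::PipelineStageFlagBits::eAllCommands,
                        vk::DependencyFlags(),
                        nullptr,
                        vk::BufferMemoryBarrier(vk::AccessFlagBits::eTransferWrite,
                                                vk::AccessFlagBits::eMemoryRead,
                                                VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
                                                dst, offsetDst, size),
                        nullptr);
    cmd.end();
}

void BufferTransferer::transfer(Buffer &buffer, vk::DeviceSize offset, vk::DeviceSize size, void *data) {
    // Every staging buffer is in use: flush the recorded copies.
    // wait() notifies us, which resets *mIndex to 0
    if(*mIndex == mTransfererBuffers->size()) {
        mCommandBufferSubmitter->submit();
        mCommandBufferSubmitter->wait();
    }

    assert(size <= *mSizeTransfererBuffers);
    memcpy((*mTransfererBuffers)[*mIndex].getPtr(), data, size);
    transfer((*mTransfererBuffers)[*mIndex], buffer, 0, offset, size);
    ++(*mIndex);
}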

I do not handle reallocation when the destination buffer is too small, nor splitting the transfer into several pieces when the staging buffer is too small, but with this architecture it would not be difficult to handle these cases!
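For the "staging buffer too small" case, the only new logic is splitting one upload into pieces that each fit in a staging buffer. Here is a sketch of that splitting step; `Chunk` and `splitUpload` are hypothetical names, not part of the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One piece of a large upload: `size` bytes starting at `srcOffset` in the
// user data go to `dstOffset` in the destination buffer.
struct Chunk {
    uint64_t srcOffset;
    uint64_t dstOffset;
    uint64_t size;
};

// Splits an upload of `size` bytes at `dstOffset` into pieces no larger
// than `stagingSize` (the size of one staging buffer).
std::vector<Chunk> splitUpload(uint64_t dstOffset, uint64_t size, uint64_t stagingSize) {
    assert(stagingSize > 0);
    std::vector<Chunk> chunks;
    for(uint64_t done = 0; done < size; done += stagingSize)
        chunks.push_back({done, dstOffset + done, std::min(stagingSize, size - done)});
    return chunks;
}
```

Each piece could then go through the existing single-piece path, e.g. `transfer(buffer, c.dstOffset, c.size, static_cast<char *>(data) + c.srcOffset)`; when all staging buffers are in use, that function already flushes and waits before reusing them.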

How to use it?

Simply like this:

CommandBufferSubmitter commandBufferSubmitter(device, 1);
BufferTransferer bufferTransferer(device, 1, 1 << 20, deviceAllocator, commandBufferSubmitter);
glm::vec2 quad[] = {glm::vec2(-1, -1), glm::vec2(1, -1), glm::vec2(-1, 1), glm::vec2(1, 1)};
Buffer vbo(device, vk::BufferUsageFlagBits::eTransferDst | vk::BufferUsageFlagBits::eVertexBuffer, sizeof quad, deviceAllocator, true);
bufferTransferer.transfer(vbo, 0, sizeof quad, quad);

// The copy is only recorded: submit it and wait before using vbo
commandBufferSubmitter.submit();
commandBufferSubmitter.wait();

I noticed that many of my visitors come from Twitter. If you want to follow me, it is here.

Kisses, and see you soon for how to load and manage images!

November 13, 2016, category: Vulkan

17 Responses

  1. Thanks for these cool lessons!
    And one more question:
    What does “= default” next to the constructors of the Counter class mean?

  2. Hi! I’m following your lessons and ran into a problem:
    I’m trying to compile your examples and the compiler gives this error:
    C2280: ‘”std::unique_ptr<ShaderModule,std::default_delete> &std::unique_ptr<_Ty,std::default_delete>::operator =(const std::unique_ptr<_Ty,std::default_delete> &)”: attempting to reference a deleted function.

    I think this is due to the line: mShaders.emplace_back(std::make_unique<ShaderModule>(device, “../Shaders/shader_vert.spv”));
    If I replace “unique_ptr” with “shared_ptr”, it works. But why? Please answer me. What’s wrong?

    P.S. Visual Studio 2015

  3. The compiler complains about this: “std::vector<std::unique_ptr> someVariable;”
    attempting to reference a deleted function (C2280).
    With shared_ptr it works.

    • Okay, I do not know why it did not work on Visual Studio 2015 (maybe it fails to call the move assignment operator?).
      But I have fixed all my samples.
      Why? Because Pipeline actually already has a shaderModule member, so there is no need to use a pointer here.
      Thanks :).

  4. Thank you! These lessons also help me learn so-called modern C++, e.g. defaulted constructors, shared_ptr, etc.
    I’m waiting for new lessons :)

  5. One more question: in Fence.hpp, the constructor is declared as “Fence(Device const &device, bool signaled);” with “const” after “Device”, but in Fence.cpp it is the other way around, with “const” before “Device”. Is that all right? Does it matter?

    • I picked up the bad habit of putting “const” after the type, but my IDE (QtCreator) prefers to put it before. So when I use auto-completion, I get the const after the type in the header and before it in the source file…
      I am starting to clean it up, but it will take some time, I guess. Both forms mean the same thing, so it does not matter at all ;).

  6. Hello! I’m trying to run your samples, but I get an error (only when compiled in Release mode) on this line:
    memcpy((char *)(*mTransfererBuffers)[*mIndex].getPtr() + (*mSizeAlreadyUsed)[*mIndex], data, size); (class Transferer)
    If I compile the samples in Debug mode, everything runs fine, but in Release mode I get an access violation reading address 0x00000000.
    Please help!

  7. I found it: the bug was in this line: assert(!m_Chunks.back()->Allocate(size, alignment, block));
    In Release mode the assertion behaves as assert(false), but in Debug mode as assert(true); maybe this is related to compiler optimization. I removed the assert and kept just: m_Chunks.back()->Allocate(size, alignment, block)
    And everything is OK.
    Thank you for your help :)

  8. Hello! How can I add depth testing to your examples? I added vk::PipelineDepthStencilStateCreateInfo to the Pipeline class, then modified vk::GraphicsPipelineCreateInfo ci. In the RenderPass class I added a depthAttachment and modified vk::RenderPassCreateInfo. I changed vk::ClearValue (added depth stencil). But in the end the application crashes.
    Please write a tutorial about depth testing and add an example 🙂
    Thank you!

    • Hello,
      I will write a tutorial about mipmaps first.
      But maybe this weekend I will do an example using depth testing; it is not sure right now :).

    • In my repository there is a project, Scene. It is not finished yet and not “error free”, but it works and draws Sponza.

  9. Is it necessary to use shared_ptr for all your class variables?
    Like std::shared_ptr, for example.
    shared_ptr has time overhead in the constructor and destructor, and also in the assignment operator.

    • Hello, sorry for the late answer.
      Obviously you can use another kind of pointer like std::unique_ptr, which has no overhead, or even better, use vk::Device values directly. It was just an example. In my new project I do not use shared_ptr for such things 🙂

