Tag: Vulkan

  • Mipmap generation: transfers, layout transitions

    Hi guys!
    This article deals with mipmap generation.

    What is a mipmap?

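    A mipmap chain is a sequence of progressively downscaled copies of a texture. Each level halves the width and the height of the previous one (so it holds 4 times fewer texels), and each dimension is clamped to 1.
    For example, take a 512×128 texture: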

    1. level 0: 512×128
    2. level 1: 256×64
    3. level 2: 128×32
    4. level 3: 64×16
    5. level 4: 32×8
    6. level 5: 16×4
    7. level 6: 8×2
    8. level 7: 4×1
    9. level 8: 2×1
    10. level 9: 1×1
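    The list above can be reproduced with a few lines of code. This is a standalone sketch in plain C++ (no Vulkan headers); MipExtent and mipChain are hypothetical names used only for illustration. Each level halves both dimensions with a right shift and clamps them to 1, which is why 4×1 becomes 2×1 and not 2×0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 2D extent, standing in for vk::Extent2D.
struct MipExtent {
    std::uint32_t width;
    std::uint32_t height;
};

// Returns the extent of every mip level, from level 0 down to 1x1.
// Each dimension is halved per level and clamped to at least 1.
std::vector<MipExtent> mipChain(std::uint32_t width, std::uint32_t height) {
    std::vector<MipExtent> levels;
    levels.push_back({width, height});
    while (width > 1 || height > 1) {
        width = std::max<std::uint32_t>(1u, width >> 1);
        height = std::max<std::uint32_t>(1u, height >> 1);
        levels.push_back({width, height});
    }
    return levels;
}
```

    For mipChain(512, 128), the vector holds the ten extents listed above, ending with 1×1.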

    [Figure: Mipmap]

    Why do we need mipmaps?

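    Mipmaps work as a Level of Detail (LoD) for textures. When an object is far from the camera, it covers only a few pixels on screen, so you do not need all the details of the texture. Instead of sampling the full-resolution texture (level 0), the GPU samples a smaller mip level that matches the on-screen size. That level is far lighter than the original, which saves memory bandwidth and avoids the aliasing (shimmering) you get when a large texture is squeezed into a few pixels.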

    How to generate mipmaps in Vulkan?

    Unlike OpenGL, which provides the glGenerateTextureMipmap function, Vulkan has no function that builds mipmaps for you: you have to generate them yourself. There are two ways.

    1. Using shaders and framebuffers: you render each level into a framebuffer that is half the size of the previous level, then half of that, and so on.
    2. Using vkCmdBlitImage, which blits (copies with scaling and filtering) a region of one image into another. Be careful: vkCmdBlitImage is only supported on queues with graphics capability, so the command buffer must come from a graphics queue family, not from a transfer-only queue.

    We are going to use the second way.
    To do it, we are going to use the Transferer class we saw previously.

    First, the number of mip levels for an image is:
    levels = floor(log2(max(width, height))) + 1
    For the 512×128 example, floor(log2(512)) + 1 = 9 + 1 = 10 levels.
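    The formula can be evaluated without floating point: floor(log2(n)) is the number of times n can be halved before reaching 1. This standalone sketch (mipLevelCount is a hypothetical name) counts those halvings on the larger dimension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// levels = floor(log2(max(width, height))) + 1, computed with integer shifts.
std::uint32_t mipLevelCount(std::uint32_t width, std::uint32_t height) {
    std::uint32_t largest = std::max(width, height);
    std::uint32_t levels = 1;      // level 0 always exists
    while (largest > 1) {          // each halving adds one level
        largest >>= 1;
        ++levels;
    }
    return levels;
}
```

    mipLevelCount(512, 128) returns 10, and a 1920×1080 image gets 11 levels.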

    The algorithm to create the different mipmap levels is simple.

    1. Initialize level 0 (from a file, for example) and transition it to TRANSFER_SRC_OPTIMAL.
    2. Transition level 1 to TRANSFER_DST_OPTIMAL.
    3. Blit level 0 into level 1.
    4. Transition level 1 to TRANSFER_SRC_OPTIMAL.
    5. Repeat steps 2 to 4 for each following level (blitting level i - 1 into level i).
    6. Transition all levels to the layout you need (for example SHADER_READ_ONLY_OPTIMAL).
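    To check the bookkeeping of these steps, here is a standalone simulation that only tracks the layout of each mip level. It makes no Vulkan calls; the Layout enum and generateMipLayouts are hypothetical names mirroring VK_IMAGE_LAYOUT_*. It asserts that every blit reads a level in TRANSFER_SRC and writes a level in TRANSFER_DST.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of the image layouts used by the algorithm.
enum class Layout { Undefined, TransferSrc, TransferDst, ShaderReadOnly };

// Simulates steps 1 to 6 on an image with `levels` mip levels.
// Returns the final layout of each level; `blits` receives the blit count.
std::vector<Layout> generateMipLayouts(std::uint32_t levels, std::uint32_t &blits) {
    assert(levels >= 1);
    std::vector<Layout> layout(levels, Layout::Undefined);
    blits = 0;
    layout[0] = Layout::TransferSrc;                   // step 1
    for (std::uint32_t i = 1; i < levels; ++i) {
        layout[i] = Layout::TransferDst;               // step 2
        assert(layout[i - 1] == Layout::TransferSrc);  // step 3: the blit reads i - 1 ...
        assert(layout[i] == Layout::TransferDst);      //         ... and writes i
        ++blits;
        layout[i] = Layout::TransferSrc;               // step 4
    }
    for (Layout &l : layout) {                         // step 6
        assert(l == Layout::TransferSrc);              // every level has the same old layout
        l = Layout::ShaderReadOnly;
    }
    return layout;
}
```

    The last loop is why a single barrier over all mip levels is enough at the end: after the loop, every level shares the same old layout.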

    Here is the code.

    Beginning with the CommandBuffer

    void Transferer::buildMipMap(Image &src) {
        vk::CommandBuffer cmd = mCommandBufferSubmitter->createCommandBuffer(nullptr);
        vk::CommandBufferBeginInfo beginInfo(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
    
        cmd.begin(beginInfo);

    Prepare the blit!

        for(uint32_t i = 1; i < src.getMipLevels(); ++i) {
            vk::ImageBlit blit;
            blit.srcSubresource.aspectMask = vk::ImageAspectFlagBits::eColor;
            blit.srcSubresource.baseArrayLayer = 0;
            blit.srcSubresource.layerCount = 1;
            blit.srcSubresource.mipLevel = i - 1;
            blit.dstSubresource.aspectMask = vk::ImageAspectFlagBits::eColor;
            blit.dstSubresource.baseArrayLayer = 0;
            blit.dstSubresource.layerCount = 1;
            blit.dstSubresource.mipLevel = i;
    
            // each level is half the size of the previous one, clamped to 1
            blit.srcOffsets[1] = vk::Offset3D(std::max(1u, src.getSize().width >> (i - 1)),
                                              std::max(1u, src.getSize().height >> (i - 1)),
                                              1);
    
            blit.dstOffsets[1] = vk::Offset3D(std::max(1u, src.getSize().width >> i),
                                              std::max(1u, src.getSize().height >> i),
                                              1);
    

    The loop starts at 1 because level 0 is already initialized.
    Then you tell Vulkan which level is the source (i - 1) and which one is the destination (i).
    Do not forget std::max when you compute the offsets. For an image that is not square, one dimension reaches 0 before the other (for 512×128, level 8 would be 2×0), and a zero-sized region is not a valid blit, so the last levels would not be built.

    Transition and Blit

    Next, transition the mip level you are about to write into to TRANSFER_DST_OPTIMAL:

        vk::ImageSubresourceRange range(vk::ImageAspectFlagBits::eColor, i, 1, 0, 1);
        // level i: UNDEFINED -> TRANSFER_DST_OPTIMAL (its previous contents are discarded)
        vk::ImageMemoryBarrier preBlit = transitionImage(src, vk::ImageLayout::eUndefined, vk::ImageLayout::eTransferDstOptimal, range);
        cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTopOfPipe,
                            vk::PipelineStageFlagBits::eTransfer,
                            vk::DependencyFlags(),
                            nullptr, nullptr,
                            preBlit);
    
        // downscale level i - 1 (TRANSFER_SRC) into level i (TRANSFER_DST)
        cmd.blitImage(src, vk::ImageLayout::eTransferSrcOptimal,
                      src, vk::ImageLayout::eTransferDstOptimal, blit,
                      vk::Filter::eLinear);
    

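    Then blitImage performs the downscaled copy from level i - 1 into level i. Linear filtering requires the image format to support VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT, which you can check with vkGetPhysicalDeviceFormatProperties.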

    Finally, transition that mip level to TRANSFER_SRC_OPTIMAL, since it is the source of the next iteration:

        // level i: TRANSFER_DST -> TRANSFER_SRC, it is the source for the next level
        vk::ImageMemoryBarrier postBlit = transitionImage(src, vk::ImageLayout::eTransferDstOptimal, vk::ImageLayout::eTransferSrcOptimal, range);
        cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTransfer,
                            vk::PipelineStageFlagBits::eTransfer,
                            vk::DependencyFlags(),
                            nullptr, nullptr,
                            postBlit);
    } // end of the loop over mip levels

    Finish

    After the loop, every mip level is in TRANSFER_SRC_OPTIMAL, so one barrier can transition all of them to the layout you want to use:

    vk::ImageSubresourceRange range(vk::ImageAspectFlagBits::eColor, 0, VK_REMAINING_MIP_LEVELS, 0, 1);
    
    // transition all mipmap levels to shaderReadOnlyOptimal
    vk::ImageMemoryBarrier transition = transitionImage(src,
                                                        vk::ImageLayout::eTransferSrcOptimal,
                                                        vk::ImageLayout::eShaderReadOnlyOptimal,
                                                        range);
    
    cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTransfer,
                        vk::PipelineStageFlagBits::eAllCommands,
                        vk::DependencyFlags(),
                        nullptr, nullptr, transition);
    
    cmd.end(); // the command buffer is now ready to be submitted

    Conclusion

    This article was short, but mipmaps are not that difficult to handle. Do you like this kind of short article?
    Maybe the next article will be about descriptor set management.

    Reference

    Sascha Willems Mipmap

  • Buffer management with Vulkan: transfers, staging buffers

    Hi guys!
    I keep my promise and come with explanations and an implementation showing how to use and manage one (or several) buffers in a Vulkan application.

    How do I manage my resources?

    shared_ptr?

    First, my way of managing Vulkan resources is a bit unusual. The idea is to “emulate” the behaviour of shared_ptr and allow copies and moves, where copies share the same underlying Vulkan object.
    So, if you write:

    Image b1;      // count = 1 (not "Image b1();", which would declare a function)
    Image b2 = b1; // count = 2

    b1 and b2 are exactly the same Vulkan image.

    A counter?

    To emulate the behaviour of a shared_ptr, I created a class that is simply a counter:

    class Counter
    {
    public:
        Counter() = default;
        Counter(Counter const &counter);
        Counter(Counter &&counter) = default;
        Counter &operator=(Counter counter);
    
        uint32_t getCount() const;
    
        virtual ~Counter();
    protected:
        std::shared_ptr<uint32_t> mCount = std::make_shared<uint32_t>(1);
    };
    
    Counter::Counter(const Counter &counter) :
        mCount(counter.mCount) {
        ++(*mCount);
    }
    
    Counter &Counter::operator =(Counter counter) {
        using std::swap;
        swap(mCount, counter.mCount);
        return *this;
    }
    
    uint32_t Counter::getCount() const {
        return *mCount;
    }
    
    Counter::~Counter() {
    
    }
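    For comparison, the standard library can do the counting itself. This sketch is an alternative to the Counter class, not the author's code: it stores a hypothetical integer handle (standing in for a Vulkan handle) in a std::shared_ptr whose custom deleter calls a destroy callback, so the resource is destroyed exactly once, when the last copy goes away, and use_count() plays the role of getCount().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>

// Hypothetical stand-in for a Vulkan handle such as VkBuffer.
using FakeHandle = std::uint64_t;

// Copies share one handle; `destroy` runs once, when the last copy is gone.
class SharedHandle {
public:
    SharedHandle() = default;
    SharedHandle(FakeHandle handle, std::function<void(FakeHandle)> destroy)
        : mHandle(new FakeHandle(handle),
                  [destroy](FakeHandle *h) { destroy(*h); delete h; }) {}

    FakeHandle get() const { return *mHandle; }
    long getCount() const { return mHandle.use_count(); }

private:
    std::shared_ptr<FakeHandle> mHandle;
};
```

    With a real device, the callback would be the place to call something like vkDestroyBuffer.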
    

    A Vulkan Resource

    A Vulkan resource is created from a device and must be destroyed through it. So I wrote this small class to represent a Vulkan resource:

    class VkResource : public Counter
    {
    public:
        VkResource() = default;
        VkResource(Device const &device);
        VkResource(VkResource &&vkResource) = default;
        VkResource(VkResource const &vkResource) = default;
        VkResource &operator=(VkResource &&vkResource) = default;
        VkResource &operator=(VkResource const &vkResource) = default;
    
        vk::Device getDevice() const;
    
    protected:
        std::shared_ptr<Device> mDevice;
    };
    
    
    VkResource::VkResource(const Device &device) :
        mDevice(std::make_shared<Device>(device)) {
    
    }
    
    vk::Device VkResource::getDevice() const {
        return *mDevice;
    }
    

    Buffer in Vulkan

    Unlike in OpenGL, buffers in Vulkan are separate from their memory: you allocate the memory yourself and bind it to the buffer. Since you choose whether that memory comes from a DEVICE_LOCAL or a HOST_VISIBLE memory type, you choose where your buffer lives.

    So what is a buffer made of?

    A buffer has a size, a usage (vertex? uniform?), a block of memory, a mapped pointer if the buffer is HOST_VISIBLE, etc.
    My Buffer class is:

    class Buffer : public VkResource, public vk::Buffer
    {
    public:
        Buffer() = default;
    
        Buffer(Device &device, vk::BufferUsageFlags usage, vk::DeviceSize size,
               std::shared_ptr<AbstractAllocator> allocator, bool shouldBeDeviceLocal);
    
        Buffer(Buffer &&buffer) = default;
        Buffer(Buffer const &buffer) = default;
        Buffer &operator=(Buffer const &buffer);
    
        vk::DeviceSize getSize() const;
        vk::BufferUsageFlags getUsage() const;
        bool isDeviceLocal() const;
        void *getPtr();
        std::shared_ptr<AbstractAllocator> getAllocator();
    
        ~Buffer();
    
    private:
        std::shared_ptr<AbstractAllocator> mAllocator;
        std::shared_ptr<vk::DeviceSize> mSize = std::make_shared<vk::DeviceSize>();
        std::shared_ptr<vk::BufferUsageFlags> mUsage = std::make_shared<vk::BufferUsageFlags>();
        std::shared_ptr<vk::MemoryRequirements> mRequirements = std::make_shared<vk::MemoryRequirements>();
        std::shared_ptr<vk::PhysicalDeviceMemoryProperties> mProperties = std::make_shared<vk::PhysicalDeviceMemoryProperties>();
        std::shared_ptr<Block> mBlock = std::make_shared<Block>();
        std::shared_ptr<bool> mIsDeviceLocal;
        std::shared_ptr<void *> mPtr = std::make_shared<void *>(nullptr);
    
        void createBuffer();
        void allocate(bool shouldBeDeviceLocal);
    };
    

    It may look a bit complicated, but it is not really that difficult. A buffer is created with a usage, a size and a boolean that says whether it should live in device-local memory.
    Creating the buffer itself is simple: you only give the size and the usage:

    vk::BufferCreateInfo createInfo(vk::BufferCreateFlags(),
                                    *mSize,
                                    *mUsage,
                                     vk::SharingMode::eExclusive);
    
    m_buffer = mDevice->createBuffer(createInfo);
    *mRequirements = mDevice->getBufferMemoryRequirements(m_buffer);

    The last line queries the memory requirements: the real size you need (including padding), the required alignment, and a bitmask (memoryTypeBits) of the memory types that can back this buffer.
    To pick the memory type index, I wrote this function, which handles both device-local and host-visible memory:

    int findMemoryType(uint32_t memoryTypeBits,
                       vk::PhysicalDeviceMemoryProperties const &properties,
                       bool shouldBeDeviceLocal) {
    
        // First memory type allowed by memoryTypeBits that has all the
        // requested property flags, or -1 if there is none
        auto lambdaGetMemoryType = [&](vk::MemoryPropertyFlags propertyFlags) -> int {
            for(uint32_t i = 0; i < properties.memoryTypeCount; ++i)
                if((memoryTypeBits & (1u << i)) &&
                   ((properties.memoryTypes[i].propertyFlags & propertyFlags) == propertyFlags))
                    return static_cast<int>(i);
            return -1;
        };
    
        if(!shouldBeDeviceLocal) {
            // Prefer cached host memory, fall back to any coherent host-visible memory
            vk::MemoryPropertyFlags optimal = vk::MemoryPropertyFlagBits::eHostCached |
                    vk::MemoryPropertyFlagBits::eHostCoherent |
                    vk::MemoryPropertyFlagBits::eHostVisible;
    
            vk::MemoryPropertyFlags required = vk::MemoryPropertyFlagBits::eHostCoherent |
                    vk::MemoryPropertyFlagBits::eHostVisible;
    
            int type = lambdaGetMemoryType(optimal);
            if(type == -1)
                type = lambdaGetMemoryType(required);
            assert(type != -1 && "No host-visible memory type found");
            return type;
        }
    
        int type = lambdaGetMemoryType(vk::MemoryPropertyFlagBits::eDeviceLocal);
        assert(type != -1 && "No device-local memory type found");
        return type;
    }

    This code follows the memory type selection approach described in the Vulkan specification.
    Now we can allocate memory for the buffer, bind it and, if possible, map it:
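    Because the selection is only bit tests, the same policy can be checked without a device. In this standalone sketch, MemoryType, firstMatchingType and pickMemoryType are hypothetical names; the flag constants use the values of VkMemoryPropertyFlagBits (DEVICE_LOCAL = 0x1, HOST_VISIBLE = 0x2, HOST_COHERENT = 0x4, HOST_CACHED = 0x8).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values of the matching VkMemoryPropertyFlagBits.
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;
constexpr std::uint32_t HOST_CACHED   = 0x8;

// Hypothetical simplified memory type: only the property flags matter here.
struct MemoryType { std::uint32_t propertyFlags; };

// First type allowed by memoryTypeBits that has all `wanted` flags, else -1.
int firstMatchingType(std::uint32_t memoryTypeBits,
                      const std::vector<MemoryType> &types, std::uint32_t wanted) {
    for (std::uint32_t i = 0; i < types.size() && i < 32; ++i)
        if ((memoryTypeBits & (1u << i)) && (types[i].propertyFlags & wanted) == wanted)
            return static_cast<int>(i);
    return -1;
}

// Same policy as findMemoryType: device-local, or cached host memory with a
// fallback to coherent host-visible memory.
int pickMemoryType(std::uint32_t memoryTypeBits,
                   const std::vector<MemoryType> &types, bool shouldBeDeviceLocal) {
    if (shouldBeDeviceLocal)
        return firstMatchingType(memoryTypeBits, types, DEVICE_LOCAL);
    int type = firstMatchingType(memoryTypeBits, types,
                                 HOST_CACHED | HOST_COHERENT | HOST_VISIBLE);
    if (type == -1)
        type = firstMatchingType(memoryTypeBits, types, HOST_COHERENT | HOST_VISIBLE);
    return type;
}
```

    Note how memoryTypeBits can exclude the cached type and force the fallback, even if the device exposes one.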

    int memoryTypeIndex = findMemoryType(mRequirements->memoryTypeBits, *mProperties, shouldBeDeviceLocal);
    
    *mBlock = mAllocator->allocate(mRequirements->size, mRequirements->alignment, memoryTypeIndex);
    mDevice->bindBufferMemory(m_buffer, mBlock->memory, mBlock->offset);
    
    // if host_visible, we can map it
    if(!shouldBeDeviceLocal)
        *mPtr = mDevice->mapMemory(mBlock->memory, mBlock->offset,
                                   *mSize, vk::MemoryMapFlags());

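    As you can see, you allocate the memory and bind it to the buffer. If the memory is host visible, you can map it. Be careful if your allocator places several buffers in the same VkDeviceMemory: a VkDeviceMemory can only be mapped once at a time, so in that case you should map the whole memory once and add the block offset to the pointer.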
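    The offset returned by the allocator must respect mRequirements->alignment, which Vulkan guarantees to be a power of two. The AbstractAllocator implementation is not shown here, so this is only a minimal sketch of one possible strategy, a hypothetical linear (bump) allocator inside a single memory block, to illustrate the alignment arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Rounds offset up to the next multiple of alignment (a power of two).
std::uint64_t alignUp(std::uint64_t offset, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical linear allocator: hands out aligned sub-ranges of one block
// and never frees individual allocations.
class BumpAllocator {
public:
    explicit BumpAllocator(std::uint64_t capacity) : mCapacity(capacity) {}

    // Returns the offset of the new sub-range, or UINT64_MAX if it does not fit.
    std::uint64_t allocate(std::uint64_t size, std::uint64_t alignment) {
        std::uint64_t offset = alignUp(mTop, alignment);
        if (offset > mCapacity || size > mCapacity - offset)
            return UINT64_MAX;
        mTop = offset + size;
        return offset;
    }

private:
    std::uint64_t mCapacity;
    std::uint64_t mTop = 0;
};
```

    A 100-byte buffer with 256-byte alignment lands at offset 0, and the next one starts at 256, not at 100.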

    Now we have a class to manage our buffers, but we are not done yet!

    Staging resources

    In general, the CPU cannot write directly to device-local memory, because that memory is usually not HOST_VISIBLE (some devices, such as integrated GPUs, do expose memory types that are both). So we use what is called a staging resource, which can be a buffer or an image. The idea is to bind host-visible memory to the staging resource, write the data into it, and then let the GPU copy it into a resource whose memory resides in device-local memory.

    [Figure: staging buffer]

    Submitting command buffers

    Before transferring anything, I wanted a class that manages the submission of command buffers. When the work is done, the submitter should notify the transferer objects that use it, so I used the observer pattern:

    class ObserverCommandBufferSubmitter {
    public:
        virtual void notify() = 0;
    };
    
    class CommandBufferSubmitter
    {
    public:
        CommandBufferSubmitter(Device &device, uint32_t numberCommandBuffers);
    
        void addObserver(ObserverCommandBufferSubmitter *observer);
    
        vk::CommandBuffer createCommandBuffer();
    
        void submit();
        void wait();
    
    protected:
        std::shared_ptr<Device> mDevice;
        std::shared_ptr<vk::Queue> mQueue;
        std::shared_ptr<CommandPool> mCommandPool;
        std::shared_ptr<std::vector<vk::CommandBuffer>> mCommandBuffers = std::make_shared<std::vector<vk::CommandBuffer>>();
        std::shared_ptr<Fence> mFence;
        std::shared_ptr<uint32_t> mIndex = std::make_shared<uint32_t>(0);
        std::shared_ptr<std::vector<ObserverCommandBufferSubmitter*>> mObservers = std::make_shared<std::vector<ObserverCommandBufferSubmitter*>>();
    };
    
    CommandBufferSubmitter::CommandBufferSubmitter(Device &device, uint32_t numberCommandBuffers) :
        mDevice(std::make_shared<Device>(device)),
        mQueue(std::make_shared<vk::Queue>(device.getTransferQueue())),
        mCommandPool(std::make_shared<CommandPool>(device, true, true, device.getIndexTransferQueue())),
        mFence(std::make_shared<Fence>(device, false)) {
        *mCommandBuffers = mCommandPool->allocate(vk::CommandBufferLevel::ePrimary, numberCommandBuffers);
    }
    
    void CommandBufferSubmitter::addObserver(ObserverCommandBufferSubmitter *observer) {
        mObservers->emplace_back(observer);
    }
    
    vk::CommandBuffer CommandBufferSubmitter::createCommandBuffer() {
        if(*mIndex >= mCommandBuffers->size()) {
            auto buffers = mCommandPool->allocate(vk::CommandBufferLevel::ePrimary, 10);
    
            for(auto &b : buffers)
                mCommandBuffers->emplace_back(b);
        }
    
        return (*mCommandBuffers)[(*mIndex)++];
    }
    
    void CommandBufferSubmitter::submit() {
        vk::SubmitInfo info;
        info.setCommandBufferCount(*mIndex).setPCommandBuffers(mCommandBuffers->data());
        mFence->reset();
        mQueue->submit(info, *mFence);
    }
    
    void CommandBufferSubmitter::wait() {
        *mIndex = 0;
        mFence->wait();
        mFence->reset();
        for(auto &observer : *mObservers)
            observer->notify();
    }
    

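    The code is not difficult: createCommandBuffer hands out the next command buffer (allocating ten more when all of them are in use), submit sends all the command buffers handed out so far in one submission, and wait blocks on a fence until the GPU work is completed, then notifies the observers.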
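    The observer flow can be exercised without a GPU. In this standalone sketch, FakeSubmitter is a hypothetical stand-in for CommandBufferSubmitter: createCommandBuffer only hands out indices, submit does nothing, and wait takes the place of the fence wait before notifying the observers. StagingUser mimics what BufferTransferer::notify does: it resets its staging-buffer index once the GPU work is known to be finished.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class Observer {
public:
    virtual ~Observer() = default;
    virtual void notify() = 0;
};

// Hypothetical stand-in for CommandBufferSubmitter (no Vulkan objects).
class FakeSubmitter {
public:
    void addObserver(Observer *observer) { mObservers.push_back(observer); }
    std::uint32_t createCommandBuffer() { return mIndex++; } // index of the next buffer
    std::uint32_t pendingCount() const { return mIndex; }
    void submit() { /* a real submitter would submit mIndex command buffers here */ }
    void wait() {
        // a real submitter would wait on its fence here
        mIndex = 0;
        for (Observer *observer : mObservers)
            observer->notify();
    }

private:
    std::uint32_t mIndex = 0;
    std::vector<Observer *> mObservers;
};

// Mimics BufferTransferer: its staging buffers become free again on notify().
class StagingUser : public Observer {
public:
    std::uint32_t usedStagingBuffers = 0;
    void notify() override { usedStagingBuffers = 0; }
};
```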

    Buffer transferer

    As you may have guessed, our BufferTransferer implements the observer interface:

    class BufferTransferer : public ObserverCommandBufferSubmitter
    {
    public:
        BufferTransferer(Device &device, uint32_t numberBuffers, vk::DeviceSize sizeTransfererBuffers,
                         std::shared_ptr<AbstractAllocator> allocator, CommandBufferSubmitter &commandBufferSubmitter);
    
        void transfer(const Buffer &src, Buffer &dst,
                      vk::DeviceSize offsetSrc,
                      vk::DeviceSize offsetDst,
                      vk::DeviceSize size);
    
        void transfer(Buffer &buffer, vk::DeviceSize offset, vk::DeviceSize size, void *data);
    
        void notify();
    
    private:
        std::shared_ptr<CommandBufferSubmitter> mCommandBufferSubmitter;
        std::shared_ptr<std::vector<Buffer>> mTransfererBuffers = std::make_shared<std::vector<Buffer>>();
        std::shared_ptr<vk::DeviceSize> mSizeTransfererBuffers;
        std::shared_ptr<uint32_t> mIndex = std::make_shared<uint32_t>(0);
    };

    The idea is to keep several host-visible staging buffers ready to transfer data. Why? Because users may not care about the CPU-side buffer and only want a GPU buffer! Thanks to that, they can upload data the way glBufferSubData does in OpenGL.

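    The code to transfer a buffer is not complicated at all; you just have to be careful with the memory barrier. Personally, I use one from TRANSFER to ALL_COMMANDS in this case, with TRANSFER_WRITE as source access and MEMORY_READ as destination access. This barrier only orders commands on the queue it is recorded on: if the destination buffer was created with VK_SHARING_MODE_EXCLUSIVE and is then used on a queue from another family (graphics, for example), the specification also requires a queue family ownership transfer (a release barrier on the transfer queue and a matching acquire barrier on the other queue).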

    void BufferTransferer::notify() {
        *mIndex = 0;
    }
    
    void BufferTransferer::transfer(Buffer const &src, Buffer &dst,
                                    vk::DeviceSize offsetSrc, vk::DeviceSize offsetDst,
                                    vk::DeviceSize size) {
        // Check that usages and sizes are valid
        assert((src.getUsage() & vk::BufferUsageFlagBits::eTransferSrc) ==
                    vk::BufferUsageFlagBits::eTransferSrc);
        assert((dst.getUsage() & vk::BufferUsageFlagBits::eTransferDst) ==
                    vk::BufferUsageFlagBits::eTransferDst);
    
        assert(src.getSize() >= (offsetSrc + size));
        assert(dst.getSize() >= (offsetDst + size));
    
        // Prepare the region copied
        vk::BufferCopy region(offsetSrc, offsetDst, size);
    
        vk::CommandBufferBeginInfo begin(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
    
        vk::CommandBuffer cmd = mCommandBufferSubmitter->createCommandBuffer();
    
        cmd.begin(begin);
        cmd.copyBuffer(src, dst, {region});
        cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTransfer,
                            vk::PipelineStageFlagBits::eAllCommands,
                            vk::DependencyFlags(),
                            nullptr,
                            vk::BufferMemoryBarrier(vk::AccessFlagBits::eTransferWrite,
                                                    vk::AccessFlagBits::eMemoryRead,
                                                    VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
                                                    dst, offsetDst, size), // range written in dst
                            nullptr);
        cmd.end();
    
    }
    
    void BufferTransferer::transfer(Buffer &buffer, vk::DeviceSize offset, vk::DeviceSize size, void *data) {
        if(*mIndex == mTransfererBuffers->size()) {
            mCommandBufferSubmitter->submit();
            mCommandBufferSubmitter->wait();
        }
        assert(size <= *mSizeTransfererBuffers);
        memcpy((*mTransfererBuffers)[*mIndex].getPtr(), data, size);
        transfer((*mTransfererBuffers)[*mIndex], buffer, 0, offset, size);
        (*mIndex)++;
    }
    

    I do not handle reallocation when the destination buffer is too small, nor splitting the transfer into several pieces when the staging buffer is too small, but with this architecture, it would not be difficult to handle these cases!
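    Splitting a transfer that is larger than one staging buffer is mostly offset arithmetic. This standalone sketch uses hypothetical names (Chunk, splitTransfer): it cuts a copy of `size` bytes into pieces of at most `stagingSize` bytes, each of which could then go through BufferTransferer::transfer with the matching offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One piece of a larger upload: where to read in the source data,
// where to write in the destination buffer, and how many bytes.
struct Chunk {
    std::uint64_t srcOffset;
    std::uint64_t dstOffset;
    std::uint64_t size;
};

std::vector<Chunk> splitTransfer(std::uint64_t dstOffset, std::uint64_t size,
                                 std::uint64_t stagingSize) {
    assert(stagingSize > 0);
    std::vector<Chunk> chunks;
    for (std::uint64_t done = 0; done < size;) {
        std::uint64_t piece = std::min(stagingSize, size - done);
        chunks.push_back({done, dstOffset + done, piece});
        done += piece;
    }
    return chunks;
}
```

    Each chunk would then be uploaded with something like bufferTransferer.transfer(dst, chunk.dstOffset, chunk.size, static_cast<char *>(data) + chunk.srcOffset).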

    How to use it?

    Simply like this:

    CommandBufferSubmitter commandBufferSubmitter(device, 1);
    BufferTransferer bufferTransferer(device, 1, 1 << 20, deviceAllocator, commandBufferSubmitter);
    glm::vec2 quad[] = {glm::vec2(-1, -1), glm::vec2(1, -1), glm::vec2(-1, 1), glm::vec2(1, 1)};
    Buffer vbo(device, vk::BufferUsageFlagBits::eTransferDst | vk::BufferUsageFlagBits::eVertexBuffer, sizeof quad, deviceAllocator, true);
    bufferTransferer.transfer(vbo, 0, sizeof quad, quad);
    commandBufferSubmitter.submit();
    commandBufferSubmitter.wait();

    I noticed that many of my visitors come from Twitter. If you want to follow me, I am there.

    Kisses, and see you soon for an article on loading and managing images!

  • Barriers in Vulkan: they are not that difficult

    Hi!
    Yes, I know, I lied: I said that my next article would be about buffers or images, but in the end I preferred to talk about barriers first. Barriers are, IMHO, really difficult to understand well, so this article might contain some mistakes.
    If so, please let me know by mail or in a comment.
    By the way, parts of this article may remind you of the GPUOpen articles: Performance Tweets series: Barriers, fences, synchronization and Vulkan barriers explained

    What are memory barriers for?

    Memory barriers are a source of bugs.
    More seriously, barriers are used for three (actually four) things.

    1. Execution barrier (synchronization): to ensure that prior commands have finished
    2. Memory barrier (memory visibility / availability): to ensure that prior writes are visible
    3. Layout transition (for images): to put the resource in the layout best suited to its next usage
    4. Reformatting

    I am not going to talk about reformatting because (it is a shame) I am not very confident about it.

    What exactly is an execution barrier?

    An execution barrier may remind you of a mutex between CPU threads. You write something into a resource, and when you want to read what you wrote, you must wait until the write has finished.

    What exactly is a memory barrier?

    When something is written, the write may only land in some caches, and you must flush them to make the data visible where you want to read it. That is what memory barriers are for.
    Image memory barriers also perform layout transitions, so the image is in the layout that gives the best performance your graphics card can offer for its next usage.

    How it is done in Vulkan

    Now that we understand why barriers are so important, we are going to see how we can use them in Vulkan.

    Vulkan’s Pipeline

    [Figure: Vulkan Pipeline]

    Simply put, a command enters the pipeline at the TOP_OF_PIPE stage and ends at the BOTTOM_OF_PIPE stage.
    There is also an extra stage, HOST, that represents accesses performed by the host.

    Barriers between stages

    We are going to see two examples (inspired by GPUOpen).
    Let us begin with the worst case: your first command writes in every stage where it can, and your second command reads in every stage where it can.
    It simply means that you want to wait for the first command to finish completely before the second one begins.

    As a diagram, it looks like this:
    [Figure: barrier from all stages to all stages]

    • In gray: all the stages that need to be executed before or after the barrier (or the ones that are never reached)
    • In red: above the barrier, where the data are produced; below the barrier, where the data are consumed.
    • In green: unblocked stages. You should try to have as many green stages as possible.

    As you can see, here you do not have any green stages, so it is not good at all for performance.

    In Vulkan C++ (Vulkan-Hpp), you would have something like this:

    cmd.pipelineBarrier(
    vk::PipelineStageFlagBits::eAllCommands, 
    vk::PipelineStageFlagBits::eAllCommands, ...);

    Some people use BOTTOM_OF_PIPE as the source and TOP_OF_PIPE as the destination. It is not wrong, but it only works as an execution barrier. These stages do not access memory, so they cannot make memory accesses visible or even available!!!! You should not (must not?) rely on a memory barrier with these stages, but we will see why later.

    Now, let us look at a better case.
    Imagine your first command fills an image or a buffer (through an SSBO or imageStore) in the VERTEX_SHADER stage, and you want to use this data in the TESSELLATION_EVALUATION_SHADER stage.
    After modification, the previous diagram becomes:
    [Figure: barrier from VERTEX_SHADER to TESSELLATION_EVALUATION_SHADER]

    As you can see, there are a lot of green stages, which is very good!
    The Vulkan C++ code would be:

    cmd.pipelineBarrier(
    vk::PipelineStageFlagBits::eVertexShader,
    vk::PipelineStageFlagBits::eTessellationEvaluationShader,...);
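    The green stages of the two diagrams can be counted with a toy model. This sketch is a simplification, not Vulkan code: it treats the graphics pipeline as one ordered list of stages (the Stage enum is hypothetical and leaves out TOP/BOTTOM_OF_PIPE, compute, transfer and host). The prior command's stages after srcStage are not waited for, and the next command's stages before dstStage may run early. ALL_COMMANDS to ALL_COMMANDS is modelled as waiting on the last stage and blocking from the first one.

```cpp
#include <cassert>

// Hypothetical, simplified ordering of the graphics stages.
enum class Stage {
    DrawIndirect, VertexInput, VertexShader, TessControl, TessEval, Geometry,
    EarlyFragmentTests, FragmentShader, LateFragmentTests, ColorAttachmentOutput,
    Count
};

struct Unblocked {
    int firstCommand;   // stages of the prior command the barrier does not wait for
    int secondCommand;  // stages of the next command allowed to run before the barrier
};

// srcStage: the prior command must have completed this stage (and the earlier ones).
// dstStage: the next command is blocked from this stage on (and the later ones).
Unblocked unblockedStages(Stage srcStage, Stage dstStage) {
    int afterSrc = static_cast<int>(Stage::Count) - 1 - static_cast<int>(srcStage);
    int beforeDst = static_cast<int>(dstStage);
    return {afterSrc, beforeDst};
}
```

    In this model, the all-to-all barrier leaves 0 green stages, while VERTEX_SHADER to TESSELLATION_EVALUATION_SHADER leaves 7 stages of the first command and 4 of the second one unblocked.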

    By Region or not?

    This part may contain errors, so please let me know if you disagree with me.
    To begin, what does "by region" mean?
    A region is a small part of your framebuffer. If you use a by-region dependency, it means that (in framebuffer space) operations need to be finished only in that region (whose size is specific to the implementation) and not in the whole image.
    Well, it is not clear what framebuffer space is. In my opinion, and after reading the documentation, it could span from EARLY_FRAGMENT_TESTS (or at least FRAGMENT_SHADER if early depth testing is not enabled) to COLOR_ATTACHMENT_OUTPUT.

    Actually, to me this flag lets the driver optimize a bit. However, it must only be used (and should not be useful elsewhere, IMHO) between subpasses, for subpass input attachments.
    But I may be wrong!

    Everything above is wrong! If you want a plain explanation, see the comment from devsh. To make it simple, it means that the barrier operates only on “one pixel” of the image: a given pixel only waits for the same pixel of the previous operation, not for the whole image. It could be used for input attachments or a depth pre-pass, for example.

    Memory Barriers

    Okay, we have seen how to make a pure execution barrier (that is, without memory barriers). Let us now add memory barriers.
    A memory barrier ensures availability in the first half of the memory dependency and visibility in the second half. We can see them as a “flush” and an “invalidation”. Making information available does not mean that it is visible.
    In each kind of memory barrier you will have a srcAccessMask and a dstAccessMask.
    How do they work?

    Access and stage are coupled. For each stage in srcStageMask, all memory accesses of the types listed in srcAccessMask are made available. It can be seen as a flush of the caches defined by srcAccessMask in all these stages.

    For dstStageMask / dstAccessMask, it is the same thing, but instead of making information available, the information is made visible to these stages for these accesses.

    That is why using BOTTOM_OF_PIPE / TOP_OF_PIPE is meaningless for a memory barrier: these stages perform no memory access.
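    The coupling comes from a table in the Vulkan specification listing which access types each stage can perform. This standalone sketch encodes a small subset of that table with hypothetical enums (it is not the full table). MEMORY_READ and MEMORY_WRITE are accepted with any stage, as the specification allows, but TOP_OF_PIPE and BOTTOM_OF_PIPE support no specific access type, so there is nothing for them to make available or visible.

```cpp
#include <cassert>

// Hypothetical subsets of VkPipelineStageFlagBits and VkAccessFlagBits.
enum class Stage { TopOfPipe, VertexInput, VertexShader, FragmentShader,
                   ColorAttachmentOutput, Transfer, BottomOfPipe };
enum class Access { IndexRead, ShaderRead, ShaderWrite, ColorAttachmentWrite,
                    TransferRead, TransferWrite, MemoryRead, MemoryWrite };

// Can `stage` perform memory accesses of type `access`? (subset of the spec table)
bool stageSupportsAccess(Stage stage, Access access) {
    switch (access) {
    case Access::MemoryRead:
    case Access::MemoryWrite:
        return true; // always valid in an access mask
    case Access::IndexRead:
        return stage == Stage::VertexInput;
    case Access::ShaderRead:
    case Access::ShaderWrite:
        return stage == Stage::VertexShader || stage == Stage::FragmentShader;
    case Access::ColorAttachmentWrite:
        return stage == Stage::ColorAttachmentOutput;
    case Access::TransferRead:
    case Access::TransferWrite:
        return stage == Stage::Transfer;
    }
    return false;
}
```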

    Buffer and image barriers can also transfer the ownership of the resource you are using from one queue family to another.
    For example, you upload an image on a queue that is only used for transfers. At the end, you must release it from the transfer queue family and acquire it on the compute (or graphics) queue family, with a matching barrier on each queue.

    Global Memory Barriers

    This kind of memory barrier applies to all memory objects that exist at the time of its execution.
    I do not have any example of when to use this kind of memory barrier. Maybe, if you have a lot of barriers to issue, it is better to use a global memory barrier.
    An example:

    vk::MemoryBarrier(
    vk::AccessFlagBits::eMemoryWrite,
    vk::AccessFlagBits::eMemoryRead);

    Buffer Memory Barriers

    Here, the access masks apply only to the buffer (and the range) given in the barrier.
    Here is an example:

    vk::BufferMemoryBarrier(
    vk::AccessFlagBits::eTransferWrite,
    vk::AccessFlagBits::eShaderRead,
    transferFamilyIndex, // queue family releasing the buffer
    queueFamilyIndex,    // queue family acquiring it
    buffer,
    0, VK_WHOLE_SIZE);

    Image Memory Barriers

    Image memory barriers have another kind of utility. They can perform layout transitions.

    Example:
    I want to create the mipmaps of an image (we will see the complete function in another article) with vkCmdBlitImage.
    After a vkCmdBlitImage, I want to use the mip level I just wrote as the source for the next mip level.

    oldLayout must be TRANSFER_DST_OPTIMAL and newLayout must be TRANSFER_SRC_OPTIMAL.
    Which kind of access did I make, and which kind of access will I do?
    That is easy: I performed a TRANSFER_WRITE and I want to perform a TRANSFER_READ.
    In which stage does my last command “finish”, and in which stage does my new command “begin”? Both in the TRANSFER stage.

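    In C++ it looks like this:

    // blit: the vk::ImageBlit region from level i - 1 to level i
    cmd.blitImage(image, vk::ImageLayout::eTransferSrcOptimal,
                  image, vk::ImageLayout::eTransferDstOptimal,
                  blit, vk::Filter::eLinear);
    vk::ImageMemoryBarrier imageBarrier(
    vk::AccessFlagBits::eTransferWrite,
    vk::AccessFlagBits::eTransferRead,
    vk::ImageLayout::eTransferDstOptimal,
    vk::ImageLayout::eTransferSrcOptimal,
    VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED, // no ownership transfer
    image, subResourceRange);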
    
    cmd.pipelineBarrier(
    vk::PipelineStageFlagBits::eTransfer,
    vk::PipelineStageFlagBits::eTransfer,
    vk::DependencyFlags(),
    nullptr, nullptr, imageBarrier);
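    The same questions (which access was made, which one comes next, in which stages) can be answered once for every transition of the mipmap article. This is a hypothetical sketch of the lookup a helper such as transitionImage could perform; the enums mirror Vulkan names but are placeholders, and the last row assumes the texture is then sampled in a fragment shader. For TRANSFER_SRC to SHADER_READ_ONLY, the previous access is a read, which never needs to be made available, so an execution dependency (no source access) is enough.

```cpp
#include <cassert>
#include <stdexcept>

// Placeholder enums mirroring a few Vulkan layouts, stages and access flags.
enum class Layout { Undefined, TransferDst, TransferSrc, ShaderReadOnly };
enum class Stage { TopOfPipe, Transfer, FragmentShader };
enum class Access { None, TransferWrite, TransferRead, ShaderRead };

struct BarrierInfo {
    Stage srcStage; Access srcAccess; // what must finish / be made available
    Stage dstStage; Access dstAccess; // what waits / gets visibility
};

BarrierInfo barrierFor(Layout oldLayout, Layout newLayout) {
    if (oldLayout == Layout::Undefined && newLayout == Layout::TransferDst)
        // nothing to wait for: the old contents are discarded
        return {Stage::TopOfPipe, Access::None, Stage::Transfer, Access::TransferWrite};
    if (oldLayout == Layout::TransferDst && newLayout == Layout::TransferSrc)
        // a blit wrote this level, the next blit reads it
        return {Stage::Transfer, Access::TransferWrite, Stage::Transfer, Access::TransferRead};
    if (oldLayout == Layout::TransferSrc && newLayout == Layout::ShaderReadOnly)
        // the last access was a read: an execution dependency is enough
        return {Stage::Transfer, Access::None, Stage::FragmentShader, Access::ShaderRead};
    throw std::invalid_argument("transition not handled by this sketch");
}
```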

    I hope you enjoyed this article and learned a few things. Synchronization in Vulkan is not easy to handle, and what I wrote may (surely?) contain some errors.

    Reference:

    Memory barriers on TOP_OF_PIPE #128
    Specs