Hi!
Yes, I know, I lied: I said that my next article would be about buffers or images, but in the end I prefer to talk about barriers first. However, barriers are, IMHO, a really difficult thing to understand well, so this article might contain some mistakes.
If you find one, please let me know by mail or in a comment.
By the way, some parts of this article may remind you of two articles on GPU Open: “Performance Tweets series: Barriers, fences, synchronization” and “Vulkan barriers explained”.

What are memory barriers for?

Memory barriers are a source of bugs.
More seriously, barriers are used for three (actually four) things:

Execution barrier (synchronization): to ensure that prior commands have finished
Memory barrier (memory visibility / availability): to ensure that prior writes are visible
Layout transition (useful for images): to optimize the usage of the resource
Reformatting

I am not going to talk about reformatting because (it is a shame) I am not very confident with it.

What exactly is an execution barrier?

An execution barrier may remind you of a mutex between CPU threads. You write something into a resource. When you want to read what you wrote, you must wait until the write has finished.
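
Here is a minimal sketch of that analogy on the CPU side. It uses a thread join instead of a mutex to stay short, and writeThenRead is a hypothetical helper name, not something from Vulkan. Keep in mind that on the CPU, join() also makes the writes visible to the reader, whereas Vulkan treats execution and memory visibility as two separate things.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// CPU-side analogy only: the "first command" writes, the "second" reads.
// writeThenRead is a hypothetical helper name used for illustration.
int writeThenRead(int count) {
    std::vector<int> resource(count, 0);

    // "First command": fill the resource on another thread.
    std::thread writer([&resource] {
        std::iota(resource.begin(), resource.end(), 1); // 1, 2, ..., count
    });

    // "Execution barrier": do not go further until the writer has finished.
    writer.join();

    // "Second command": reading is now safe.
    return std::accumulate(resource.begin(), resource.end(), 0);
}
```

Calling writeThenRead(4) sums 1 + 2 + 3 + 4 only after the writer is done. Without the join, the read would race with the write.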

What exactly is a memory barrier?

When you write something from one thread, the data may only reach some caches, and you must flush them to make the data visible wherever you want to read it. That is what memory barriers are for.
For images, they also perform layout transitions, so that you get the best performance out of your graphics card.
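
The closest CPU analogy I know is a release / acquire pair on an atomic flag. The sketch below is only that, an analogy: publishAndConsume is a hypothetical helper name, and a GPU memory barrier is not implemented this way. The release store roughly plays the role of the flush, and the acquire load roughly plays the role of the invalidation.

```cpp
#include <atomic>
#include <thread>

// CPU-side analogy only; publishAndConsume is a hypothetical helper name.
int publishAndConsume(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag used to publish the payload

    std::thread producer([&] {
        payload = value;                               // the write
        ready.store(true, std::memory_order_release);  // publish earlier writes
    });

    // Wait until the flag is seen. The acquire load pairs with the release
    // store, so the write to payload is guaranteed to be visible afterwards.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;

    producer.join();
    return seen;
}
```

If both operations used memory_order_relaxed instead, the reader could see the flag set without any guarantee about payload, which is the kind of bug a missing memory barrier gives you.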

How it is done in Vulkan

Now that we understand why barriers are so important, let's see how to use them in Vulkan.

Vulkan’s Pipeline

[Figure: the Vulkan pipeline stages]

To keep it simple, a command enters at the TOP_OF_PIPE stage and ends at the BOTTOM_OF_PIPE stage.
There is also an extra stage that refers to the host.

Barriers between stages

We are going to see two examples (inspired by GPU Open).
We will begin with the worst case: your first command writes at every stage, everywhere it is possible, and your second command reads at every stage, everywhere it is possible.
It simply means that you want the first command to finish completely before the second one begins.

As a diagram, it looks like this:
[Figure: barriers-all_to_all]

In gray: the stages that need to be executed before or after the barrier (or the ones that are never reached).
In red: above the barrier, where the data are produced; below the barrier, where the data are consumed.
In green: the unblocked stages. You should try to have as many green stages as possible.

As you can see, there are no green stages here, so it is not good at all for performance.

In Vulkan C++, you should have something like this:

cmd.pipelineBarrier(
vk::PipelineStageFlagBits::eAllCommands, 
vk::PipelineStageFlagBits::eAllCommands, ...);

Some people use BOTTOM_OF_PIPE as the source and TOP_OF_PIPE as the destination. This is not wrong, but it is only useful for an execution barrier. These stages do not access memory, so they cannot make memory accesses visible, or even available! You should not (must not?) issue a memory barrier on these stages, but we will come back to that later.

Now, let's see a better case.
Imagine that your first command fills an image or a buffer (SSBO or imageStore) in the VERTEX_SHADER stage. Now imagine that you want to use these data in the TESSELLATION_EVALUATION_SHADER stage.
The previous diagram, once modified, becomes:
[Figure: barriers in the good way]

As you can see, there are a lot of green stages, and that is very good!
The Vulkan C++ code should be:

cmd.pipelineBarrier(
vk::PipelineStageFlagBits::eVertexShader,
vk::PipelineStageFlagBits::eTessellationEvaluationShader,...);
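
To get a feeling for why the second barrier is so much cheaper, here is a toy model that counts the stages left unblocked by a pair of stage masks. It is not Vulkan API: the Stage enum and the helper functions are hypothetical names of mine. It only covers the graphics stages in their logical order, and it ignores ALL_COMMANDS as well as the compute, transfer and host stages. The assumption is that a source stage waits for itself and every logically earlier stage, and a destination stage blocks itself and every logically later stage.

```cpp
#include <cstddef>

// Toy model, not Vulkan API: graphics pipeline stages in their logical order.
enum class Stage : std::size_t {
    TopOfPipe, DrawIndirect, VertexInput, VertexShader,
    TessellationControlShader, TessellationEvaluationShader, GeometryShader,
    EarlyFragmentTests, FragmentShader, LateFragmentTests,
    ColorAttachmentOutput, BottomOfPipe
};

constexpr std::size_t kStageCount = 12;

// Stages of the first command that come after srcStage are not waited on.
constexpr std::size_t unblockedInFirstCommand(Stage srcStage) {
    return kStageCount - 1 - static_cast<std::size_t>(srcStage);
}

// Stages of the second command that come before dstStage may start early.
constexpr std::size_t unblockedInSecondCommand(Stage dstStage) {
    return static_cast<std::size_t>(dstStage);
}

// Total number of stages that the barrier leaves free to overlap.
constexpr std::size_t unblockedStages(Stage srcStage, Stage dstStage) {
    return unblockedInFirstCommand(srcStage) + unblockedInSecondCommand(dstStage);
}
```

With VertexShader as the source and TessellationEvaluationShader as the destination, this model leaves 13 stages free to overlap. With BottomOfPipe as the source and TopOfPipe as the destination, it leaves none.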

By Region or not?

This part may contain errors, so please let me know if you disagree with me.
To begin, what does “by region” mean?
A region is a small part of your framebuffer. If you specify a by-region dependency, it means that (in framebuffer space) operations need to be finished only in the region (which is implementation-specific) and not in the whole image.
Well, it is not clear what framebuffer space is. In my opinion, and after reading the documentation, it could go from the EARLY_FRAGMENT_TESTS stage (or at least FRAGMENT_SHADER if early depth testing is not enabled) to the COLOR_ATTACHMENT_OUTPUT stage.

Actually, to me, this flag lets the driver optimize a bit. However, it should be used only between subpasses, for subpass input attachments (and IMHO it should not be useful elsewhere).
But I may be wrong!

Everything above about by-region dependencies is wrong. If you want a plain explanation, see the comment from devsh. To make it simple, it means that the barrier operates only on “one pixel” of the image. It can be used for input attachments or for a depth pre-pass, for example.

Memory Barriers

Okay, now that we have seen how to make a pure execution barrier (that is, one without memory barriers), let's look at memory barriers.
Memory barriers ensure availability for the first half of the memory dependency and visibility for the second half. We can see them as a “flush” and an “invalidation”. Making information available does not mean that it is visible.
Each kind of memory barrier has a srcAccessMask and a dstAccessMask.
How do they work?

Access and stage are somewhat coupled. For each stage in srcStage, all memory accesses of the types set in srcAccessMask are made available. It can be seen as a flush of the caches designated by srcAccessMask in all of these stages.

For dstStage / dstAccessMask, it is the same thing, but instead of making the information available, the information is made visible to these stages and these accesses.

That is why using TOP_OF_PIPE or BOTTOM_OF_PIPE is meaningless for a memory barrier: these stages perform no memory access.
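
The difference between “available” and “visible” is easier to see on a toy model. The sketch below is only a mental picture, assuming one memory cell and one private cache per stage. ToyMemory, ToyCache and the four functions are hypothetical names of mine, and real GPU cache hierarchies are more complicated than this.

```cpp
// Toy model only: one memory cell, one private cache per pipeline stage.
struct ToyMemory {
    int cell = 0;
};

struct ToyCache {
    int value = 0;
    bool valid = false;  // the cache holds a copy of the cell
    bool dirty = false;  // the copy is newer than the cell
};

// A write lands in the writer's cache, not in memory.
void toyWrite(ToyCache& cache, int v) {
    cache.value = v;
    cache.valid = true;
    cache.dirty = true;
}

// A read is served from the cache when it already holds a copy.
int toyRead(ToyCache& cache, const ToyMemory& memory) {
    if (!cache.valid) {
        cache.value = memory.cell;
        cache.valid = true;
    }
    return cache.value;
}

// Source half (srcStage / srcAccessMask): flush, the write becomes available.
void makeAvailable(ToyCache& cache, ToyMemory& memory) {
    if (cache.dirty) {
        memory.cell = cache.value;
        cache.dirty = false;
    }
}

// Destination half (dstStage / dstAccessMask): invalidate, so that the next
// read fetches the cell again and the write becomes visible.
// The toy assumes the reader's cache holds no unflushed write of its own.
void makeVisible(ToyCache& cache) {
    cache.valid = false;
}
```

If a reader cache already holds the old value, a toyWrite followed by makeAvailable updates the memory cell, but toyRead still returns the stale copy. Only after makeVisible on the reader does toyRead return the new value.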

With buffer and image barriers, you can also transfer the ownership of the resource from one queue family to another.
For example, you upload an image on a queue that is only used for transfers. At the end, you must release the image from the transfer queue family, and acquire it with a matching barrier on the compute (or graphics) queue.

Global Memory Barriers

This kind of memory barrier applies to all memory objects that exist at the time of its execution.
I do not have a concrete example of when to use it. Maybe, if you have a lot of barriers to issue, it is better to use one global memory barrier instead.
An example:

vk::MemoryBarrier(
vk::AccessFlagBits::eMemoryWrite,
vk::AccessFlagBits::eMemoryRead);

Buffer Memory Barriers

Here, the access masks apply only to the buffer referenced by the barrier.
Here is an example:

vk::BufferMemoryBarrier(
vk::AccessFlagBits::eTransferWrite,
vk::AccessFlagBits::eShaderRead,
transferFamilyIndex, // source queue family
queueFamilyIndex, // destination queue family
buffer, // the vk::Buffer the barrier applies to
0, VK_WHOLE_SIZE);

Image Memory Barriers

Image memory barriers have one more use: they can perform layout transitions.

Example:
I want to create the mipmaps of an image (we will see the complete function in another article) with vkCmdBlitImage.
After a vkCmdBlitImage, I want to use the mip level I just wrote as the source for the next mip level.

oldLayout must be TRANSFER_DST_OPTIMAL and newLayout must be TRANSFER_SRC_OPTIMAL.
Which kind of access did I make, and which kind of access will I make?
That is easy: I performed a TRANSFER_WRITE and I want to perform a TRANSFER_READ.
At which stage does my last command “finish”, and at which stage does my new command “begin”? Both at the TRANSFER stage.

In C++, it is done with something like this:

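cmd.blitImage(...);
vk::ImageMemoryBarrier imageBarrier(
vk::AccessFlagBits::eTransferWrite, // what the blit just did
vk::AccessFlagBits::eTransferRead, // what the next blit will do
vk::ImageLayout::eTransferDstOptimal, // oldLayout
vk::ImageLayout::eTransferSrcOptimal, // newLayout
VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED, // no queue ownership transfer
image, subResourceRange);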

cmd.pipelineBarrier(
vk::PipelineStageFlagBits::eTransfer,
vk::PipelineStageFlagBits::eTransfer,
vk::DependencyFlags(),
nullptr, nullptr, imageBarrier);

I hope that you enjoyed this article and that you learned something. Synchronization in Vulkan is not easy to handle, and what I wrote may (surely?) contain some errors.

Reference:

Memory barriers on TOP_OF_PIPE #128
Specs

Comments

6 responses to “Barriers in Vulkan : They are not that difficult”

March 1, 2017

Qining Lu

Thank you very much for the explanation and figures. They are very helpful and the best resources I can find online to explain the thing.

I’m new to graphics and I have a maybe too naive question.

Take the example in your post. We have a graphics pipeline and a vkCmdDraw, when got executed, will go through Vertex shader stage and Evaluation stage in order.

If I need to use the data produced in the Vertex shader stage in the following Evaluation stage (note they are in one vkCmdDraw, so in one same pipeline walk). Do I need to add barrier? My understanding is that I won’t need that, barrier is only meaningful across commands. As these two stages are in the same stage, it is not necessary to specify any execution dependency between them.

Is my understanding correct? It might be a too naive question so that I cannot find any useful info online.

1. March 1, 2017
  
  Antoine MORRIER
  
  Hello. Thanks, it is nice to see that what I wrote is interesting.
  If it is within the SAME draw, there is no way to use a “Vulkan barrier”: Vulkan cannot put a barrier inside a single draw. However, in GLSL, there is a way!
  
  Indeed, you have this group of functions (memoryBarrier, memoryBarrierBuffer, memoryBarrier*…).
  So in this case, you must use such a barrier inside the vertex shader.
  
  For example, in pseudocode:
  VS: ssbo buffer;
  buffer.value = oneRandomValue;
  memoryBarrierBuffer(); // since we write into a buffer
  TES: ssbo buffer;
  function(buffer.value); // I am using the value I wrote inside the vertex shader.
  
  However, I think it is dangerous to do something like that. I do not really know whether several vertices will already have been written before the TES runs.
  However, the documentation says:
  
  “In particular, the values written this way in one shader stage are guaranteed to be visible to coherent memory accesses performed by shader invocations in subsequent stages when those invocations were triggered by the execution of the original shader invocation (e.g., fragment shader invocations for a primitive resulting from a particular geometry shader invocation).”
  
  So, to me, your understanding is almost correct: you do need a barrier, but one in the shader, not a Vulkan one.
  
  Hope it helps :).
  
  1. March 4, 2017
    
    Qining Lu
    
    It helps a lot, thank you very much 🙂
    
November 24, 2018

devsh

Nice article, but there is a big error on the BY_REGION

What it means is that the memory dependency will be created only w.r.t. a region of the framebuffer.
As of late 2018, the spec explicitly mentions that the region is a single pixel (or sample, I’m not entirely sure) despite tiled renderers using much larger tile sizes.

So because of that definition that includes a framebuffer and a pixel, it only makes sense when we talk about fragment shaders.

Because the whole by-region semantic only really means that you can read the pixel previously written to the exact same location as your pixel shader invocation is at (gl_FragCoord), naturally with no interpolation!

This is basically only (but very) useful for things like programmable blending (blending Decals into a Normal compressed GBuffer), order independent transparency, per-pixel linked lists.

1. December 12, 2018
  
  Antoine MORRIER
  
  Yes, you are totally right. I understood that a few months after writing this article :).
  Thanks for your comment 🙂
  
January 31, 2026

The renderer of my game engine. (The renderwindow creation) | laurentdur

[…] Here we bind barriers and clear the swapchain images, now, the question is what barrier are ? Barrier simply ensure that the GPU memory is flushed before using it. The first parameter (the source stage) determine from where the GPU have to wait until the second parameter (the destination stage) have finished to flush the memory. And the srcAccessMask and dstAccessMask determines what are the access to the memory at the source and destination pipeline stages. By example for presentToClearBarrier it tells the GPU to wait that the writes to the memory transfert are flushed before having access to the memory read. ClearToPresentBarrier does the inverse, it tells the GPU to wait that every memory read access has been flushed before writing on it. Everything is explained here more on details : https://cpp-rendering.io/barriers-vulkan-not-difficult/ […]
