Tag Archives: C++

Range : Be expressive using smart iterators with Range based containers

Hi !
Today I am not going to talk about rendering. This article will deal with expressiveness in C++. Expressiveness? Yes, but about containers, range, and iterators.
If you want to know more about writing expressive code in C++, I advise you to go on fluentcpp.
If you want to know more about range, I advise you to take a look at the Range V3 written by Eric Niebler.
The code you will see may not be the most optimized, but it gives one idea behind what ranges are and how to implement it.

Introduction

How could we define a Range ?

The objective

Prior to defining what a Range is, we are going to see what Range let us do.

Isn’t it amazing to write things like that? Okay for direct operation inside the container, it could be better in two ways:

  1. It is not “easy” to read if you want to compose operation : (unique(sort(range)) is less readable than range | sort | unique in my opinion. But it is juste one “optimisation” to do :).
  2. It may be not optimized since sort returns a Container : here a vector, and build also.

The overloading of operator<< is quite easy though:

Okay, there are a lot of templates. I hope you will not get an allergy to them. All the lines that follow will use and abuse of templates and SFINAE.

The definition of a Range

A range is a way to traverse a container. They are at one abstraction above of iterators. To be simple, a Range own the first iterator, and the final one. It comes with a begin and one end function.

Smart (Or proxy) iterator

Okay, now things are becoming tricky. I hope that I lost no one.

What exactly is an iterator ?

To be simple, an iterator is an abstraction of a pointer.
It exists several catetegories of iterators, to be simple, here is the list:

  1. input : They can be compared, incremented, and dereferenced as rvalue (output ones can be dereferenced as lvalue).
  2. Forward : not a lot of difference with the prior.
  3. Bidirectional: They can be decremented.
  4. Random Access : They support arithmetic operators + and -, inequality comparisons.

Smart Iterator in details

Lazy Initialization

This statement tells us “If the result of the operation is not needed right now, there is no need to compute it”. To be simple, the operation will be done when we will need to get the result. With iterator, it could be done when you dereference it for instance.

Different types of smart iterator

Filter iterator

This iterator will jump a lot of values that do not respect a predicate. For example, if you want only the odd values of your container, when you increment the iterator, it will advance until the next odd value and will skip all the even ones.

Transform iterator

This iterator will dereference the iterator, apply a function to the dereferenced value and returns the result.

Implementation

Basics

Here we are going to implement our own iterator class. This class must be templated twice times.
The first template argument is the iterator we want to iterate on. The second template argument is a tag that we use to perform a kind of tag dispatching.
Moreover, this iterator must behave as … an iterator !

So, we begin to write :

One of above typename will fail if Iterator does not behave like one iterator.
The Tag has a constructor and can own several data (function / functor / lambda), other iterators(the end of the range?) or other things like that.

The iterator must respect the Open Closed Principle. That is why you must not implement the methods inside the class but outside (in a namespace detail for instance). We are going to see these methods later. To begin, we are going to stay focused on the RangeIterator class.

Constructors

We need 3 constructors.
1. Default constructor
2. Variadic templated constructor that build the tag
3. Copy constructor

And we need as well an assignment operator.

There are no difficulties here.

They also need comparison operators !

And they are quite easy !

Reference or value_type dereferencement

I hesitate a lot between return either a reference or a copy. To make transform iterator easier, I make the return by copy.
It means that you cannot dereference them as a lvalue :

The code is a bit tricky now because the dereferencement could not be the value you are waiting for. See the std::back_insert_iterator for instance.

Forward iterator to go farther !

Again, simple code !

Backward iterator, to send you in hell !

Okay, now as promised, we are going to see how beautiful C++ templates are. If you don’t want to be driven crazy, I advise you to stop to read here.
So, we saw that not all iterators have the “backward” range. The idea is to enable this feature ONLY if the iterator (the first template argument) supports it also.
It is the moment to reuse SFINAE (the first time was for the “is_range” structure we saw above).
We are going to use the type_trait std::enable_if<Expr, type>.
How to do that?

You MUST template this function, else the compiler can not delete it !!!

FYI : If you have C++17 enabled, you can use concepts (at least for GCC).

Random iterator

Now you can do it by yourself.
But here some code to help you (because I am a nice guy :p)

Details

Okay, now we are going to see what is hiden by detail::RangeIterator.

Normal iterators

In this namespace, you MUST put the tag and the function on it.

Here are the functions for normal iterator.

It is simple, if it is a normal iterator, it behaves like a normal one.

Transform iterator

I will not talk about the filter iterator since it is not complicated to make it once we understand the ideas. Just be careful about the construct function…

The tag

So, what is a Transform iterator? It is simply one iterator that dereference the value, and apply a function to it.
Here is the Tag structure.

It owns one std::function and that’s it.

The usefulness of the transform iterator is when you dereference it. So you need to reimplement only the dereference function.

Thanks to overloading via tag dispatching this function should (must??) be called without any issues (actually you hope :p).

However, if you want to use several files (thing that I only can to advise you), you cannot do it by this way but specialize your templates. But you cannot partially specialize template function. The idea is to use functor!

Here is a little example using dereference function.

The builder : pipe operator (|)

Okay, you have the iterator, you have the range class, you have your function, but now, how to gather them?

What you want to write is something like that:

First, you need a function that Create one range that own two iterators.
One that begins the set, and the other one that ends it.

Once you have that, you want to overload the pipe operator that makes it simple :

Warning : Don’t forget to take care about rvalue references to be easy to use !

Conclusion

So this article presents a new way to deal with containers. It allows more readable code and take a functional approach. There is a lot of things to learn about it, so don’t stop your learning here. Try to use one of the library below, try to develop yours. Try to learn a functional language and … Have fun !!!!

I hope that you liked this article. It is my first article that discuss only C++. It may contains a lot of errors, if you find one or have any problems, do not forget to tell me!

Reference

Range V3 by Eric Niebler : His range library is really powerfull and I advice you to use it (and I hope that it will be a part of the C++20).
Ranges: The STL to the Next Level : because of (thanks to?) him, I am doing a lot of modifications in all my projects… x).
Range Library by me : I will do a lot of modifications : Performance, conveniance and others.

Barriers in Vulkan : They are not that difficult

Hi !
Yes, I know, I lied, I said that my next article will be about buffers or images, but, finally, I’d prefer to talk about barriers first. However, barriers are, IMHO, a really difficult thing to well understand, so, this article might countain some mistakes.
In that case, please, let me know it by mail, or by one comment.
By the way, this article could remind you in some parts the article on GPU Open : Performance Tweets series: Barriers, fences, synchronization and Vulkan barriers explained

What memory barriers are for?

Memory barriers are source of bugs.
More seriously, barriers are used for three (actually four) things.

  1. Execution Barrier (synchronization) : To ensure that prior commands has finished
  2. Memory Barrier (memory visibility / availability): To ensure that prior writes are visible
  3. Layout Transitioning (useful for image) : To Optimize the usage of the resource
  4. Reformatting

I am not going to talk about reformating because (it is a shame) I am not very confident with it.

What exactly is an execution barrier ?

An execution barrier could remind you mutex on CPU thread. You write something in one resource. When you want to read what you write in, you must wait the write is finished.

What exactly is a memory barrier ?

When you write something from one thread, it could write it on some caches and you must flush them to ensure the visibility where you want to read that data. That is what memory barriers are for.
They ensure as well layout transition for image to get the best performance your graphic card can.

How it is done in Vulkan

Now that we understand why barriers are so important, we are going to see how can we use them in Vulkan.

Vulkan’s Pipeline

Vulkan Pipeline

To be simple, the command enters in the top_of_pipe stage and end at bottom_of_pipe stage.
It exists an extra stage that refers to the host.

Barriers between stages

We are going to see two examples (that are inspired from GPU Open).
We will begin with the worse case : your first command writes at each stage everywhere it is possible, your second command reads at each stage everywhere it is possible.
It simply means that you want to wait for the first command totally finish before the second one begin.

To be simple, with a scheme it means that :
barriers-all_to_all

  • In gray : All the stages that need to be executed before or after the barrier (or the ones that are never reached)
  • In red : Above the barrier, it means where the data are produced. Below the barrier, it means where the data are consumed.
  • In green : They are unblocked stages. You should try to have the maximum green stages as possible.

As you can see, here, you don’t have any green stages, so it is not good at all for performances.

In Vulkan C++, you should have something like that:

Some people use BOTTOM_OF_PIPE as source and TOP_OF_PIPE as the destination. It is not false, but it is useful only for execution barrier. These stages do not access memory, so they can’t make memory access visible or even available!!!! You should not (must not?) issue a memory barrier on these stages, but we are going to see that later.

Now, we are going to see a better case
Imagine your first command fills an image or one buffer (SSBO or imageStore) through the VERTEX_SHADER. Now imagine you want to use these data in EVALUATION_SHADER.
The prior scheme, after modification, is :
barriers in the good way

As you can see, there is a lot of green stages and it is very good!
The Vulkan C++ code should be:

By Region or not?

This part may contain errors, so please, let me know if you disagree with me
To begin, what does by region means?
A region is a little part of your framebuffer. If you specify to use by region dependency, it means that (in fragment buffer space) operations need to be finished only in the region (that is specific to the implementation) and not in the whole image.
Well, it is not clear what is a fragment buffer space. In my opinion, and after reading the documentation, it could be from the EARLY_TEST (or at least FRAGMENT_SHADER if early depth is not enabled) to the COLOR_ATTACHMENT.

Actually, to me this flag lets the driver to optimize a bit. However, it must be used only (and should not be useful elsewhere IMHO) between subpasses for subpasses input attachments).
But I may be wrong !

Everything above about is wrong, if you want a plain explanation, see the comment from devsh. To make it simple, it means that the barrier will operate only on “one pixel” of the image. It could be used for input attachment or pre depth pass for example

Memory Barriers

Okay, now that we have seen how make a pure execution barrier (that means without memory barriers).
Memory barriers ensure the availability for the first half memory dependency and the visibility for the second one. We can see them as a “flushing” and “invalidation”. Make information available does not mean that it is visible.
In each kind of memory barrier you will have a srcAccessMask and a dstAccessMask.
How do they work?

Access and stage are somewhat coupled. For each stage of srcStage, all memory accesses using the set of access types defined in srcAccessMask will be made available. It can be seen as a flush of caches defined by srcAccessMask in all stages.

For dstStage / dstAccess, it is the same thing, but instead to make information available, the information is made visible for these stages and these accesses.

That’s why using BOTTOM/TOP_OF_PIPELINE is meaningless for memory barrier.

For buffer and image barriers, you could as well perform a “releasing of ownership” from a queue to another of the resource you are using.
An example, you transfer the image in your queue that is only used for transfers. At the end, you must perform a releasing from the transfer queue to the compute (or graphic) queue.

Global Memory Barriers

These kind of memory barriers applies to all memory objects that exist at the time of its execution.
I do not have any example of when to use this kind of memory barrier. Maybe if you have a lot of barriers to do, it is better to use global memory barriers.
An example:

Buffer Memory Barriers

Here, accessesMask are valid only for the buffer we are working on through the barrier.
Here is the example :

Image Memory Barriers

Image memory barriers have another kind of utility. They can perform layout transitions.

Example:
I want to create mipmaps associated to one image (we will see the complete function in another article) through vkCmdBlitImage.
After a vkCmdBlitImage, I want use the mipmap I just wrote as a source for the next mipmap level.

oldLayout must be DST_TRANSFER and newLayout must be SRC_TRANSFER.
Which kind of access I made and which kind of access I will do?
That is easy, I performed a TRANSFER_WRITE and I want to perform a TRANSFER_READ.
At each stage my last command “finish” and at each stage my new command “begin”? Both in TRANSFER_STAGE.

In C++ it is done by something like that:

I hope that you enjoyed that article and that you have learned some things. Synchronization through Vulkan is not as easy to handle and all I wrote may (surely?) contains some errors.

Reference:

Memory barriers on TOP_OF_PIPE #128
Specs