Blog

  • Range : Be expressive using smart iterators with Range based containers

    Hi !
    Today I am not going to talk about rendering. This article deals with expressiveness in C++. Expressiveness? Yes, but applied to containers, ranges, and iterators.
    If you want to know more about writing expressive code in C++, I advise you to visit fluentcpp.
    If you want to know more about ranges, I advise you to take a look at Range v3, written by Eric Niebler.
    The code you will see may not be the most optimized, but it gives an idea of what ranges are and how to implement them.

    Introduction

    How could we define a Range ?

    The objective

    Before defining what a Range is, let's see what ranges let us do.

    int main()
    {
        std::list<int> list;
        std::vector<float> vector = {5.0, 4.0, 3.0, 2.0, 1.0, 0.0};
        list << 10 << 9 << 8 << 7 << 6 << vector;
        // list = 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
    
        auto listFiltered = list | superiorThan(4) | multiplyBy(3);
        // type of listFiltered = Range<You do not want to know lol>
        // listFiltered = [10, 9, 8, 7, 6, 5] -> 30, 27, 24, 21, 18, 15
    
        auto listSorted = Range::sort(listFiltered | superiorThan(23));
        // type of listSorted is vector, else use Range::sort<std::list>
        // listSorted = [30, 27, 24] -> 24, 27, 30
    
        std::cout << list << listFiltered << listSorted;
    
        return 0;
    }

    Isn’t it amazing to write things like that? Okay, the direct operations inside the container could still be better in two ways:

    1. It is not “easy” to read if you want to compose operations: unique(sort(range)) is less readable than range | sort | unique in my opinion. But that is just one “optimisation” left to do :).
    2. It may not be optimal, since sort returns a container (here a vector), which has to be built.

    The overloading of operator<< is quite easy though:

    // writing
    template<template<typename, typename...> class Container, typename T, typename ...A>
    std::ostream &operator<<(std::ostream &stream, Container<T, A...> const &c) {
        for(auto const &e : c)
            stream << e << " ";
        stream << std::endl;
        return stream;
    }
    
    // Appending
    template<template<typename, typename...> class Container, typename T, typename ...A>
    Container<T, A...> &operator<<(Container<T, A...> &c, T const &v) {
        c.emplace_back(v);
        return c;
    }
    
    // Output must not be an ostream
    template<template<typename, typename> class Output, template<typename, typename> class Input,
             typename T1, typename A1, typename T2, typename A2>
    std::enable_if_t<!std::is_base_of<std::ostream, Output<T1, A1>>::value, Output<T1, A1>&>
    operator<<(Output<T1, A1> &o, Input<T2, A2> const &i) {
        std::copy(i.begin(), i.end(), std::back_inserter(o));
        return o;
    }
    
    template<template<typename, typename> class Output, template<typename> class Range,
             typename T1, typename A1, typename Iterator>
    std::enable_if_t<!std::is_base_of<std::ostream, Output<T1, A1>>::value, Output<T1, A1>&>
    operator<<(Output<T1, A1> &o, Range<Iterator> const &i) {
        std::copy(i.begin(), i.end(), std::back_inserter(o));
        return o;
    }

    Okay, there are a lot of templates. I hope you are not allergic to them. All the lines that follow will use and abuse templates and SFINAE.

    The definition of a Range

    A range is a way to traverse a container. Ranges sit one abstraction level above iterators. Simply put, a Range owns the first iterator and the final one, and exposes a begin and an end function.

    template<typename Iterator>
    class _Range {
    public:
    using __IS_RANGE = void; // helper to know whether the type is a range or not
    public:
        using const_iterator = Iterator;
        using value_type = typename const_iterator::value_type;
        explicit _Range(const_iterator begin, const_iterator end) : mBegin(begin), mEnd(end){}
    
        const_iterator begin() const {return mBegin;}
        const_iterator end() const {return mEnd;}
    private:
        const_iterator mBegin;
        const_iterator mEnd;
    };
    
    template<typename T, typename = void>
    struct is_range : std::false_type{};
    
    template<typename T>
    struct is_range<T, typename T::__IS_RANGE> : std::true_type{};
    
    
    template<typename Iterator>
    auto Range(Iterator begin, Iterator end) {
        return _Range<Iterator>(begin, end);
    }
    
    template<typename Container>
    auto Range(Container const &c) {
        return Range(c.begin(), c.end());
    }

    Smart (or proxy) iterators

    Okay, now things are becoming tricky. I hope I have not lost anyone.

    What exactly is an iterator ?

    To keep it simple, an iterator is an abstraction of a pointer.
    There are several categories of iterators; briefly, here is the list:

    1. Input: they can be compared, incremented, and dereferenced as an rvalue (output iterators can be dereferenced as an lvalue).
    2. Forward: not much difference from the prior category; they additionally guarantee multi-pass traversal.
    3. Bidirectional: they can also be decremented.
    4. Random access: they support the arithmetic operators + and -, and inequality comparisons.

    Smart Iterator in details

    Lazy Initialization

    This principle tells us: “if the result of an operation is not needed right now, there is no need to compute it”. Simply put, the operation is performed only when we need the result. With an iterator, it can be done when you dereference it, for instance.

    Different types of smart iterator

    Filter iterator

    This iterator skips the values that do not satisfy a predicate. For example, if you want only the odd values of your container, incrementing the iterator advances it to the next odd value, skipping all the even ones.

    Transform iterator

    This iterator dereferences the underlying iterator, applies a function to the dereferenced value, and returns the result.

    Implementation

    Basics

    Here we are going to implement our own iterator class. This class takes two template parameters.
    The first template argument is the iterator we want to wrap. The second template argument is a tag that we use to perform a kind of tag dispatching.
    Moreover, this iterator must behave as … an iterator !

    So, we begin to write :

    template<class Iterator, class RangeIteratorTagStructure>
    class RangeIterator {
        Iterator mIt;
        RangeIteratorTagStructure mTag;
    public:
        using iterator_category = typename Iterator::iterator_category;
        using value_type = typename Iterator::value_type;
        using difference_type = typename Iterator::difference_type;
        using pointer = typename Iterator::pointer;
        using reference = typename Iterator::reference;

    One of the above typedefs will fail if Iterator does not behave like an iterator.
    The Tag has a constructor and can own data (a function / functor / lambda), other iterators (the end of the range, for instance), or other things like that.

    The iterator must respect the Open/Closed Principle. That is why you must implement the methods not inside the class but outside (in a detail namespace, for instance). We are going to see these methods later. For now, we stay focused on the RangeIterator class.

    Constructors

    We need 3 constructors.
    1. Default constructor
    2. Variadic templated constructor that builds the tag
    3. Copy constructor

    And we need as well an assignment operator.

    RangeIterator() = default;
    
    template<typename ...Args>
    RangeIterator(Iterator begin, Args &&...tagArguments) :
        mIt(begin),
        mTag(std::forward<Args>(tagArguments)...) {
        detail::RangeIterator::construct(mIt, mTag);
    }
    
    RangeIterator(RangeIterator const &i) :
        mIt(i.mIt), mTag(i.mTag){}
    
    RangeIterator &operator=(RangeIterator f) {
        using std::swap;
        swap(f.mIt, this->mIt);
        swap(f.mTag, this->mTag);
        return *this;
    }

    There are no difficulties here.

    They also need comparison operators !

    And they are quite easy !

    bool operator!=(RangeIterator const &r) const {
        return mIt != r.mIt;
    }
    
    bool operator==(RangeIterator const &r) const {
        return mIt == r.mIt;
    }

    Reference or value_type dereferencing

    I hesitated a lot between returning a reference and returning a copy. To make the transform iterator easier to implement, I chose to return by copy.
    It means that you cannot dereference the iterator as an lvalue :

    *it = something; // Does not work.

    The code is a bit tricky now, because the result of dereferencing may not be the value type you expect. See std::back_insert_iterator, for instance.

    decltype(detail::RangeIterator::dereference(std::declval<Iterator>(), std::declval<RangeIteratorTagStructure>())) operator*() {
        return detail::RangeIterator::dereference(mIt, mTag);
    }
    
    decltype(detail::RangeIterator::dereference(std::declval<Iterator>(), std::declval<RangeIteratorTagStructure>())) operator->() {
        return detail::RangeIterator::dereference(mIt, mTag);
    }

    Forward iterator to go farther !

    Again, simple code !

    RangeIterator &operator++() {
        detail::RangeIterator::increment(mIt, mTag);
        return *this;
    }

    Backward iterator, to send you in hell !

    Okay, now, as promised, we are going to see how beautiful C++ templates are. If you don’t want to be driven crazy, I advise you to stop reading here.
    So, we saw that not all iterators support going backward. The idea is to enable this feature ONLY if the wrapped iterator (the first template argument) supports it too.
    It is the moment to reuse SFINAE (the first time was for the is_range structure we saw above).
    We are going to use the type trait std::enable_if<Expr, Type>.
    How to do that?

    template<class tag = iterator_category>
    std::enable_if_t<std::is_base_of<std::bidirectional_iterator_tag, tag>::value,
    RangeIterator>
    &operator--() {
        detail::RangeIterator::decrement(mIt, mTag);
        return *this;
    }

    You MUST make this function a template, otherwise the compiler cannot discard it through SFINAE !!!

    FYI : concepts did not make it into C++17, but GCC already lets you experiment with them through the Concepts TS (-fconcepts).

    Random iterator

    Now you can do it by yourself.
    But here is some code to help you (because I am a nice guy :p).

    template<class tag = iterator_category>
    std::enable_if_t<std::is_base_of<std::random_access_iterator_tag, tag>::value, RangeIterator>
    &operator+=(std::size_t n) {
        detail::RangeIterator::plusN(mIt, n, mTag);
        return *this;
    }
    
    template<class tag = iterator_category>
    std::enable_if_t<std::is_base_of<std::random_access_iterator_tag, tag>::value, RangeIterator>
    operator+(std::size_t n) {
        auto tmp(*this);
        tmp += n;
        return tmp;
    }
    
    template<class tag = iterator_category>
    std::enable_if_t<std::is_base_of<std::random_access_iterator_tag, tag>::value, difference_type>
    operator-(RangeIterator const &it) {
        return detail::RangeIterator::minusIterator(mIt, it.mIt, mTag);
    }
    
    template<class tag = iterator_category>
    std::enable_if_t<std::is_base_of<std::random_access_iterator_tag, tag>::value, bool>
    operator<(RangeIterator const &f) {
        return mIt < f.mIt;
    }
    
    // Operator a + iterator
    template<template<typename, typename> class RIterator, typename iterator, typename tag, typename N>
    std::enable_if_t<std::is_base_of<std::random_access_iterator_tag, typename iterator::iterator_category>::value,
    RIterator<iterator, tag>> operator+(N n, RIterator<iterator, tag> const &it) {
        auto tmp(it);
        tmp += n;
        return tmp;
    }

    Details

    Okay, now we are going to see what is hidden behind detail::RangeIterator.

    Normal iterators

    In this namespace, you MUST put the tags and the functions that operate on them.

    Here are the functions for a normal iterator.

    /*********** NORMAL ************/
    template<typename Iterator, typename Tag>
    inline void construct(Iterator , Tag) {
    
    }
    
    template<typename Iterator, typename Tag>
    inline typename Iterator::value_type dereference(Iterator it, Tag) {
        return *it;
    }
    
    template<typename Iterator, typename Tag>
    inline void increment(Iterator &it, Tag) {
        ++it;
    }
    
    template<typename Iterator, typename Tag>
    inline void decrement(Iterator &it, Tag) {
        --it;
    }
    
    template<typename Iterator, typename Tag>
    inline void plusN(Iterator &it, std::size_t n, Tag) {
        it += n;
    }
    
    template<typename Iterator, typename Tag>
    inline void minusN(Iterator &it, std::size_t n, Tag) {
        it -= n;
    }
    
    template<typename Iterator, typename Tag>
    inline typename Iterator::difference_type minusIterator(Iterator i1, Iterator const &i2, Tag) {
        return i1 - i2;
    }

    It is simple: if it is a normal iterator, it behaves like a normal one.

    Transform iterator

    I will not talk about the filter iterator, since it is not complicated to write once the ideas are understood. Just be careful about the construct function…

    The tag

    So, what is a transform iterator? It is simply an iterator that dereferences the value and applies a function to it.
    Here is the Tag structure.

    template<typename Iterator, typename Functor>
    struct Transform final {
        Transform() = default;
        Transform(Functor f) : f(f){}
        Transform(Transform const &f) : f(f.f){}
    
        std::function<typename Iterator::value_type(typename Iterator::value_type)> f;
    };

    It owns one std::function and that’s it.

    The transform iterator becomes useful when you dereference it, so you only need to reimplement the dereference function.

    template<typename Iterator, typename Functor>
    inline typename Iterator::value_type dereference(Iterator it, Transform<Iterator, Functor> f) {
        return f.f(*it);
    }

    Thanks to overloading via tag dispatching, this function should (must??) be called without any issue (at least, you hope :p).

    However, if you want to split the code across several files (which I can only advise you to do), this overloading approach no longer works: you have to specialize your templates instead. But you cannot partially specialize function templates. The idea is to use functors!

    Here is a little example using dereference function.

    decltype(std::declval<detail::RangeIterator::dereference<Iterator, Tag>>()(std::declval<Iterator>(), std::declval<Tag>())) operator*() {
        return detail::RangeIterator::dereference<Iterator, Tag>()(mIt, mTag);
    }
    
    // Normal iterator
    template<typename Iterator, typename Tag>
    struct dereference {
        inline typename Iterator::value_type operator()(Iterator it, Tag) const {
            return *it;
        }
    };
    
    // Transform iterator
    template<typename Iterator, typename Functor>
    struct dereference<Iterator, Transform<Iterator, Functor>> {
        inline typename Iterator::value_type operator()(Iterator it, Transform<Iterator, Functor> f) {
            return f.f(*it);
        }
    };

    The builder : pipe operator (|)

    Okay, you have the iterator, you have the range class, you have your functions; but now, how do you put them together?

    What you want to write is something like that:

    auto range = vector | [](int v){return v * 2;};

    First, you need a function that creates a range owning two iterators:
    one that begins the sequence, and one that ends it.

    template<typename Container, typename Functor>
    auto buildTransformRange(Container const &c, Functor f) {
        using Iterator = RangeIterator<typename Container::const_iterator,
                                       detail::RangeIterator::Transform<typename Container::const_iterator, Functor>>;
        Iterator begin(c.begin(), f);
        Iterator end(c.end(), f);
        return Range(begin, end);
    }
    

    Once you have that, you want to overload the pipe operator to make it simple :

    template<typename R, typename Functor>
    auto operator|(R const &r, Functor f)
        -> std::enable_if_t<std::is_same<std::result_of_t<Functor(typename R::value_type)>,
                                         typename R::value_type>::value,
                            decltype(Range::buildTransformRange(r, f))> {
        return Range::buildTransformRange(r, f);
    }

    Warning : don’t forget to take care of rvalue references, to make the API easy to use !

    Conclusion

    So this article presented a new way to deal with containers. It allows more readable code and takes a functional approach. There are a lot of things to learn about it, so don’t stop your learning here. Try one of the libraries below, try to develop your own, try to learn a functional language and … have fun !!!!

    I hope you liked this article. It is my first article that discusses only C++. It may contain errors; if you find one, or have any problem, do not forget to tell me!

    Reference

    Range v3 by Eric Niebler : his range library is really powerful and I advise you to use it (and I hope it will be part of C++20).
    Ranges: The STL to the Next Level : because of (thanks to?) this talk, I am doing a lot of modifications in all my projects… x).
    Range Library by me : I will make a lot of modifications : performance, convenience and others.

  • Mipmap generation : Transfers, transition layout

    Hi guys !
    This article will deal with Mipmap’s generation.

    What is a Mipmap ?

    A mipmap is a precomputed chain of progressively downscaled versions of a texture.
    For example, take a 512×128 texture: the width and the height are divided by two at each level :

    1. level 0 : 512×128
    2. level 1: 256×64
    3. level 2: 128×32
    4. level 3: 64×16
    5. level 4: 32×8
    6. level 5: 16×4
    7. level 6: 8×2
    8. level 7: 4×1
    9. level 8: 2×1
    10. level 9: 1×1

    Mipmap

    Why do we need Mipmap ?

    It can be seen as a level of detail (LoD) mechanism. If the object is far from the camera, you do not need all the details, only some of them. So, instead of sampling the full-resolution texture, the GPU can use a mipmap level that is far lighter than the original (level 0).

    How to generate Mipmap in Vulkan ?

    Contrary to OpenGL, which provides the glGenerateTextureMipmap function, Vulkan does not provide a function to build mipmaps by itself. You have to deal with it yourself. There are two ways.

    1. Using shaders and framebuffers: you use a shader to draw into a framebuffer that is half the size of the texture, then half of the half, and so on.
    2. Using the transfer queue and vkCmdBlitImage, which blits one image into another.

    We are going to see the second way.
    To do it, we are going to use the Transferer class we saw earlier.

    First, the number of mipmap levels for an image is:
    levels = floor(log2(max(width, height))) + 1

    The idea of the algorithm that creates the different mipmap levels is easy.

    1. You initialize level 0 (from a file, for example) and transition its layout to TRANSFER_SRC.
    2. You set level 1 to TRANSFER_DST.
    3. You blit level 0 into level 1.
    4. You set level 1 to TRANSFER_SRC.
    5. You repeat steps 2, 3 and 4 for each level.
    6. You transition all levels to the layout you need.

    So, here is our code :

    Beginning with the CommandBuffer

    void Transferer::buildMipMap(Image &src) {
        vk::CommandBuffer cmd = mCommandBufferSubmitter->createCommandBuffer(nullptr);
        vk::CommandBufferBeginInfo beginInfo(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
    
        cmd.begin(beginInfo);

    Prepare the blit !

        for(uint32_t i = 1; i < src.getMipLevels(); ++i) {
            vk::ImageBlit blit;
            blit.srcSubresource.aspectMask = vk::ImageAspectFlagBits::eColor;
            blit.srcSubresource.baseArrayLayer = 0;
            blit.srcSubresource.layerCount = 1;
            blit.srcSubresource.mipLevel = i - 1;
            blit.dstSubresource.aspectMask = vk::ImageAspectFlagBits::eColor;
            blit.dstSubresource.baseArrayLayer = 0;
            blit.dstSubresource.layerCount = 1;
            blit.dstSubresource.mipLevel = i;
    
            // each mipmap is the size divided by two
            blit.srcOffsets[1] = vk::Offset3D(std::max(1u, src.getSize().width >> (i - 1)),
                                              std::max(1u, src.getSize().height >> (i - 1)),
                                              1);
    
            blit.dstOffsets[1] = vk::Offset3D(std::max(1u, src.getSize().width >> i),
                                              std::max(1u, src.getSize().height >> i),
                                              1);
    

    The loop begins at 1 because level 0 is already initialized.
    Then, you tell Vulkan which level you will use as the source, and which one as the destination.
    Do not forget to use max when you compute the offsets: without it, you will be unable to build the mipmaps for the last levels if your image does not have a 1:1 aspect ratio.

    Transition and Blit

    Then, you have to transition the layout of the mipmap level you want to draw into to TRANSFER_DST:

     vk::ImageSubresourceRange range(vk::ImageAspectFlagBits::eColor, i, 1, 0, 1);
     // transition this level to transferDst so it can receive the blit; the postBlit barrier below will turn it into the source for the next iteration
     vk::ImageMemoryBarrier preBlit = transitionImage(src, vk::ImageLayout::eUndefined, vk::ImageLayout::eTransferDstOptimal, range);
     cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTopOfPipe,
                         vk::PipelineStageFlagBits::eTransfer,
                         vk::DependencyFlags(),
                         nullptr, nullptr,
                         preBlit);
    
    cmd.blitImage(src, vk::ImageLayout::eTransferSrcOptimal,
                  src, vk::ImageLayout::eTransferDstOptimal, blit,
                  vk::Filter::eLinear);
    

    And you just use blitImage to blit it.

    Then, you have to transition that mipmap level's layout to TRANSFER_SRC:

     vk::ImageMemoryBarrier postBlit = transitionImage(src, vk::ImageLayout::eTransferDstOptimal, vk::ImageLayout::eTransferSrcOptimal, range);
     cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTransfer,
                         vk::PipelineStageFlagBits::eTransfer,
                         vk::DependencyFlags(),
                         nullptr, nullptr,
                         postBlit);

    Finish

    You have to transition all mipmap levels to the layout you want to use:

    vk::ImageSubresourceRange range(vk::ImageAspectFlagBits::eColor, 0, VK_REMAINING_MIP_LEVELS, 0, 1);
    
    // transition all mipmap levels to shaderReadOnlyOptimal
    vk::ImageMemoryBarrier transition = transitionImage(src,
                                                        vk::ImageLayout::eTransferSrcOptimal,
                                                        vk::ImageLayout::eShaderReadOnlyOptimal,
                                                        range);
    
    cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTransfer,
                        vk::PipelineStageFlagBits::eAllCommands,
                        vk::DependencyFlags(),
                        nullptr, nullptr, transition);

    Conclusion

    This article was short, but mipmaps are not that difficult to handle. Do you like this kind of short article?
    Maybe the next article will be about descriptor set management.

    Reference

    Sascha Willems Mipmap

  • Buffer management with Vulkan : transfer, Staging buffer

    Hi guys !
    I keep my promise and come back with explanations and an implementation of how to use and manage one (or several) buffers in a Vulkan application.

    How I manage my resources ?

    shared_ptr?

    First of all, the way I manage Vulkan resources is a bit unusual. The idea is to “emulate” the behaviour of a shared_ptr and enable copy / move.
    So, if you do this :

    Image b1; // count = 1
    Image b2 = b1; // count = 2

    b1 and b2 are exactly the same Vulkan Image.

    A counter?

    To emulate the behaviour of a shared_ptr, I created a class that is simply a Counter.

    class Counter
    {
    public:
        Counter() = default;
        Counter(Counter const &counter);
        Counter(Counter &&counter) = default;
        Counter &operator=(Counter counter);
    
        uint32_t getCount() const;
    
        virtual ~Counter();
    protected:
        std::shared_ptr<uint32_t> mCount = std::make_shared<uint32_t>(1);
    };
    
    Counter::Counter(const Counter &counter) :
        mCount(counter.mCount) {
        ++(*mCount);
    }
    
    Counter &Counter::operator =(Counter counter) {
        using std::swap;
        swap(mCount, counter.mCount);
        return *this;
    }
    
    uint32_t Counter::getCount() const {
        return *mCount;
    }
    
    Counter::~Counter() {
    
    }
    

    A Vulkan Resource

    A Vulkan resource lives through a device. So I wrote this little class that represents a Vulkan resource :

    class VkResource : public Counter
    {
    public:
        VkResource() = default;
        VkResource(Device const &device);
        VkResource(VkResource &&vkResource) = default;
        VkResource(VkResource const &vkResource) = default;
        VkResource &operator=(VkResource &&vkResource) = default;
        VkResource &operator=(VkResource const &vkResource) = default;
    
        vk::Device getDevice() const;
    
    protected:
        std::shared_ptr<Device> mDevice;
    };
    
    
    VkResource::VkResource(const Device &device) :
        mDevice(std::make_shared<Device>(device)) {
    
    }
    
    vk::Device VkResource::getDevice() const {
        return *mDevice;
    }
    

    Buffer in Vulkan

    Unlike OpenGL, buffers in Vulkan are separate from their memory: you must bind the memory to them yourself. Since you can decide whether the memory lives in a device_local heap or a host_visible one, you can choose which heap your buffer will use.

    So what are buffers made of ?

    A buffer is made of a size, a usage (vertex? uniform?), one block of memory, one pointer if the buffer is HOST_VISIBLE, etc.
    My buffer class is :

    class Buffer : public VkResource, public vk::Buffer
    {
    public:
        Buffer() = default;
    
        Buffer(Device &device, vk::BufferUsageFlags usage, vk::DeviceSize size,
               std::shared_ptr<AbstractAllocator> allocator, bool shouldBeDeviceLocal);
    
        Buffer(Buffer &&buffer) = default;
        Buffer(Buffer const &buffer) = default;
        Buffer &operator=(Buffer const &buffer);
    
        vk::DeviceSize getSize() const;
        vk::BufferUsageFlags getUsage() const;
        bool isDeviceLocal() const;
        void *getPtr();
        std::shared_ptr<AbstractAllocator> getAllocator();
    
        ~Buffer();
    
    private:
        std::shared_ptr<AbstractAllocator> mAllocator;
        std::shared_ptr<vk::DeviceSize> mSize = std::make_shared<vk::DeviceSize>();
        std::shared_ptr<vk::BufferUsageFlags> mUsage = std::make_shared<vk::BufferUsageFlags>();
        std::shared_ptr<vk::MemoryRequirements> mRequirements = std::make_shared<vk::MemoryRequirements>();
        std::shared_ptr<vk::PhysicalDeviceMemoryProperties> mProperties = std::make_shared<vk::PhysicalDeviceMemoryProperties>();
        std::shared_ptr<Block> mBlock = std::make_shared<Block>();
        std::shared_ptr<bool> mIsDeviceLocal;
        std::shared_ptr<void *> mPtr = std::make_shared<void *>(nullptr);
    
        void createBuffer();
        void allocate(bool shouldBeDeviceLocal);
    };
    

    It may look a bit complicated, but it is not really that difficult. A buffer is created with a usage, a size, and one boolean that tells whether the buffer should live in device_local memory.
    The creation of the buffer itself is quite simple. You just have to give the size and the usage :

    vk::BufferCreateInfo createInfo(vk::BufferCreateFlags(),
                                    *mSize,
                                    *mUsage,
                                     vk::SharingMode::eExclusive);
    
    m_buffer = mDevice->createBuffer(createInfo);
    *mRequirements = mDevice->getBufferMemoryRequirements(m_buffer);

    The last line gets the memory requirements. It gives you the real size you need (padding and other things) and the list of memory types that can be used with the buffer.
    To get the memory type index, I developed this function, which handles both device-local and host-visible memory :

    int findMemoryType(uint32_t memoryTypeBits,
                       vk::PhysicalDeviceMemoryProperties const &properties,
                       bool shouldBeDeviceLocal) {
    
        auto lambdaGetMemoryType = [&](vk::MemoryPropertyFlags propertyFlags) -> int {
            for(uint32_t i = 0; i < properties.memoryTypeCount; ++i)
                if((memoryTypeBits & (1 << i)) &&
                ((properties.memoryTypes[i].propertyFlags & propertyFlags) == propertyFlags))
                    return i;
            return -1;
        };
    
        if(!shouldBeDeviceLocal) {
            vk::MemoryPropertyFlags optimal = vk::MemoryPropertyFlagBits::eHostCached |
                    vk::MemoryPropertyFlagBits::eHostCoherent |
                    vk::MemoryPropertyFlagBits::eHostVisible;
    
            vk::MemoryPropertyFlags required = vk::MemoryPropertyFlagBits::eHostCoherent |
                    vk::MemoryPropertyFlagBits::eHostVisible;
    
            int type = lambdaGetMemoryType(optimal);
            if(type == -1) {
                int result = lambdaGetMemoryType(required);
                if(result == -1)
                assert(!"No suitable memory type found");
                return result;
            }
            return type;
        }
    
        else
            return lambdaGetMemoryType(vk::MemoryPropertyFlagBits::eDeviceLocal);
    }

    This code follows the advice given in the specification itself.
    Now we should allocate memory for our buffers :

     int memoryTypeIndex = findMemoryType(mRequirements->memoryTypeBits, *mProperties, shouldBeDeviceLocal);
    
        *mBlock = mAllocator->allocate(mRequirements->size, mRequirements->alignment, memoryTypeIndex);
        mDevice->bindBufferMemory(m_buffer, mBlock->memory, mBlock->offset);
    
        // if host_visible, we can map it
        if(!shouldBeDeviceLocal)
            *mPtr = mDevice->mapMemory(mBlock->memory, mBlock->offset,
                                      *mSize, vk::MemoryMapFlags());

    As you can see, you allocate the memory, then you bind it to the buffer. If the memory is host visible, you can also map it.

    Now we have a class to manage our buffers. But we are not done at all !

    Staging resources

    We cannot write directly to device_local memory. We must use what we call a staging resource. Staging resources can be buffers or images. The idea is to bind host-visible memory to a staging resource, then transfer the data through the staging resource into a resource whose memory resides in device_local memory.

    staging buffer

    Command Buffers submitting

    Before transferring anything, I wanted a class that manages the submission of command buffers. When the work is done, the command submitter should notify the transferer objects that use it. I used an observer pattern :

    class ObserverCommandBufferSubmitter {
    public:
        virtual void notify() = 0;
    };
    
    class CommandBufferSubmitter
    {
    public:
        CommandBufferSubmitter(Device &device, uint32_t numberCommandBuffers);
    
        void addObserver(ObserverCommandBufferSubmitter *observer);
    
        vk::CommandBuffer createCommandBuffer();
    
        void submit();
        void wait();
    
    protected:
        std::shared_ptr<Device> mDevice;
        std::shared_ptr<vk::Queue> mQueue;
        std::shared_ptr<CommandPool> mCommandPool;
        std::shared_ptr<std::vector<vk::CommandBuffer>> mCommandBuffers = std::make_shared<std::vector<vk::CommandBuffer>>();
        std::shared_ptr<Fence> mFence;
        std::shared_ptr<uint32_t> mIndex = std::make_shared<uint32_t>(0);
        std::shared_ptr<std::vector<ObserverCommandBufferSubmitter*>> mObservers = std::make_shared<std::vector<ObserverCommandBufferSubmitter*>>();
    };
    
    CommandBufferSubmitter::CommandBufferSubmitter(Device &device, uint32_t numberCommandBuffers) :
        mDevice(std::make_shared<Device>(device)),
        mQueue(std::make_shared<vk::Queue>(device.getTransferQueue())),
        mCommandPool(std::make_shared<CommandPool>(device, true, true, device.getIndexTransferQueue())),
        mFence(std::make_shared<Fence>(device, false)) {
        *mCommandBuffers = mCommandPool->allocate(vk::CommandBufferLevel::ePrimary, numberCommandBuffers);
    }
    
    void CommandBufferSubmitter::addObserver(ObserverCommandBufferSubmitter *observer) {
        mObservers->emplace_back(observer);
    }
    
    vk::CommandBuffer CommandBufferSubmitter::createCommandBuffer() {
        if(*mIndex >= mCommandBuffers->size()) {
            auto buffers = mCommandPool->allocate(vk::CommandBufferLevel::ePrimary, 10);
    
            for(auto &b : buffers)
                mCommandBuffers->emplace_back(b);
        }
    
        return (*mCommandBuffers)[(*mIndex)++];
    }
    
    void CommandBufferSubmitter::submit() {
        vk::SubmitInfo info;
        info.setCommandBufferCount(*mIndex).setPCommandBuffers(mCommandBuffers->data());
        mFence->reset();
        mQueue->submit(info, *mFence);
    }
    
    void CommandBufferSubmitter::wait() {
        *mIndex = 0;
        mFence->wait();
        mFence->reset();
        for(auto &observer : *mObservers)
            observer->notify();
    }
    

The code is not difficult: it allocates a command buffer on demand, returns it, and uses a fence to know when the submitted work is completed.
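The grow-on-demand behaviour of `createCommandBuffer` can be isolated into a tiny testable sketch, with integers standing in for `vk::CommandBuffer` handles (`HandlePool` and its members are hypothetical names, not part of the article's code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sketch of the submitter's grow-on-demand logic.
class HandlePool {
public:
    explicit HandlePool(std::size_t initial) { allocate(initial); }

    // Same idea as createCommandBuffer(): if all handles are in use,
    // allocate a batch of 10 more, then hand out the next one.
    int acquire() {
        if (mIndex >= mHandles.size())
            allocate(10);
        return mHandles[mIndex++];
    }

    void reset() { mIndex = 0; }              // what wait() does
    std::size_t size() const { return mHandles.size(); }

private:
    void allocate(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            mHandles.push_back(mNext++);      // pretend allocation
    }
    std::vector<int> mHandles;
    std::size_t mIndex = 0;
    int mNext = 0;
};
```

After `reset()` the same handles are reused from the start, which is safe in the real code because the fence guarantees the GPU is done with them.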

    Buffer transferer

As you may have guessed, our buffer transferer must implement the abstract observer class:

    class BufferTransferer : public ObserverCommandBufferSubmitter
    {
    public:
        BufferTransferer(Device &device, uint32_t numberBuffers, vk::DeviceSize sizeTransfererBuffers,
                         std::shared_ptr<AbstractAllocator> allocator, CommandBufferSubmitter &commandBufferSubmitter);
    
        void transfer(const Buffer &src, Buffer &dst,
                      vk::DeviceSize offsetSrc,
                      vk::DeviceSize offsetDst,
                      vk::DeviceSize size);
    
        void transfer(Buffer &buffer, vk::DeviceSize offset, vk::DeviceSize size, void *data);
    
        void notify();
    
    private:
        std::shared_ptr<CommandBufferSubmitter> mCommandBufferSubmitter;
        std::shared_ptr<std::vector<Buffer>> mTransfererBuffers = std::make_shared<std::vector<Buffer>>();
        std::shared_ptr<uint32_t> mSizeTransfererBuffers;
        std::shared_ptr<uint32_t> mIndex = std::make_shared<uint32_t>(0);
    };

The idea is to have several buffers ready to transfer data. Why? Because users may not care about the CPU-side buffer at all and only want a GPU buffer! Thanks to these staging buffers, if they want to transfer data just as with glBufferSubData, they actually can!
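The rotation through the pre-allocated staging buffers, with a flush when they are all in flight, can be sketched like this (`StagingRing` and its members are hypothetical names; the real code does the same through `submit()`, `wait()`, and `notify()`):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the staging-buffer rotation: N buffers are filled in turn;
// when all are in flight, we submit the batch, wait for the fence, and
// start over at index 0 -- what BufferTransferer::transfer does.
struct StagingRing {
    explicit StagingRing(std::size_t count) : capacity(count) {}

    std::size_t flushes = 0;

    // Returns the index of the staging buffer for the next transfer.
    std::size_t next() {
        if (inFlight == capacity) {   // all staging buffers busy
            ++flushes;                // submit() + wait() in the real code
            inFlight = 0;             // notify() resets the index
        }
        return inFlight++;
    }

    std::size_t inFlight = 0;
    std::size_t capacity;
};
```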

The code to transfer a buffer is not complicated at all. However, you have to be careful about the memory barrier. Personally, I use one from the Transfer stage to the AllCommands stage in this case.

    void BufferTransferer::notify() {
        *mIndex = 0;
    }
    
    void BufferTransferer::transfer(Buffer const &src, Buffer &dst,
                                    vk::DeviceSize offsetSrc, vk::DeviceSize offsetDst,
                                    vk::DeviceSize size) {
        // Check if size and usage are legals
        assert((src.getUsage() & vk::BufferUsageFlagBits::eTransferSrc) ==
                    vk::BufferUsageFlagBits::eTransferSrc);
        assert((dst.getUsage() & vk::BufferUsageFlagBits::eTransferDst) ==
                    vk::BufferUsageFlagBits::eTransferDst);
    
        assert(src.getSize() >= (offsetSrc + size));
        assert(dst.getSize() >= (offsetDst + size));
    
        // Prepare the region copied
        vk::BufferCopy region(offsetSrc, offsetDst, size);
    
        vk::CommandBufferBeginInfo begin(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
    
        vk::CommandBuffer cmd = mCommandBufferSubmitter->createCommandBuffer();
    
        cmd.begin(begin);
        cmd.copyBuffer(src, dst, {region});
        cmd.pipelineBarrier(vk::PipelineStageFlagBits::eTransfer,
                            vk::PipelineStageFlagBits::eAllCommands,
                            vk::DependencyFlags(),
                            nullptr,
                            vk::BufferMemoryBarrier(vk::AccessFlagBits::eTransferWrite,
                                                    vk::AccessFlagBits::eMemoryRead,
                                                    VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
                                                    dst, offsetDst, size),
                            nullptr);
        cmd.end();
    
    }
    
    void BufferTransferer::transfer(Buffer &buffer, vk::DeviceSize offset, vk::DeviceSize size, void *data) {
        if(*mIndex == mTransfererBuffers->size()) {
            mCommandBufferSubmitter->submit();
            mCommandBufferSubmitter->wait();
        }
        assert(size <= *mSizeTransfererBuffers);
        memcpy((*mTransfererBuffers)[*mIndex].getPtr(), data, size);
        transfer((*mTransfererBuffers)[*mIndex], buffer, 0, offset, size);
        (*mIndex)++;
    }
    

I do not handle reallocation when the destination buffer is too small, or looping over the transfer when our staging buffer is too small, but with this architecture it would not be difficult to manage these cases!
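For the second case, the missing loop would simply split the upload into staging-sized chunks. Here is a CPU-only sketch of that idea (`chunkedTransfer` is a hypothetical helper; each iteration mirrors one map/memcpy followed by one copy command):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Sketch of chunking a large upload through a small staging buffer:
// each iteration moves at most stagingSize bytes, like one
// transfer()/submit()/wait() round trip in the real code.
void chunkedTransfer(const unsigned char *src, std::size_t size,
                     std::vector<unsigned char> &deviceLocal,
                     std::size_t stagingSize) {
    std::vector<unsigned char> staging(stagingSize);
    deviceLocal.resize(size);
    for (std::size_t offset = 0; offset < size; offset += stagingSize) {
        std::size_t chunk = std::min(stagingSize, size - offset);
        std::memcpy(staging.data(), src + offset, chunk);              // map + memcpy
        std::memcpy(deviceLocal.data() + offset, staging.data(), chunk); // copy command
    }
}
```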

How to use it?

Simply like this:

    CommandBufferSubmitter commandBufferSubmitter(device, 1);
    BufferTransferer bufferTransferer(device, 1, 1 << 20, deviceAllocator, commandBufferSubmitter);
    glm::vec2 quad[] = {glm::vec2(-1, -1), glm::vec2(1, -1), glm::vec2(-1, 1), glm::vec2(1, 1)};
    Buffer vbo(device, vk::BufferUsageFlagBits::eTransferDst | vk::BufferUsageFlagBits::eVertexBuffer, sizeof quad, deviceAllocator, true);
    bufferTransferer.transfer(vbo, 0, sizeof quad, quad);
    commandBufferSubmitter.submit();
    commandBufferSubmitter.wait();

I saw that a good number of my visits come from Twitter. If you want to follow me: it is here.

Kisses, and see you soon to see how to load and manage images!