Tag: C++14

  • Multithreading in an expressive way with monadic expression

    Hi! It has been a long time since I last wrote anything, sorry for that! I plan to write several articles over the next few weeks. This article deals with multithreading.

    Introduction

    In a lot of articles, we can read that multithreading is not easy, and other things like that. In this article, we will learn how to write multi-threaded applications in a simple and fun way.
    The idea is to write something like this:

    int main() {
        auto test = when_all(
            [] {return 8; },    // return 8
            [] {return 25; }    // return 25
        ) // tuple(8, 25)
        .then(
            [](auto a, auto b) {return std::make_tuple(a, b); }, // return tuple(8, 25)
            [](auto a, auto b) {std::cout << a << "-" << b << " = "; }, // return nothing
            [](auto a, auto b) {return a - b; } // return -17
        ) // tuple(tuple(8, 25), -17)
        .deferredExecutor();
    
        // Launch the chain of functions asynchronously
        test.execute(asynchronous_scheduler{});
        // Do some very expensive work
        auto result = test.waitAndGet();
        auto first = std::get<0>(result); // tuple(8, 25)
        auto second = std::get<1>(result); // -17
        std::cout << second << std::endl;
    
        return 0;
    }

    If we draw a little diagram, it gives us this:

    Multithreading in action

    Obviously, when the function f3 starts, functions f1 and f2 must have completed. The same goes for getResult: it must wait for f3, f4, and f5 to complete.

    Does it really exist?

    Actually, it does, and it will even be in the standard in C++20. However, the C++20 feature suffers from several problems that are explained in Vittorio Romeo's article. This article could actually be seen as a continuation of Vittorio's series.

    Multithreading with Monadic Expression: Implementation

    I hope you really love reading barely understandable C++ code, because this code is full of templates and metaprogramming. I recommend having some basics in really modern C++, or reading the series from Vittorio I mentioned before. We are going to see a lot of things, both in multithreading and in metaprogramming.

    Why do we need a continuation of the series from Vittorio?

    Firstly, the code provided by Vittorio does not work on Visual Studio, and I really want my code to work on all compilers and all operating systems. Secondly, in his article, he only provides a version using a waitAndGet function, which prevents us from easily inserting code into the main thread.
    The third issue is that a group of functions returns a std::tuple.

    For example, you would have to write something like this:

    when_all(
        []{return 8;},
        []{return 25;}
    ).then([](auto tuple) {auto [a, b] = tuple; return a - b;});
    
    // Instead of
    when_all(
        []{return 8;},
        []{return 25;}
    ).then([](auto a, auto b) {return a - b;});

    But in reality, some people will prefer the first version, others will prefer the second. And what happens when a function returns no value at all? In my opinion, the second version is more natural.

    Forwarding

    Because we are going to use a lot of metaprogramming, we may need to perfect-forward some arguments. Here are two macros that perform perfect forwarding: one for a normal variable and another one for auto. I was not able to use a single macro for both cases because the form ::std::forward<decltype(variable)>(variable); is sometimes too difficult for Visual Studio; that is why I provide both declarations.

    #pragma once
    #include <utility>
    
    #define FWD(T, x) ::std::forward<T>(x)
    #define FWD_AUTO(x) ::std::forward<decltype(x)>(x)

    This code does not really need any explanation.

    A Latch

    A latch is useful when you want to know whether a job is finished or not. Let's say you have two threads, one of which is waiting for a result produced by the other. The first one needs to wait for the second to finish.

    Latch

    The following code shows how to implement a simple “bool_latch”

    #pragma once
    #include <mutex>
    #include <condition_variable>
    
    class bool_latch {
    public:
        /**
         * Function to call when the job is finished
         */
        void job_done() {
            {
                std::scoped_lock<std::mutex> lock(mMutex);
                mDone = true;
            }
            mConditionVariable.notify_one();
        }
    
        /**
         * Function to call when you want to wait for the job to be done
         */
        void wait_for_job_done() {
            std::unique_lock<std::mutex> lock(mMutex);
            mConditionVariable.wait(lock, [this] {return mDone; });
        }
    
    private:
        std::mutex mMutex;
        std::condition_variable mConditionVariable;
        bool mDone{ false };
    };
    

    The code is based on a condition_variable, a mutex, and a boolean variable.
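    To see the latch in action, here is a minimal usage sketch (the bool_latch class from above is repeated so the snippet compiles on its own): one thread computes a value, while the main thread blocks until the job is done.

    ```cpp
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    // bool_latch from above, repeated so the demo is self-contained
    class bool_latch {
    public:
        void job_done() {
            {
                std::scoped_lock<std::mutex> lock(mMutex);
                mDone = true;
            }
            mConditionVariable.notify_one();
        }

        void wait_for_job_done() {
            std::unique_lock<std::mutex> lock(mMutex);
            mConditionVariable.wait(lock, [this] { return mDone; });
        }

    private:
        std::mutex mMutex;
        std::condition_variable mConditionVariable;
        bool mDone{ false };
    };

    int main() {
        bool_latch latch;
        int result = 0;

        std::thread worker([&] {
            result = 8 + 25;   // the "job"
            latch.job_done();  // signal completion
        });

        latch.wait_for_job_done(); // blocks until the worker is done
        std::cout << result << '\n'; // 33
        worker.join();
    }
    ```
    
    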

    The return type

    Each function may return one value. But what does a group of functions return? The best choice is to return a std::tuple. Thus, if you have three functions at the same level that each return an int (running in 3 parallel threads), the result will be a std::tuple<int, int, int>.

    Functions that return void.

    The simplest way to handle this case is to create an empty type that we will name nothing. Its declaration is straightforward.

    struct nothing{};

    A function, instead of returning void, must return the type nothing.
    Let's say you have three functions f1, f2, f3, where f1 and f3 return an int and f2 returns void. You will get a std::tuple<int, nothing, int>.
    How do we return nothing instead of void? This function does it in a straightforward way:

    namespace detail {
        /**
         This function returns f(args...) when decltype(f(args...)) is not void, and nothing{} otherwise.
         @param f - The function
         @param args... - Arguments to give to the function
         @result - f(args...) if possible, or nothing{} otherwise.
        */
        template<typename F, typename ...Args>
        inline decltype(auto) callThatReturnsNothingInsteadOfVoid(F &&f, Args &&...args) {
            if constexpr(std::is_void_v<std::invoke_result_t<F, Args...>>) {
                f(FWD(Args, args)...);
                return nothing{};
            }
    
            else
                return f(FWD(Args, args)...);
        }
    }
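    Here is a quick self-contained check of this helper (nothing and the FWD macro are repeated from above): a value-returning callable keeps its result, while a void-returning one yields nothing{}.

    ```cpp
    #include <iostream>
    #include <type_traits>
    #include <utility>

    #define FWD(T, x) ::std::forward<T>(x)

    struct nothing {};

    namespace detail {
        // from the article: call f and substitute nothing{} for a void result
        template<typename F, typename ...Args>
        inline decltype(auto) callThatReturnsNothingInsteadOfVoid(F &&f, Args &&...args) {
            if constexpr (std::is_void_v<std::invoke_result_t<F, Args...>>) {
                f(FWD(Args, args)...);
                return nothing{};
            } else {
                return f(FWD(Args, args)...);
            }
        }
    }

    int main() {
        auto r1 = detail::callThatReturnsNothingInsteadOfVoid([](int x) { return x * 2; }, 21);
        auto r2 = detail::callThatReturnsNothingInsteadOfVoid([] { /* returns void */ });

        static_assert(std::is_same_v<decltype(r1), int>);
        static_assert(std::is_same_v<decltype(r2), nothing>);
        std::cout << r1 << '\n'; // 42
    }
    ```
    
    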

    Pass a tuple as an argument

    Okay, now say you have a std::tuple<int, double, std::string> and you want to pass it to a function with a prototype like returnType function(int, double, std::string).
    One way to do that is to use the function std::apply.

    Here there is no problem, but assume that there is a nothing in your tuple, like std::tuple<int, nothing, int>.
    When you apply this tuple to the function, you will also pass the nothing along.

    To avoid this problem, you must filter the arguments, storing them one by one inside a lambda function.

    namespace detail {
        template<typename F>
        inline decltype(auto) callWithoutNothingAsArgument_Impl(F &&f) {
            return callThatReturnsNothingInsteadOfVoid(FWD(F, f));
        }
    
        template<typename F, typename T, typename ...Args>
        inline decltype(auto) callWithoutNothingAsArgument_Impl(F &&f, T &&t, Args &&...args) {
            return callWithoutNothingAsArgument_Impl([&f, &t](auto &&...xs) -> decltype(auto) {
                if constexpr(std::is_same_v<nothing, std::decay_t<T>>) {
                    return f(FWD_AUTO(xs)...);
                }
    
                else {
                    return f(FWD(T, t), FWD_AUTO(xs)...);
                }
            }, FWD(Args, args)...);
        }
    }
    
    template<typename F, typename ...Args>
    inline decltype(auto) callWithoutNothingAsArgument(F &&f, std::tuple<Args...> tuple) {
        return std::apply([&f](auto ...xs)->decltype(auto) {
            return detail::callWithoutNothingAsArgument_Impl(f, xs...);
        }, tuple);
    }
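    Putting the pieces together, here is a self-contained sketch (repeating nothing, the macros, and the functions above) showing the filtering in action: a nothing sitting in the middle of the tuple never reaches the called function.

    ```cpp
    #include <iostream>
    #include <tuple>
    #include <type_traits>
    #include <utility>

    #define FWD(T, x) ::std::forward<T>(x)
    #define FWD_AUTO(x) ::std::forward<decltype(x)>(x)

    struct nothing {};

    namespace detail {
        template<typename F, typename ...Args>
        inline decltype(auto) callThatReturnsNothingInsteadOfVoid(F &&f, Args &&...args) {
            if constexpr (std::is_void_v<std::invoke_result_t<F, Args...>>) {
                f(FWD(Args, args)...);
                return nothing{};
            } else {
                return f(FWD(Args, args)...);
            }
        }

        template<typename F>
        inline decltype(auto) callWithoutNothingAsArgument_Impl(F &&f) {
            return callThatReturnsNothingInsteadOfVoid(FWD(F, f));
        }

        template<typename F, typename T, typename ...Args>
        inline decltype(auto) callWithoutNothingAsArgument_Impl(F &&f, T &&t, Args &&...args) {
            return callWithoutNothingAsArgument_Impl([&f, &t](auto &&...xs) -> decltype(auto) {
                if constexpr (std::is_same_v<nothing, std::decay_t<T>>)
                    return f(FWD_AUTO(xs)...);            // skip the nothing
                else
                    return f(FWD(T, t), FWD_AUTO(xs)...); // keep the real argument
            }, FWD(Args, args)...);
        }
    }

    template<typename F, typename ...Args>
    inline decltype(auto) callWithoutNothingAsArgument(F &&f, std::tuple<Args...> tuple) {
        return std::apply([&f](auto ...xs) -> decltype(auto) {
            return detail::callWithoutNothingAsArgument_Impl(f, xs...);
        }, tuple);
    }

    int main() {
        std::tuple<int, nothing, int> t{8, nothing{}, 25};
        // The nothing is filtered out: the lambda receives only (8, 25).
        int diff = callWithoutNothingAsArgument([](int a, int b) { return a - b; }, t);
        std::cout << diff << '\n'; // -17
    }
    ```
    
    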

    Architecture

    This part will be the most difficult of the article. We will see how the architecture works. It is a node-based architecture with three kinds of nodes: the root node, which is the beginning of the chain of functions; the when_all node, which can own one or several functions that will run in parallel; and the result_getter node, which represents the end of the chain of functions.

    Overview

    Overview of multithreading

    As you may have noticed, there is no type erasure. The type owned by the caller is the result_getter. However, when the result_getter executes all the functions, it must start back at the root, and to do that, it must go up through the when_all nodes.

    Now let's explain the whole code!

    root

    #pragma once
    #include <tuple>
    #include "fwd.h"
    
    namespace detail {
        class root {
            template<typename Parent, typename... F>
            friend class when_all;
    
            // The root node produces nothing
            using output_type = std::tuple<>;
    
        private:
            // Once you are at the root, you can call the execute function
            template<typename Scheduler, typename Child, typename ...GrandChildren>
            void goToRootAndExecute(Scheduler &&s, Child &directChild, GrandChildren &... grandChildren) & {
                execute(FWD(Scheduler, s), directChild, grandChildren...);
            }
    
            // You must use the scheduler
            template<typename Scheduler, typename Child, typename ...GrandChildren>
            void execute(Scheduler &&s, Child &directChild, GrandChildren &... grandChildren) & {
                s([&]() -> decltype(auto) {return directChild.execute(FWD(Scheduler, s), output_type{}, grandChildren...); });
            }
        };
    }

    There is nothing difficult here. The when_all is a friend. There are two functions: the first one is called by the children and its purpose is only to reach the top; then, execute runs the child functions through the scheduler.
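    The article never shows a scheduler, but as the root's execute shows, any callable that accepts a task works. Here is a minimal sketch of two possible schedulers; the asynchronous_scheduler name used in the introduction is an assumption about the author's implementation, and the inline_scheduler simply runs the task on the calling thread (handy for testing).

    ```cpp
    #include <iostream>
    #include <thread>
    #include <utility>

    // Hypothetical sketch of the asynchronous_scheduler from the introduction:
    // launch the task on its own thread and let it run on its own.
    struct asynchronous_scheduler {
        template<typename F>
        void operator()(F &&f) const {
            std::thread(std::forward<F>(f)).detach(); // fire-and-forget
        }
    };

    // Runs the task immediately on the calling thread.
    struct inline_scheduler {
        template<typename F>
        void operator()(F &&f) const { f(); }
    };

    int main() {
        inline_scheduler s;
        s([] { std::cout << "task executed\n"; });
    }
    ```
    
    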

    when_all

    #pragma once
    #include <atomic>
    #include "enumerate_args.h"
    #include "result_getter.h"
    
    namespace detail {
        struct movable_atomic_size_t : std::atomic_size_t {
            using std::atomic_size_t::atomic;
            movable_atomic_size_t(movable_atomic_size_t &&v) : std::atomic_size_t(v.load(std::memory_order_acquire)) {}
        };
    
        template<typename Parent, typename ...Fs>
        class when_all : Parent, Fs... {
            friend class root;
    
            template<typename, typename...>
            friend class ::detail::when_all;
    
            template<typename>
            friend class result_getter;
    
            using input_type = typename Parent::output_type;
            using output_type = std::tuple<decltype(callWithoutNothingAsArgument(std::declval<Fs&>(), std::declval<input_type>()))...>;
    
        public:
            /** There is SFINAE here because of something "kind of weird".
                Let's say you have to build something like Parent(std::move(parent)). Instead of reaching the move constructor,
                it would try to instantiate this constructor with FsFWD = empty. However, Fs = {f1, f2, f3...};
                The SFINAE forbids this error, and so Parent(parent) will reach the move constructor **/
            template<typename ParentFWD, typename ...FsFWD, typename = std::enable_if_t<(sizeof...(FsFWD) > 0)>>
            when_all(ParentFWD &&parent, FsFWD && ...f) : Parent(FWD(ParentFWD, parent)), Fs(FWD(FsFWD, f))... {}
    
            template<typename ...FFwd>
            decltype(auto) then(FFwd &&...ffwd) && {
                static_assert(sizeof...(FFwd) > 0, "Must have at least one function object");
                return make_when_all(std::move(*this), FWD(FFwd, ffwd)...);
            }
    
            decltype(auto) deferredExecutor() && {
                return make_result_getter(std::move(*this));
            }
    
            template<typename Scheduler>
            decltype(auto) executeWaitAndGet(Scheduler &&s) && {
                auto f = std::move(*this).deferredExecutor();
                f.execute(FWD(Scheduler, s));
                return f.waitAndGet();
            }
    
        private:
            Parent &getParent() {
                return static_cast<Parent&>(*this);
            }
    
            template<typename Scheduler, typename ...Children>
            void goToRootAndExecute(Scheduler &&s, Children&... children) & {
                // this is for ADL
                this->getParent().goToRootAndExecute(FWD(Scheduler, s), *this, children...);
            }
    
            template<typename Scheduler, typename Child, typename ...GrandChildren>
            void execute(Scheduler &&s, input_type r, Child &directChild, GrandChildren &...grandChildren) & {
                auto exec = [&, r](auto i, auto &f) {
                    ::std::get<i>(mResult) = callWithoutNothingAsArgument(f, r);
    
                    if (mLeft.fetch_sub(1, std::memory_order_release) == 1)
                        directChild.execute(FWD(Scheduler, s), std::move(mResult), grandChildren...);
                };
    
                auto executeOneFunction = [&s, &exec](auto i, auto f) {
                    if constexpr(i == sizeof...(Fs) - 1)
                        exec(i, f);
    
                    else {
                        s([exec, i, f]() {
                            exec(i, f);
                        });
                    }
                };
    
                enumerate_args(executeOneFunction, static_cast<Fs&>(*this)...);
            }
    
        private:
            output_type mResult;
            movable_atomic_size_t mLeft{ sizeof...(Fs) };
        };
    
        template<typename Parent, typename ...F>
        inline auto make_when_all(Parent &&parent, F&& ...f) {
            return when_all<std::decay_t<Parent>, std::decay_t<F>...>{FWD(Parent, parent), FWD(F, f)...};
        }
    }
    
    template<typename ...F>
    auto when_all(F&& ...f) {
        static_assert(sizeof...(F) > 0, "Must have at least one function object");
        return ::detail::make_when_all(detail::root{}, FWD(F, f)...);
    }

    This is the most difficult part.
    However, not everything here is hard. Two things are tricky: the output_type and the execute function.

    using input_type = typename Parent::output_type;
    using output_type = std::tuple<decltype(callWithoutNothingAsArgument(std::declval<Fs&>(), std::declval<input_type>()))...>;
    

    The input_type is straightforward: we take the output from the parent. The output_type is a bit tricky. The idea is, for each function, to take the type of its call result (with void replaced by nothing) and collect these types into a tuple.

    template<typename Scheduler, typename Child, typename ...GrandChildren>
    void execute(Scheduler &&s, input_type r, Child &directChild, GrandChildren &...grandChildren) & {
        auto exec = [&, r](auto i, auto &f) {
            ::std::get<i>(mResult) = callWithoutNothingAsArgument(f, r);
    
            if (mLeft.fetch_sub(1, std::memory_order_release) == 1)
                directChild.execute(FWD(Scheduler, s), std::move(mResult), grandChildren...);
        };
    
        auto executeOneFunction = [&s, &exec](auto i, auto f) {
            if constexpr(i == sizeof...(Fs) - 1)
                exec(i, f);
    
            else {
                s([exec, i, f]() {
                    exec(i, f);
                });
            }
        };
        enumerate_args(executeOneFunction, static_cast<Fs&>(*this)...);
    }
    

    There are several things to note here.
    The exec lambda is the function that will be executed in a parallel thread. We store the value at the right position in the tuple. Once all functions have finished (a condition tracked by the mLeft atomic counter), we call the execute function of the directChild.

    The executeOneFunction lambda decides whether a function must be launched through the scheduler or not. (The last function can run on the current thread, because the current task was already launched through the scheduler by the root.)

    enumerate_args executes the given function on each argument, passing the index along as well.

    Here is enumerate_args:

    #pragma once
    
    #include <type_traits>
    #include "fwd.h"
    
    namespace detail {
        template<typename F, std::size_t ...Is, typename ...Args>
        void enumerate_args_impl(F &&f, std::index_sequence<Is...>, Args && ...args) {
            using expander = int[];
            expander{ 0, ((void)f(std::integral_constant<std::size_t, Is>{}, FWD(Args, args)), 0)... };
        }
    }
    
    template<typename F, typename ...Args, typename Indices = std::index_sequence_for<Args...>>
    void enumerate_args(F &&f, Args &&...args) {
        detail::enumerate_args_impl(FWD(F, f), Indices{}, FWD(Args, args)...);
    }
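    For example, here is a small self-contained demo (repeating the helper above) that prints each argument together with its compile-time index:

    ```cpp
    #include <cstddef>
    #include <iostream>
    #include <type_traits>
    #include <utility>

    #define FWD(T, x) ::std::forward<T>(x)

    namespace detail {
        template<typename F, std::size_t ...Is, typename ...Args>
        void enumerate_args_impl(F &&f, std::index_sequence<Is...>, Args && ...args) {
            // expander trick: evaluate f for each (index, argument) pair, in order
            using expander = int[];
            (void)expander{ 0, ((void)f(std::integral_constant<std::size_t, Is>{}, FWD(Args, args)), 0)... };
        }
    }

    template<typename F, typename ...Args, typename Indices = std::index_sequence_for<Args...>>
    void enumerate_args(F &&f, Args &&...args) {
        detail::enumerate_args_impl(FWD(F, f), Indices{}, FWD(Args, args)...);
    }

    int main() {
        enumerate_args([](auto i, auto x) {
            std::cout << i() << ": " << x << '\n'; // i is an integral_constant
        }, "a", "b", "c");
    }
    ```
    
    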
    

    result_getter

    Once you are here, you may have already built your own result_getter:

    #pragma once
    #include "bool_latch.h"
    #include "nothing.h"
    
    namespace detail {
    
        template<typename Parent>
        class result_getter : Parent {
            using input_type = typename Parent::output_type;
            using result_type = epurated_tuple_without_nothing_t<input_type>;
    
            template<typename, typename...>
            friend class when_all;
    
        public:
            template<typename ParentFWD>
            result_getter(ParentFWD &&parent) : Parent(std::move(parent)) {}
    
            template<typename Scheduler>
            void execute(Scheduler &&s) & {
                // this is for ADL
                this->getParent().goToRootAndExecute(FWD(Scheduler, s), *this);
            }
    
            result_type waitAndGet() & {
                mLatch.wait_for_job_done();
                return mResult;
            }
    
        private:
            Parent &getParent() {
                return static_cast<Parent&>(*this);
            }
    
            template<typename Scheduler>
            void execute(Scheduler &&, input_type r) & {
                auto setResult = [this](auto ...xs) {
                    if constexpr (sizeof...(xs) == 1)
                        mResult = result_type{ xs... };
    
                    else
                        mResult = std::make_tuple(xs...);
    
                    mLatch.job_done();
                };
    
                callWithoutNothingAsArgument(setResult, r);
            }
    
        private:
            result_type mResult;
            bool_latch mLatch;
        };
    
        // Ensure move semantic
        template<typename Parent, typename = std::enable_if_t<std::is_rvalue_reference_v<Parent&&>>>
        inline auto make_result_getter(Parent &&parent) {
            return result_getter<std::decay_t<Parent>>{std::move(parent)};
        }
    }
    

    The idea here is to execute the functions and wait on the latch before returning the result. All the epurated-tuple part is done here:

    namespace detail {
        template<typename...>
        struct tuple_without_nothing_impl;
    
        template<typename Tuple>
        struct tuple_without_nothing_impl<Tuple> {
            using type = Tuple;
        };
    
        template<typename ...PreviousArgs, typename T, typename ...NextArgs>
        struct tuple_without_nothing_impl<std::tuple<PreviousArgs...>, T, NextArgs...> {
            using type = std::conditional_t<
                std::is_same_v<T, nothing>,
                typename tuple_without_nothing_impl<std::tuple<PreviousArgs...>, NextArgs...>::type,
                typename tuple_without_nothing_impl<std::tuple<PreviousArgs..., T>, NextArgs...>::type>;
        };
    }
    
    template<typename Tuple>
    struct tuple_without_nothing;
    
    template<typename ...Args>
    struct tuple_without_nothing<std::tuple<Args...>> {
        using type = typename detail::tuple_without_nothing_impl<std::tuple<>, Args...>::type;
    };
    
    
    template<typename Tuple>
    using tuple_without_nothing_t = typename tuple_without_nothing<Tuple>::type;
    
    template<typename Tuple>
    struct epurated_tuple {
        using type = std::conditional_t<
            std::tuple_size_v<Tuple> == 1,
            std::tuple_element_t<0, Tuple>,
            tuple_without_nothing_t<Tuple>
        >;
    };
    
    template<>
    struct epurated_tuple<std::tuple<>> {
        using type = std::tuple<>;
    };
    
    template<typename Tuple>
    using epurated_tuple_without_nothing_t = typename epurated_tuple<tuple_without_nothing_t<Tuple>>::type;

    The idea is to remove the nothings from the tuple, and if the resulting tuple contains only one type, the tuple wrapper is removed and we get only that type.
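    These transformations can be checked at compile time. The following self-contained sketch repeats the definitions above and verifies both behaviors with static_assert:

    ```cpp
    #include <tuple>
    #include <type_traits>

    struct nothing {};

    namespace detail {
        template<typename...>
        struct tuple_without_nothing_impl;

        template<typename Tuple>
        struct tuple_without_nothing_impl<Tuple> { using type = Tuple; };

        template<typename ...Prev, typename T, typename ...Next>
        struct tuple_without_nothing_impl<std::tuple<Prev...>, T, Next...> {
            using type = std::conditional_t<
                std::is_same_v<T, nothing>,
                typename tuple_without_nothing_impl<std::tuple<Prev...>, Next...>::type,
                typename tuple_without_nothing_impl<std::tuple<Prev..., T>, Next...>::type>;
        };
    }

    template<typename Tuple>
    struct tuple_without_nothing;

    template<typename ...Args>
    struct tuple_without_nothing<std::tuple<Args...>> {
        using type = typename detail::tuple_without_nothing_impl<std::tuple<>, Args...>::type;
    };

    template<typename Tuple>
    using tuple_without_nothing_t = typename tuple_without_nothing<Tuple>::type;

    template<typename Tuple>
    struct epurated_tuple {
        using type = std::conditional_t<
            std::tuple_size_v<Tuple> == 1,
            std::tuple_element_t<0, Tuple>,
            tuple_without_nothing_t<Tuple>>;
    };

    template<>
    struct epurated_tuple<std::tuple<>> { using type = std::tuple<>; };

    template<typename Tuple>
    using epurated_tuple_without_nothing_t =
        typename epurated_tuple<tuple_without_nothing_t<Tuple>>::type;

    // nothings are stripped...
    static_assert(std::is_same_v<epurated_tuple_without_nothing_t<std::tuple<int, nothing, int>>,
                                 std::tuple<int, int>>);
    // ...and a single remaining type loses its tuple wrapper
    static_assert(std::is_same_v<epurated_tuple_without_nothing_t<std::tuple<nothing, double>>,
                                 double>);

    int main() {}
    ```
    
    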

    Conclusion

    In this article, you learned how to write safe and easy-to-read multi-threaded code. There are still a lot of things to do, but it is already quite usable and easy to use. And if someone else has to read your code, they should be able to do so.

    If you want to test it online, you can go to wandbox.

    Reference

    Vittorio Romeo

  • Lava erupting from Vulkan: Initialization or Hello World

    Hi there!
    A few weeks ago, on February 16th to be precise, Vulkan, the new graphics API from Khronos, was released. It is a new API which gives much more control over GPUs than OpenGL (the API I loved before Vulkan ^_^).

    OpenGL’s problems

    Driver Overhead

    Rendering performance problems can come from the driver: video games do not use the GPU perfectly (maybe 80% instead of 95-100% of use). Driver overhead has a big cost, and more recent OpenGL versions tend to solve this problem with bindless textures, multi-draw, direct state access, etc.
    Keep in mind that each GPU call can have a big cost.
    Cass Everitt, Tim Foley, John McDonald, and Graham Sellers presented Approaching Zero Driver Overhead with OpenGL in 2014.

    Multi threading

    With OpenGL, it is not possible to have efficient multithreading, because an OpenGL context belongs to one and only one thread; that is why it is not so easy to make a draw call from another thread ^_^.

    Vulkan

    Vulkan is not really a low-level API, but it provides a far better abstraction of modern hardware. Vulkan is more than AZDO; it is, as Graham Sellers said, PDCTZO (Pretty Darn Close To Zero Overhead).

    Series of articles about Lava

    What is Lava ?

    Lava is the name I gave to my new graphics (physics?) engine. It will let me learn how Vulkan works, play with it, implement some global illumination algorithms, and probably share with you what I learn and feel about Vulkan. It is possible that I will make some mistakes, so if I do, please let me know!

    Why Lava ?

    Vulkan makes me think of volcanoes, which make me think of lava, so… I chose it 😀.

    Initialization

    Now begins what I wanted to discuss: the initialization of Vulkan.
    First of all, you have to really know and understand what you intend to do. To begin, we are going to see how to get a simple pink window.

    Hello world with Vulkan

    When you are developing with Vulkan, I advise you to keep the Khronos specification open in another window (or on another screen if you are using multiple screens).
    To manage windows more easily, I am using GLFW 3.2, and yes, you have to compile it yourself ^_^, but it is not difficult at all, so it is not a big deal.

    Instance

    Contrary to OpenGL, in Vulkan there is no global state; an instance could be compared to an OpenGL context. An instance doesn't know anything about other instances; it is utterly isolated. The creation of an instance is really easy.

    Instance::Instance(unsigned int nExtensions, const char * const *extensions) {
        VkInstanceCreateInfo info;
    
        info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.pApplicationInfo = nullptr;
        info.enabledLayerCount = 0;
        info.ppEnabledLayerNames = nullptr;
        info.enabledExtensionCount = nExtensions;
        info.ppEnabledExtensionNames = extensions;
    
        vulkanCheckError(vkCreateInstance(&info, nullptr, &mInstance));
    }
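    The vulkanCheckError helper appears throughout the code but is never shown. Here is a hypothetical sketch of what it might look like; the VkResult enum is stubbed out with two real Vulkan values so the snippet compiles without the SDK (in real code it comes from <vulkan/vulkan.h>):

    ```cpp
    #include <iostream>
    #include <stdexcept>
    #include <string>

    // Stand-in for VkResult so the sketch is self-contained;
    // in real code this comes from <vulkan/vulkan.h>.
    enum VkResult { VK_SUCCESS = 0, VK_ERROR_INITIALIZATION_FAILED = -3 };

    // Hypothetical helper: throw if a Vulkan call did not return VK_SUCCESS.
    inline void vulkanCheckError(VkResult result) {
        if (result != VK_SUCCESS)
            throw std::runtime_error("Vulkan call failed: " + std::to_string(result));
    }

    int main() {
        vulkanCheckError(VK_SUCCESS); // passes silently
        try {
            vulkanCheckError(VK_ERROR_INITIALIZATION_FAILED);
        } catch (const std::runtime_error &e) {
            std::cout << "caught: " << e.what() << '\n';
        }
    }
    ```
    
    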

    Physical devices, devices and queues

    From this instance, you can retrieve all GPUs on your computer.
    You can create a connection between your application and the GPU you want using a VkDevice.
    When creating this connection, you also have to create queues.
    Queues are used to perform tasks: you submit a task to a queue and it will be performed.
    Queues are separated into several families.
    A good approach could be to use several queues, for example one for physics and one for graphics (or even two or three for the latter).
    You can also give a priority (between 0 and 1) to a queue. Thanks to that, if you consider a task not so important, you just have to give a low priority to the queue that handles it :).

    Device::Device(const PhysicalDevices &physicalDevices, unsigned i, std::vector<float> const &priorities, unsigned nQueuePerFamily) {
        VkDeviceCreateInfo info;
        std::vector<VkDeviceQueueCreateInfo> infoQueue;
    
        mPhysicalDevice = physicalDevices[i];
    
        infoQueue.resize(physicalDevices.queueFamilyProperties(i).size());
    
        info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.queueCreateInfoCount = infoQueue.size();
        info.pQueueCreateInfos = &infoQueue[0];
        info.enabledExtensionCount = info.enabledLayerCount = 0;
        info.pEnabledFeatures = &physicalDevices.features(i);
    
        for(auto j(0u); j < infoQueue.size(); ++j) {
            infoQueue[j].sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
            infoQueue[j].pNext = nullptr;
            infoQueue[j].flags = 0;
            infoQueue[j].pQueuePriorities = &priorities[j];
            infoQueue[j].queueCount = std::min(nQueuePerFamily, physicalDevices.queueFamilyProperties(i)[j].queueCount);
            infoQueue[j].queueFamilyIndex = j;
        }
    
        vulkanCheckError(vkCreateDevice(physicalDevices[i], &info, nullptr, &mDevice));
    }
    

    Image, ImageViews and FrameBuffers

    Images represent a mono- or multi-dimensional array (1D, 2D or 3D).
    Images don't provide any getters or setters for their data. If you want to use them in your application, you must go through ImageViews.

    ImageViews are directly tied to an image. The creation of an ImageView is not really complicated.

    ImageView::ImageView(Device &device, Image image, VkFormat format, VkImageViewType viewType, VkImageSubresourceRange const &subResourceRange) :
        mDevice(device), mImage(image) {
        VkImageViewCreateInfo info;
    
        info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.image = image;
        info.viewType = viewType;
        info.format = format;
        info.components.r = VK_COMPONENT_SWIZZLE_R;
        info.components.g = VK_COMPONENT_SWIZZLE_G;
        info.components.b = VK_COMPONENT_SWIZZLE_B;
        info.components.a = VK_COMPONENT_SWIZZLE_A;
        info.subresourceRange = subResourceRange;
    
        vulkanCheckError(vkCreateImageView(device, &info, nullptr, &mImageView));
    }

    You can write into ImageViews via FrameBuffers. A FrameBuffer owns multiple ImageViews (attachments) and is used to write into them.

    FrameBuffer::FrameBuffer(Device &device, RenderPass &renderPass,
                             std::vector<ImageView> &&imageViews,
                             uint32_t width, uint32_t height, uint32_t layers)
        : mDevice(device), mRenderPass(renderPass),
          mImageViews(std::move(imageViews)),
          mWidth(width), mHeight(height), mLayers(layers){
        VkFramebufferCreateInfo info;
    
        std::vector<VkImageView> views(mImageViews.size());
    
        for(auto i(0u); i < views.size(); ++i)
            views[i] = mImageViews[i];
    
        info.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.renderPass = renderPass;
        info.attachmentCount = views.size();
        info.pAttachments = &views[0];
        info.width = width;
        info.height = height;
        info.layers = layers;
    
        vulkanCheckError(vkCreateFramebuffer(mDevice, &info, nullptr, &mFrameBuffer));
    }

    The way to render something

    A window is assigned to a surface (VkSurfaceKHR). To draw something, you have to render into this surface via a swapchain.

    From notions of Swapchains

    In Vulkan, you have to manage the double buffering yourself via a swapchain. When you create a swapchain, you link it to a surface and tell it how many images you need. For double buffering, you need 2 images.

    Once the swapchain is created, you should retrieve its images and create framebuffers using them.

    The steps to get a correct swapchain are:

    1. Create a Window
    2. Create a Surface assigned to this Window
    3. Create a Swapchain with several images assigned to this Surface
    4. Create FrameBuffers using all of these images.

    vulkanCheckError(glfwCreateWindowSurface(instance, mWindow, nullptr, &mSurface));
    
    void SurfaceWindow::createSwapchain() {
        VkSwapchainCreateInfoKHR info;
    
        uint32_t nFormat;
        vkGetPhysicalDeviceSurfaceFormatsKHR(mDevice, mSurface, &nFormat, nullptr);
        std::vector<VkSurfaceFormatKHR> formats(nFormat);
        vkGetPhysicalDeviceSurfaceFormatsKHR(mDevice, mSurface, &nFormat, &formats[0]);
    
        if(nFormat == 1 && formats[0].format == VK_FORMAT_UNDEFINED)
            formats[0].format = VK_FORMAT_B8G8R8A8_SRGB;
    
        mFormat = formats[0].format;
        mRenderPass = std::make_unique<RenderPass>(mDevice, mFormat);
    
        info.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
        info.pNext = nullptr;
        info.flags = 0;
        info.imageFormat = formats[0].format;
        info.imageColorSpace = formats[0].colorSpace;
        info.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
        info.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
        info.preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
        info.compositeAlpha = VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR;
        info.presentMode = VK_PRESENT_MODE_MAILBOX_KHR;
        info.surface = mSurface;
        info.minImageCount = 2; // Double buffering...
        info.imageExtent.width = mWidth;
        info.imageExtent.height = mHeight;
    
        vulkanCheckError(vkCreateSwapchainKHR(mDevice, &info, nullptr, &mSwapchain));
        initFrameBuffers();
    }
    void SurfaceWindow::initFrameBuffers() {
        VkImage images[2];
        uint32_t nImg = 2;
    
        vkGetSwapchainImagesKHR(mDevice, mSwapchain, &nImg, images);
    
        for(auto i(0u); i < nImg; ++i) {
            std::vector<ImageView> allViews;
            allViews.emplace_back(mDevice, images[i], mFormat);
            mFrameBuffers[i] = std::make_unique<FrameBuffer>(mDevice, *mRenderPass, std::move(allViews), mWidth, mHeight, 1);
        }
    }

    Using the swapchain is not difficult:

    1. Acquire the next image index
    2. Present the queue

    void SurfaceWindow::begin() {
        // No checking because could be in lost state if change res
        vkAcquireNextImageKHR(mDevice, mSwapchain, UINT64_MAX, VK_NULL_HANDLE, VK_NULL_HANDLE, &mCurrentSwapImage);
    }
    
    void SurfaceWindow::end(Queue &queue) {
        VkPresentInfoKHR info;
    
        info.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
        info.pNext = nullptr;
        info.waitSemaphoreCount = 0;
        info.pWaitSemaphores = nullptr;
        info.swapchainCount = 1;
        info.pSwapchains = &mSwapchain;
        info.pImageIndices = &mCurrentSwapImage;
        info.pResults = nullptr;
    
        vkQueuePresentKHR(queue, &info);
    }

    To notions of Render Pass

    At this point, Vulkan should be initialized. To render something, we have to use render passes and command buffers.

    Command Buffers

    A command buffer is quite similar to a vertex array object (VAO) or a display list (old old old OpenGL 😀 ).
    You begin the recording state, you record some “commands”, and you end the recording state.
    Command buffers are allocated from a CommandPool.

    Vulkan provides two levels of command buffers.

    1. Primary level: they are submitted directly to a queue.
    2. Secondary level: they are executed from a primary-level command buffer.
    std::size_t CommandPool::allocateCommandBuffer() {
        VkCommandBuffer cmd;
        VkCommandBufferAllocateInfo info;
    
        info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
        info.pNext = nullptr;
        info.commandPool = mCommandPool;
        info.commandBufferCount = 1;
        info.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    
        vulkanCheckError(vkAllocateCommandBuffers(mDevice, &info, &cmd));
    
        mCommandBuffers.emplace_back(cmd);
        return mCommandBuffers.size() - 1;
    }

    Render Pass

    A render pass is executed on one framebuffer and is composed of one or several subpasses. Its creation is not easy at all.
    Remember that a framebuffer can have several attachments.
    Not every attachment has to be used by every subpass.

    This piece of code that creates one render pass is not definitive at all and will be changed as soon as possible ^^. But for our example, it is correct.

    RenderPass::RenderPass(Device &device, VkFormat format) :
        mDevice(device)
    {
        VkRenderPassCreateInfo info;
        VkAttachmentDescription attachmentDescription;
        VkSubpassDescription subpassDescription;
        VkAttachmentReference attachmentReference;
    
        attachmentReference.attachment = 0;
        attachmentReference.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    
        attachmentDescription.flags = VK_ATTACHMENT_DESCRIPTION_MAY_ALIAS_BIT;
        attachmentDescription.format = format;
        attachmentDescription.samples = VK_SAMPLE_COUNT_1_BIT;
        attachmentDescription.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
        attachmentDescription.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
        attachmentDescription.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
        attachmentDescription.stencilStoreOp = VK_ATTACHMENT_STORE_OP_STORE;
        attachmentDescription.initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
        attachmentDescription.finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    
        subpassDescription.flags = 0;
        subpassDescription.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
        subpassDescription.inputAttachmentCount = 0;
        subpassDescription.colorAttachmentCount = 1;
        subpassDescription.pColorAttachments = &attachmentReference;
        subpassDescription.pResolveAttachments = nullptr;
        subpassDescription.pDepthStencilAttachment = nullptr;
        subpassDescription.preserveAttachmentCount = 0;
        subpassDescription.pPreserveAttachments = nullptr;
    
        info.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
        info.pNext = nullptr;
        info.flags = 0;
        info.attachmentCount = 1;
        info.pAttachments = &attachmentDescription;
        info.subpassCount = 1;
        info.pSubpasses = &subpassDescription;
        info.dependencyCount = 0;
        info.pDependencies = nullptr;
    
        vulkanCheckError(vkCreateRenderPass(mDevice, &info, nullptr, &mRenderPass));
    }

    In the same way as command buffers, render passes must be begun and ended!

    void CommandPool::beginRenderPass(std::size_t index,
                                      FrameBuffer &frameBuffer,
                                      const std::vector<VkClearValue> &clearValues) {
        assert(index < mCommandBuffers.size());
        VkRenderPassBeginInfo info;
        VkRect2D area;
    
        area.offset = VkOffset2D{0, 0};
        area.extent = VkExtent2D{frameBuffer.width(), frameBuffer.height()};
    
        info.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
        info.pNext = nullptr;
        info.renderPass = frameBuffer.renderPass();
        info.framebuffer = frameBuffer;
        info.renderArea = area;
        info.clearValueCount = static_cast<uint32_t>(clearValues.size());
        info.pClearValues = clearValues.data();
    
        vkCmdBeginRenderPass(mCommandBuffers[index], &info, VK_SUBPASS_CONTENTS_INLINE);
    }
    

    Our engine in action

    Actually, our “engine” is not really usable ^^.
    But in the future, command pools and render passes should not appear in user code!

    #include "System/contextinitializer.hpp"
    #include "System/Vulkan/instance.hpp"
    #include "System/Vulkan/physicaldevices.hpp"
    #include "System/Vulkan/device.hpp"
    #include "System/Vulkan/queue.hpp"
    #include "System/surfacewindow.hpp"
    #include "System/Vulkan/exception.hpp"
    #include "System/Vulkan/commandpool.hpp"
    #include "System/Vulkan/fence.hpp"
    
    void init(CommandPool &commandPool, SurfaceWindow &window) {
        commandPool.reset();
    
        VkClearValue value;
        value.color.float32[0] = 0.8f;
        value.color.float32[1] = 0.2f;
        value.color.float32[2] = 0.2f;
        value.color.float32[3] = 1.f;
    
        for(int i = 0; i < 2; ++i) {
            commandPool.allocateCommandBuffer();
            commandPool.beginCommandBuffer(i);
            commandPool.beginRenderPass(i, window.frameBuffer(i), {value});
            commandPool.endRenderPass(i);
            commandPool.endCommandBuffer(i);
        }
        commandPool.allocateCommandBuffer();
    }
    
    void mainLoop(SurfaceWindow &window, Device &device, Queue &queue) {
        Fence fence(device, 1);
        CommandPool commandPool(device, 0);
    
        while(window.isRunning()) {
            window.updateEvent();
            if(window.neetToInit()) {
                init(commandPool, window);
                std::cout << "Initialisation" << std::endl;
                window.initDone();
            }
            window.begin();
            queue.submit(commandPool.commandBuffer(window.currentSwapImage()), 1, *fence.fence(0));
            fence.wait();
            window.end(queue);
        }
    }
    
    int main()
    {
        ContextInitializer context;
        Instance instance(context.extensionNumber(), context.extensions());
        PhysicalDevices physicalDevices(instance);
        Device device(physicalDevices, 0, {1.f}, 1);
        Queue queue(device, 0, 0);
    
        SurfaceWindow window(instance, device, 800, 600, "Lava");
    
        mainLoop(window, device, queue);
    
        glfwTerminate();
    
        return 0;
    }

    If you want the whole source code:
    GitHub

    Reference

    Approaching Zero Driver Overhead: Lecture
    Approaching Zero Driver Overhead: Slides
    Vulkan Overview 2015
    Vulkan in 30 minutes
    VkCube
    GLFW with Vulkan