Author: Antoine MORRIER

  • C++ Memory Safety: Safe Pointer

    C++ memory safety, the static way

    C++ memory safety is becoming more and more important. My last article was about dangling references, and its main limitation is that the check happens only at runtime. Today, we are going to look at some obscure metaprogramming tricks related to friend injection, which will let us build safe containers.

    A word of warning: this code is a proof of concept more than anything else, because, as far as I recall, the standards committee would like to make such things impossible :).
    I also want to say that, even though the idea has been in my mind for a long time, doing it in an “elegant” way was largely inspired by Mitzi.

    Here is the objective we want to reach: a safe pointer.

        safe_pointer<int> ptr{new int{5}};
        *ptr = 3;
        ptr.reset();
        // *ptr = 5; // does not compile
    
        safe_pointer<int> ptr2{nullptr};
        // *ptr2 = 18; // does not compile

    Reminder: Argument-Dependent Lookup

    Argument-Dependent Lookup (ADL) lets you call functions declared in a namespace without qualifying them with that namespace. The classic example is Hello World: std::cout << "Hello World". The ostream &operator<<(ostream &, const char*) function is defined in the std namespace, and since the type of std::cout (std::ostream) is declared in std, the compiler finds the operator through ADL.

    A second example:

    namespace A {
        struct B{};
        void f(B) {}
    }
    
    f(A::B{}); // unqualified call => found through ADL
    A::f(A::B{}); // qualified call
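    A self-contained variant of the same idea, using a hypothetical shapes namespace: both the free function and the stream operator are found from unqualified uses, exactly like std::cout << "Hello World" finds the standard operator.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

namespace shapes {
struct Square {
    int side;
};

// Both functions live in shapes, next to Square, so ADL finds them
// from an unqualified call whose argument is a shapes::Square.
int area(Square s) { return s.side * s.side; }

std::ostream &operator<<(std::ostream &os, const Square &s) {
    return os << "Square(" << s.side << ")";
}
} // namespace shapes
```

    Neither area(shapes::Square{4}) nor out << shapes::Square{3} needs a shapes:: prefix on the function: the argument type is enough.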

    Reminder: friend injection

    Friend injection is well documented online, but basically, friend can be used for two things:

    1. Access private or protected members of a class
    2. Inject a function into the innermost namespace enclosing the class, so that this function is reachable through Argument-Dependent Lookup (ADL)

    The second point is, I think, the lesser-known one.

    namespace A {
        struct B {
            friend void f(B) {}
        };
    }
    
    A::f(A::B{}); // Does not compile
    f(A::B{}); // Only way to get it is through ADL
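    Hidden friends are also handy in everyday code, not only for metaprogramming. In this sketch (units::Meters and add are hypothetical names), the friend add is visible only through ADL, so it is never considered when no argument is a Meters, even though Meters is implicitly constructible from double.

```cpp
#include <cassert>

namespace units {
struct Meters {
    double value;

    // Implicit on purpose: it shows that the hidden friend below is
    // still NOT picked up for calls made only with doubles.
    Meters(double v) : value{v} {}

    // Hidden friend: reachable only through ADL on a Meters argument.
    friend Meters add(Meters a, Meters b) { return Meters{a.value + b.value}; }
};
} // namespace units

// add(1.0, 2.0);          // does not compile: no Meters argument, so ADL never looks inside Meters
// units::add(...);        // does not compile either: qualified lookup cannot see hidden friends
```

    Once ADL has found add through a first Meters argument, the second argument may still be converted implicitly, so add(units::Meters{1.5}, 2.0) is accepted.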

    Since they are not class member functions but real free functions, you can also declare them in one structure and define them in another one.

    struct A {
        friend void f(A);
    };
    
    struct B {
        friend void f(A) {};
    };
    
    f(A{}); // Call the definition in B

    Check if a function is defined within a template context

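    When a class template is instantiated, its non-template friend function declarations are instantiated as well, so a friend function defined in the class body becomes defined in the enclosing namespace at that point.

    To know whether a function is defined, you can use a requires-expression on a function declared with an auto return type: as long as no definition has been seen, its return type cannot be deduced, so any call to it is ill-formed and the requires-expression yields false.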

    auto f(int); // Declare your function
    
    template<int X> // Define the function when one template is instantiated
    struct DefineTheFunction{
        friend auto f(int) {}
    };
    
    template<auto X = 0, bool B = requires{f(X);}>
    constexpr auto g() {
        return B; // return true if f is defined
    }
    
    // DefineTheFunction<0> _;
    
    int main() {
        static_assert(g() == false);
    }

    If you uncomment the line instantiating the template, the static_assert will fire.

    What is even more amazing is that if you instantiate the template between two calls to g(), you will get two different results!

    int main() {
        static_assert(g() == false);
        DefineTheFunction<0> _;
        static_assert(g() == true);
    }

    I think it is as atrocious as beautiful :D.

    Declaring a Context

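    To make our objective work as expected, the two pointers must not have exactly the same type. In the same way, the two calls to g() we saw earlier had to be treated as two different calls.

    Since C++20, we can declare such a context with a defaulted template parameter, auto Context = []{}: every lambda-expression has its own unique closure type, so each pointer gets its own context.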

    template<typename T, auto Context = []{}>
    struct pointer {
        // black magic
    };
    
    pointer<int> p1; // pointer<int, Context1>
    pointer<int> p2; // pointer<int, Context2>

    The state attached to a pointer’s Context is updated each time you call one of its member functions. The other pointer is not impacted because the two pointers have different types.

    Producing more than two values

    The idea is simple: start with a Type<0>. If the function for this type is defined, try the function for Type<1>, and so on. The first undefined function tells you which value comes next, and defining it (by instantiating a template) produces a new value. You can implement a Counter with this approach.

    template<int Value>
    struct InjectedValue {
        static constexpr int value = Value;
        constexpr operator int() const noexcept { return Value; }
        friend constexpr auto injected(InjectedValue<Value>);
    };
    
    template<auto evaluation, int Value = 0>
    constexpr auto getNextInjectedValue() {
        constexpr auto injectedValue = InjectedValue<Value>{}; 
        constexpr bool isInjected = requires {injected(injectedValue);};
        
        if constexpr (isInjected) {
            return getNextInjectedValue<evaluation, Value + 1>();
        }
        else {
            return injectedValue;
        }
    }

    Here the idea is simple: we create an InjectedValue<Value>. It declares the friend function injected but does not define it. getNextInjectedValue iterates until it finds a value for which injected has no definition, and returns that value. The evaluation parameter is required: without it, getNextInjectedValue would be instantiated only once and, because the compiler memoizes template instantiations, it would always return the same value. Let’s implement a Counter now :).

    template<int Value>
    struct CounterInjector {
        friend constexpr auto injected(InjectedValue<Value>) {}
    };
    
    struct Counter {
        template<auto evaluation = []{}>
        static constexpr int next() {
            constexpr auto toInject = getNextInjectedValue<evaluation>();
            CounterInjector<toInject> _{};
            return toInject;
        }
    };

    We create a CounterInjector that injects the definition. The next function is easy: we get the next value to inject, which represents the current value of the counter. We then inject a definition for this value (so the next call will return value + 1), and we return the current value. The evaluation parameter gives next() a different instantiation at each call. Now let’s test it!

    int main() {
        static_assert(Counter::next() == 0);
        static_assert(Counter::next() == 1);
        static_assert(Counter::next() == 2);
    }

    Amazing!

    The problem is that we can have only one counter. Let’s add a Context template parameter for each object!

    template<int Value, auto Context>
    struct InjectedValue {
        static constexpr int value = Value;
        constexpr operator int() const noexcept { return Value; }
        friend constexpr auto injected(InjectedValue<Value, Context>);
    };
    
    template<auto evaluation, auto Context, int Value = 0>
    constexpr auto getNextInjectedValue() {
        constexpr auto injectedValue = InjectedValue<Value, Context>{}; 
        constexpr bool isInjected = requires {injected(injectedValue);};
        
        if constexpr (isInjected) {
            return getNextInjectedValue<evaluation, Context, Value + 1>();
        }
        else {
            return injectedValue;
        }
    }
    
    template<int Value, auto Context>
    struct CounterInjector {
        friend constexpr auto injected(InjectedValue<Value, Context>) {}
    };
    
    template<auto Context = []{}>
    struct Counter {
        template<auto evaluation = []{}>
        static constexpr int next() {
            constexpr auto toInject = getNextInjectedValue<evaluation, Context>();
            CounterInjector<toInject, Context> _{};
            return toInject;
        }
    };

    And the testing:

    int main() {
        using C1 = Counter<>;
        using C2 = Counter<>;
        static_assert(C1::next() == 0);
        static_assert(C1::next() == 1);
        static_assert(C1::next() == 2);
    
        static_assert(C2::next() == 0);
        static_assert(C2::next() == 1);
        static_assert(C2::next() == 2);
    }

    Perfect, we reached our goal.

    Let’s write the safe_pointer<T>

    The first step for C++ memory safety is to avoid dangling references, and pointers are one of the biggest sources of bugs in C++. For this pointer, I propose three states:

    1. Initialized: we know at compile time that the pointer is initialized
    2. Destroyed: we know at compile time that the pointer is not initialized
    3. Unknown: we don’t know at compile time whether the pointer is initialized or not

    For the sake of simplicity, let’s ignore the third case for now.

    struct InitializedPointer {};
    struct NullPointer {};
    
    template<typename T, auto Context = []{}>
    struct safe_pointer {
    public:
        safe_pointer() : m_ptr{nullptr} {
            // set state to NullPointer
        }
        safe_pointer(decltype(nullptr)) : m_ptr{nullptr}{
            // set state to NullPointer
        }
        safe_pointer(T *ptr) : m_ptr{ptr} {
            // set state to InitializedPointer
        }
        ~safe_pointer() { delete m_ptr; }
    
        template<auto evaluation = []{}>
        T &operator*() {
            // static_assert(state is InitializedPointer)
            return *m_ptr;
        }
    
        template<auto evaluation = []{}>
        void reset() {
            // set state to NullPointer
            delete m_ptr;
            m_ptr = nullptr;
        }
        
    private:
        T *m_ptr;
    };

    At this point, it is clear that we need a State object that can be modified at compile time.

    Let’s design it!

    template<typename T>
    struct State {
        using type = T;
    };
    
    template<auto evaluation, auto Context>
    constexpr auto getLastInjectedValue() {
        constexpr auto nextValueToInject = getNextInjectedValue<evaluation, Context>();
        return InjectedValue<nextValueToInject - 1, Context>{};
    }
    
    template<typename T, int Value, auto Context>
    struct StateInjector {
        friend constexpr auto injected(InjectedValue<Value, Context>) {
            return State<T>{};
        }
    };
    
    template<typename First, auto Context = []{}>
    struct MetaState {
        static constexpr auto context = Context;
        static constexpr auto first = StateInjector<First, 0, context>{};
    
        template<auto evaluation = []{}>
        using get = typename decltype(injected(getLastInjectedValue<evaluation, context>()))::type;
    
        template<typename T, auto evaluation = []{}>
        static constexpr auto set() {
            constexpr auto toInject = getNextInjectedValue<evaluation, context>();
            return StateInjector<T, toInject, context> {};
        }
    };

    We begin by creating a State object that just holds a type. Then we create a function returning the last injected value. Next, we introduce a StateInjector: like CounterInjector, it defines the injected function, but this time the definition returns T wrapped in State<T>. This lets client code recover the type through ADL and decltype! It’s a little black magic!

    MetaState<First> provides two members: get, an alias template naming the most recently set type, and set, which injects another type into the Context.

    With a little test:

        using MS = MetaState<int>;
    
        static_assert(std::is_same_v<MS::get<>, int>);
        MS::set<double>();    
        static_assert(std::is_same_v<MS::get<>, double>);
        MS::set<char>();
        static_assert(std::is_same_v<MS::get<>, char>);

    Let’s complete the safe_pointer now!

    template<typename T, auto Context = []{}>
    struct safe_pointer {
        using state = MetaState<State<void>, Context>;
    public:
        safe_pointer() : m_ptr{nullptr} {
            state::template set<NullPointer>();
        }
        safe_pointer(decltype(nullptr)) : m_ptr{nullptr}{
            state::template set<NullPointer>();
        }
        safe_pointer(T *ptr) : m_ptr{ptr} {
            state::template set<InitializedPointer>();
        }
        ~safe_pointer() { delete m_ptr; }
    
        template<auto evaluation = []{}>
        T &operator*() {
            using current = state::template get<>;
            static_assert(std::is_same_v<current, InitializedPointer>);
            return *m_ptr;
        }
    
        template<auto evaluation = []{}>
        void reset() {
            state::template set<NullPointer>();
            delete m_ptr;
            m_ptr = nullptr;
        }
        
    private:
        T *m_ptr;
    };

    And, as usual, a little test!

    int main() {
        safe_pointer<int> p1{new int};
    
        *p1 = 41;
        p1.reset();
        // *p1 = 53; // does not compile
        
        safe_pointer<int> p2{nullptr};
        safe_pointer<int> p3;
    
        // *p2 = 20; // does not compile
        // *p3 = 43; // does not compile
    }

    Et voilà!

    Here is the link to the full implementation: Full Implementation on Wandbox

    Conclusion

    I hope you enjoyed this article. If you are interested, we can look in more detail at other things like safe_optional, safe_vector, or safe_ref.
    We can also see whether we are able to handle reference counting with exclusive access.

    But be aware that these techniques may not be suitable for production code ;).

    See you!
    Thanks to Patrice Espie for the review :).

  • An attempt to remove dangling pointers in C++

    Have you ever had dangling pointers or references in your application? If so, this article opens a discussion about how to try to remove them.

    A bit of Context

    As many of you may have heard, in recent months there have been discussions about memory-safe languages involving government organizations such as the NSA and the White House.
    I can understand why such organizations appreciate memory-safe languages. I have used Rust a bit, and yes, it is pleasant to use from a developer’s point of view because many memory flaws are caught directly by the compiler. However, I can also understand that companies don’t want to rewrite all their software from C++ in another language, whether it is Rust, Java, or C#, or to force their employees to learn a new language. Whichever option they choose, they will lose productivity and may lose market share.

    Dangling pointers

    First, what is a dangling pointer? It is a pointer to a memory location that is no longer valid; dereferencing it leads, for example, to a use-after-free. It is this specific problem I want to discuss in this article.

    int *p_a = nullptr;
    {
        int a = 10;
        p_a = &a;
    }
    *p_a = 12; // a is already destroyed

    There is already one well-known solution for this problem: the weak reference. In C++, it is expressed as the pair shared_ptr and weak_ptr.
    The idea is simple: when the owners (shared_ptr) are all destroyed, the weak references (weak_ptr) expire. Instead of giving access to the destroyed object, lock() then returns an empty shared_ptr.
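    As a small runnable illustration of the weak reference pattern, this sketch (tryRead is a hypothetical helper) only reads the value after a successful lock(); once the last shared_ptr is gone, lock() returns an empty shared_ptr and expired() becomes true.

```cpp
#include <cassert>
#include <memory>

// Reads the observed value only if its owner is still alive.
// Returns false, leaving `out` untouched, once every shared_ptr is gone.
bool tryRead(const std::weak_ptr<int> &wp, int &out) {
    if (auto sp = wp.lock()) {  // empty shared_ptr after expiry
        out = *sp;
        return true;
    }
    return false;
}
```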

    It is a very nice and useful pattern. However, we sometimes assume that the weak_ptr is always valid when we reach a given piece of code, and we don’t check whether the object is still alive.

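    // Correct usage: the result of lock() is checked
    std::shared_ptr<int> sp = ...;
    std::weak_ptr<int> wp = sp;
    
    ...
    
    if(auto sp2 = wp.lock()) {
      use(*sp2); // safe because of the test
    }

    // Risky usage: no check
    std::shared_ptr<int> sp = ...;
    std::weak_ptr<int> wp = sp;
    
    ...
    
    use(*wp.lock()); // undefined behavior if sp was destroyed: lock() returns an empty shared_ptr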

    Since we are all human, it can happen that, even when we were sure there was no problem, one arises and, boom, a vulnerability can be exploited.

    Introducing not_dangling and ref objects.

    The idea I want to share is to make it impossible to keep weak references to a resource you are going to destroy. To keep it simple, when the object is destroyed while one or more references still point to it, we just call std::terminate. Unfortunately, unlike in Rust, we cannot catch such errors at compile time without ugly stateful metaprogramming :/.

    Here is what I propose to avoid this kind of dangling reference (obviously, the code is as simple as possible and not intended for production).

    #include <atomic>
    #include <exception>
    #include <memory>
    #include <optional>
    
    class not_dangling {
        template<typename>
        friend class ref;
    public:
        // Terminates if any ref<> still points to this object when it dies.
        ~not_dangling() noexcept(false) {
            if(m_reference_count.load(std::memory_order_relaxed))
                std::terminate();
        }
    
    private:
        // mutable so that ref<const T> can still count itself
        mutable std::atomic_int m_reference_count{0};
    };
    
    template<typename T>
    class ref {
        template<typename>
        friend class ref;
    public:
        ref(T &object) noexcept : m_object{object} {
            m_object.m_reference_count.fetch_add(1, std::memory_order_relaxed);
        }
    
        // Converting copy, e.g. ref<Object> -> ref<const Object>
        template<typename U>
        ref(const ref<U> &other) noexcept : m_object{static_cast<T&>(other.m_object)} {
            m_object.m_reference_count.fetch_add(1, std::memory_order_relaxed);
        }
    
        ref(const ref &other) noexcept : m_object{other.m_object} {
            m_object.m_reference_count.fetch_add(1, std::memory_order_relaxed);
        }
    
        T &operator*() { return m_object; }
        T *operator->() { return std::addressof(m_object); }
    
        ~ref() {
            m_object.m_reference_count.fetch_sub(1, std::memory_order_relaxed);
        }
    private:
        T &m_object;
    };
    
    struct Object : public not_dangling {
    
    };
    
    struct Derived : public Object {
    
    };
    
    int main()
    {
        std::optional<Derived> a;
        a.emplace();
    
        ref<Object> ref_base(*a);
        ref<const Object> ref_base_2(ref_base);
        ref<const Derived> ref_derived(*a);
        ref<const Derived> ref_derived_2(ref_derived);
    
        return 0; // the refs are destroyed before a, so no termination
    }

    Unfortunately, we must rely on a wrapper to do that, so it is not a “plug-and-play” solution that you can attach to your code directly. The cons are:

    • It disables the triviality of trivial types because of the user-defined destructor
    • It may have a small performance overhead
    • It is intrusive: you must derive your object from it…

    From my experience, the first point is not a real problem. I have never needed a type to be trivial while also holding references to it. If such a type happened to be trivial, fine, but if it was not, I guess I would never have noticed :).

    The second point is fair: even if the performance impact is usually negligible, sometimes it can be too much. However, we can easily turn these types into aliases. While your QA team and the other developers work, you use the objects we just discussed, and when you make the build for the commercial version, you switch to a pass-through alias.

    The last point is due to inheritance. If you have a better idea, please tell me :).

    using not_dangling = std::conditional_t<EnableNotDangling, not_dangling_base, empty_class>;
    
    template<typename T>
    using ref = std::conditional_t<EnableNotDangling, ref_wrapper<T>, T*>;
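    The alias snippet leaves EnableNotDangling, not_dangling_base, empty_class and ref_wrapper undefined. Here is a self-contained sketch of how the switch could look, assuming a hypothetical compile-time constant EnableNotDangling and two hypothetical class templates, checked_ref and unchecked_ref. Since a plain T* cannot be constructed from a T&, the release flavour in this sketch is a thin pointer wrapper with the same interface rather than a raw T*.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <memory>
#include <type_traits>

// Hypothetical switch: a real project would set it from the build configuration.
inline constexpr bool EnableNotDangling = true;

// Checked flavour: counts live references, terminates on a dangling destruction.
class not_dangling_base {
    template <typename> friend class checked_ref;
public:
    ~not_dangling_base() {
        if (m_reference_count.load(std::memory_order_relaxed) != 0)
            std::terminate();
    }
private:
    mutable std::atomic_int m_reference_count{0};
};

template <typename T>
class checked_ref {
public:
    explicit checked_ref(T &object) noexcept : m_object{object} {
        m_object.m_reference_count.fetch_add(1, std::memory_order_relaxed);
    }
    checked_ref(const checked_ref &other) noexcept : checked_ref(other.m_object) {}
    ~checked_ref() { m_object.m_reference_count.fetch_sub(1, std::memory_order_relaxed); }

    T &operator*() const noexcept { return m_object; }
    T *operator->() const noexcept { return std::addressof(m_object); }
private:
    T &m_object;
};

// Release flavour: no counter, no bookkeeping, same interface.
struct empty_class {};

template <typename T>
class unchecked_ref {
public:
    explicit unchecked_ref(T &object) noexcept : m_ptr{std::addressof(object)} {}
    T &operator*() const noexcept { return *m_ptr; }
    T *operator->() const noexcept { return m_ptr; }
private:
    T *m_ptr;
};

using not_dangling = std::conditional_t<EnableNotDangling, not_dangling_base, empty_class>;

template <typename T>
using ref = std::conditional_t<EnableNotDangling, checked_ref<T>, unchecked_ref<T>>;

// Example user type: derives from whichever base the switch selected.
struct Widget : not_dangling {
    int value = 42;
};
```

    Setting EnableNotDangling to false turns not_dangling into an empty base and ref<T> into a plain pointer wrapper, so user code such as Widget and ref<Widget> compiles unchanged in both configurations.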

    Conclusion

    Have you already faced problems with dangling pointers? If so, what do you think about such an approach? Would you consider using such objects in your code base, or at least for debugging?

    Thanks for reading,

  • Stop using bool in C++ for function parameters!

    Introduction

    This article deals with the use of bool in C++. Should we use it or not? That is the question we will try to answer here. However, this is more of an open discussion than a coding rule.

    First of all, what is the bool type? A boolean variable is a variable that can be set to false or true.

    Imagine you have a simple function that decides whether or not to buy a house. You may design it like this:

    bool shouldBuyHouse(bool hasSwimmingPool, bool hasEconomicLight);

    Problems arise!

    Then, when you want to use it, you can do it like this:

    if(shouldBuyHouse(false, true)){}

    There is no problem here; however, the reader may not understand at first glance whether false means no pool or no energy-saving lights.

    So you might change the function call like this:

    bool economicLight = true;
    bool hasSwimmingPool = false;
    
    if(shouldBuyHouse(economicLight, hasSwimmingPool)) {
    
    }

    Now you are happy: the reader knows exactly what each bool means. Are you sure? A very thorough reader may notice that the arguments are swapped.

    How to solve the problem?

    There are different ways to solve this type of problem. The first is to use a strong type. Many libraries offer this kind of thing; however, a simple enum class can do the trick.

    Not only will the reader know which argument corresponds to which parameter, but, if the arguments are swapped, the compiler will not let the error pass.

    Let’s rewrite the function declaration:

    enum class HouseWithSwimmingPool {No, Yes};
    enum class HouseWithLights {Economical, Incandescent};
    
    bool shouldBuyHouse(HouseWithSwimmingPool, HouseWithLights);
    
    if(shouldBuyHouse(HouseWithSwimmingPool::Yes, HouseWithLights::Economical)) {
    
    }
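    If you prefer the strong-type route, here is a minimal hand-rolled sketch. StrongBool, the tag structs and the decision rule inside shouldBuyHouse are hypothetical, chosen only for illustration; dedicated strong-type libraries provide richer versions.

```cpp
#include <cassert>

// A bool that only converts explicitly, distinguished by a tag type.
template <typename Tag>
class StrongBool {
public:
    constexpr explicit StrongBool(bool value) noexcept : m_value{value} {}
    constexpr explicit operator bool() const noexcept { return m_value; }
private:
    bool m_value;
};

using HasSwimmingPool = StrongBool<struct HasSwimmingPoolTag>;
using HasEconomicLight = StrongBool<struct HasEconomicLightTag>;

// Hypothetical rule, for illustration only: buy if the lights are
// economical and there is no pool to maintain.
constexpr bool shouldBuyHouse(HasSwimmingPool pool, HasEconomicLight light) {
    return static_cast<bool>(light) && !static_cast<bool>(pool);
}
```

    Swapping the arguments, as in shouldBuyHouse(HasEconomicLight{true}, HasSwimmingPool{false}), is rejected at compile time, and so is passing a raw true, because the constructor is explicit.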

    Conclusion

    I would encourage people not to use the bool type for function parameters. What do you think? Do you use bool everywhere?

    Thanks for reading!