Simplify hashing in C++

This article will focus mostly on how to simplify hashing in C++. Hashing is a major feature that allows us to use unordered_map, which is the C++ implementation of a HashMap. If you want to hash a value in C++, you can use the std::hash specialization for the type you want to hash.

int main() {
    std::cout << std::hash<std::string>{}("My name is Antoine") << std::endl;
}

If you want to make a user-defined type to be hashable, it’s also possible through template specialization of std::hash:

struct Person {
    std::string name;
};

template<>
struct std::hash<Person> {
    std::size_t operator()(const Person& person) const noexcept {
        return std::hash<std::string>{}(person.name);
    }
};

That’s nice and all, however, we can easily detect three caveats here:

  1. The syntax with std::hash is painful. Something close to hash(a) should be a better approach
  2. The syntax to std::hash a user-defined type is even more painful. Adding a simple member hash function (or a free function) should be the way to go.
  3. It’s not possible to std::hash easily multiple values, either directly(e.g hash(a, b), or indirectly (e.g hash(vector), hash(tuple)).

We are going to see how to simplify this step-by-step. The final objective would be to be able to write something close to:

class Person {
public:
    std::size_t hash() const noexcept { return ::hash(m_age, m_name); }
private:
    int m_age;
    std::string m_name;
};

Here, Person becomes magically hashable and can be used directly in an unordered_map or an unordered_set

Simplify hashing in C++: syntax

We want to be able to write something close to: hash(value). To do that, we need to do different things:

  1. We need to know if a type is hashable through std::hash
  2. We need to create a function forwarding the value to the good specialization
// library side
template<typename T>
concept is_std_hashable = requires(const T& object) {
    {std::hash<T>{}(object)} -> std::convertible_to<std::size_t>;
};

template<typename T>
std::size_t hash(const T& object) {
    static_assert(is_std_hashable<T>);
    return std::hash<T>{}(object);
}

// User side
int main() {
    std::cout << hash("My name is Antoine"sv) << std::endl;
}

The first step has been resolved successfully.

Simplify the hashing of a user-defined type

What we want to do here is to make a type hashable only by adding a member function hash returning a std::size_t

Again, to be able to do that, we need to follow several steps:

  1. Adding a concept has_member_hash_function
  2. Adding a concept hashable
  3. Rewrite the hash function
// library side
template<typename T>
concept has_hash_member_function = requires(const T& object) {
    {object.hash()} -> std::convertible_to<std::size_t>;
};

template<typename T>
concept hashable = is_std_hashable<T> || has_hash_member_function<T>;

template<hashable T>
std::size_t hash(const T& object) {
    if constexpr(is_std_hashable<T>)
        return std::hash<T>{}(object);
    else
        return object.hash();
}

// user side
struct Person {
    bool operator==(const Person&) const = default;
    std::size_t hash() const noexcept { return ::hash(name); }
    std::string name;
    int age;
};

int main() {
    Person person{"Antoine", 33};
    std::cout << hash(person) << std::endl;
}

Adding specialization automatically for std::hash.

Everything is fine; however, even if it is straightforward, we cannot yet have std::unordered_set<Person> it because there is no template specialization. Let’s really simplify hashing in C++!

  1. We need to add a generic hash specialization handling the hash member function. We can also restrict it to the types which are not is_std_hashable, but usually, the specialization of a concrete type being a better match than the one with concept, it should not be necessary.
template<has_hash_member_function T>
struct std::hash<T>{
    std::size_t operator()(const T& object) const noexcept {
        return object.hash();
    }
};

// user side
struct Person {
    bool operator==(const Person&) const = default;
    std::size_t hash() const noexcept { return ::hash(name); }
    std::string name;
    int age;
};

int main() {
    std::unordered_set<Person> persons;
    persons.emplace("Antoine", 33);
    persons.emplace("Cedric", 30);
    persons.emplace("Antoine", 30);
    persons.emplace("Vasyl", 29);

    for(const auto &person: persons)
    {
        std::println("{}: {}", person.name, person.age);
    }
}

Simplify hashing in C++: Adding combination

A simple way to combine hashes is to use the same function as the one in boost. It’s not optimal, but you can easily find a better implementation with better distribution. See the reference part for that

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

So, the idea here is that from several values, you hash them using this function:

  1. Adding this function and call hash instead of std::hash<T>.
  2. Adding an overload to hash function taking several arguments and calls hash_combine.
template <class T>
void hash_combine(std::size_t& seed, const T& v)
{
    seed ^= hash(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

template<hashable ...Ts>
std::size_t hash(const Ts& ...objects)
{
    std::size_t result = 0;
    (hash_combine(result, objects), ...);
    return result;
}

// user-side
struct Person {
    bool operator==(const Person&) const = default;
    std::size_t hash() const noexcept { return ::hash(name, age); }
    std::string name;
    int age;
};

We can further optimize by initializing the result to the hash of the first object.

Simplify hashing in C++ of a range of values

Now, we want to be able to hash a range of values, like std::vector<Person>.

  1. We need to know if a type is an iterable
  2. We need to add a specialization of std::hash for range values
template<typename T>
concept is_iterable = requires(const T& object) {
    object.begin();
    object.end();
};

template<is_iterable Iterable>
struct std::hash<Iterable> {
    std::size_t operator()(const Iterable& range) const noexcept {
        std::size_t result = 0;
        for(const auto& object: range)
            hash_combine(result, object);
        return result;
    }
};

Adding specialization for tuple and pair

Again, here is the list of what we need to do:

  1. Add a pair specialization for std::hash calling hash(p.first, p.second)
  2. Add a tuple specialization for std::hash calling hash on each object.
template<typename T1, typename T2>
struct std::hash<std::pair<T1, T2>> {
    std::size_t operator()(const std::pair<T1, T2>& pair) const noexcept {
        return ::hash(pair.first, pair.second);
    }
};

template<typename ...Ts>
struct std::hash<std::tuple<Ts...>> {
    std::size_t operator()(const std::tuple<Ts...>& tuple) const noexcept {
        auto to_apply = [](const auto& ...objects) { return ::hash(objects...); };
        return std::apply(to_apply, tuple);
    }
};

Conclusion

I hope you liked the article and learnt something. Here is the complete code of our library 😀 .

// Library side
template<typename T>
concept is_std_hashable = requires(const T& object) {
    {std::hash<T>{}(object)} -> std::convertible_to<std::size_t>;
};

template<typename T>
concept has_hash_member_function = requires(const T& object) {
    {object.hash()} -> std::convertible_to<std::size_t>;
};

template<typename T>
concept hashable = is_std_hashable<T> || has_hash_member_function<T>;

template<typename T>
concept is_iterable = requires(const T& object) {
    object.begin();
    object.end();
};

template<hashable T>
std::size_t hash(const T& object) {
    if constexpr(is_std_hashable<T>)
        return std::hash<T>{}(object);
    else
        return object.hash();
}

template <class T>
void hash_combine(std::size_t& seed, const T& v)
{
    seed ^= hash(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

template<hashable ...Ts>
std::size_t hash(const Ts& ...objects)
{
    std::size_t result = 0;
    (hash_combine(result, objects), ...);
    return result;
}

template<has_hash_member_function T>
struct std::hash<T>{
    std::size_t operator()(const T& object) const noexcept {
        return object.hash();
    }
};

template<is_iterable Iterable>
struct std::hash<Iterable> {
    std::size_t operator()(const Iterable& range) const noexcept {
        std::size_t result = 0;
        for(const auto& object: range)
            hash_combine(result, object);
        return result;
    }
};

template<typename T1, typename T2>
struct std::hash<std::pair<T1, T2>> {
    std::size_t operator()(const std::pair<T1, T2>& pair) const noexcept {
        return ::hash(pair.first, pair.second);
    }
};

template<typename ...Ts>
struct std::hash<std::tuple<Ts...>> {
    std::size_t operator()(const std::tuple<Ts...>& tuple) const noexcept {
        auto to_apply = [](const auto& ...objects) { return ::hash(objects...); };
        return std::apply(to_apply, tuple);
    }
};

If you want to see it in action: Godbolt link

Reference

  1. Simple hash_combine
  2. Better implementation of hash_combine

Comments

Leave a Reply