Rendezvous or highest random weight (HRW) hashing is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are assigned to. When k is 1, it subsumes the goals of consistent hashing, using an entirely different method.
Rendezvous hashing solves the distributed hash table problem: How can a set of clients, given an object O, agree on where in a set of n sites (servers, say) to place O? Each client is to select a site independently, but all clients must end up picking the same site. This is non-trivial if we add a minimal disruption constraint, and require that only objects mapping to a removed site may be reassigned to other sites.
The basic idea is to give each site Sj a score (a weight) for each object Oi, and assign the object to the highest scoring site. All clients first agree on a hash function h(). For object Oi, the site Sj is defined to have weight wi,j = h(Oi, Sj). HRW assigns Oi to the site Sm whose weight wi,m is the largest. Since h() is agreed upon, each client can independently compute the weights wi,1, wi,2, ..., wi,n and pick the largest. If the goal is distributed k-agreement, the clients can independently pick the sites with the k largest hash values.
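A minimal sketch of this scheme in Python. SHA-256 stands in for the agreed-upon hash h(), and the site and object names are made up for illustration:

```python
import hashlib

def weight(site, obj):
    """w_{i,j} = h(O_i, S_j): score a (site, object) pair with the agreed hash."""
    return int(hashlib.sha256(f"{site}|{obj}".encode()).hexdigest(), 16)

def pick_sites(obj, sites, k=1):
    """Return the k sites with the largest weights for this object."""
    return sorted(sites, key=lambda s: weight(s, obj), reverse=True)[:k]

sites = ["s1", "s2", "s3", "s4"]
winner = pick_sites("object-42", sites)[0]  # the single HRW winner (k = 1)
top2 = pick_sites("object-42", sites, k=2)  # distributed k-agreement (k = 2)
```

Because every client evaluates the same pure function over the same inputs, no coordination is needed for all clients to arrive at the same site.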
If a site S is added or removed, only the objects mapping to S are remapped to different sites, satisfying the minimal disruption constraint above. The HRW assignment can be computed independently by any client, since it depends only on the identifiers for the set of sites S1, S2, ..., Sn and the object being assigned.
HRW easily accommodates different capacities among sites. If site Sk has twice the capacity of the other sites, we simply represent Sk twice in the list, say, as Sk,1 and Sk,2. Clearly, twice as many objects will now map to Sk as to the other sites.
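A sketch of this duplication trick, again with SHA-256 as a stand-in hash and hypothetical instance names "sk,1" and "sk,2" for the double-capacity site Sk. Either instance winning sends the object to Sk, so Sk should receive roughly twice the load of the other sites:

```python
import hashlib
from collections import Counter

def owner(obj, sites):
    """Pick the single highest-weight site for an object."""
    return max(sites, key=lambda s: int(hashlib.sha256(f"{s}|{obj}".encode()).hexdigest(), 16))

# Sk has double capacity, so it appears twice in the site list.
sites = ["s1", "s2", "sk,1", "sk,2"]

# Assign 10,000 objects and tally per physical site (strip the ",1"/",2" suffix).
counts = Counter(owner(f"obj-{i}", sites).split(",")[0] for i in range(10_000))
# counts["sk"] comes out roughly twice counts["s1"] and counts["s2"]
```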
Under rendezvous hashing, clients handle site failures by picking the site that yields the next-largest weight. Remapping is required only for objects currently mapped to the failed site, and this disruption is provably minimal. Rendezvous hashing has the following properties.
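A sketch of this failure behavior, assuming the same agreed-upon hash (SHA-256 here) and made-up site names: dropping s3 from the site list remaps only the objects that s3 owned, and each of those lands on its next-highest-weight site.

```python
import hashlib

def owner(obj, sites):
    """Pick the single highest-weight site for an object."""
    return max(sites, key=lambda s: int(hashlib.sha256(f"{s}|{obj}".encode()).hexdigest(), 16))

sites = ["s1", "s2", "s3", "s4"]
objects = [f"obj-{i}" for i in range(1000)]

before = {o: owner(o, sites) for o in objects}
# Site s3 fails: clients simply drop it from the list.
after = {o: owner(o, [s for s in sites if s != "s3"]) for o in objects}

moved = [o for o in objects if before[o] != after[o]]
# Only objects previously mapped to the failed site were remapped.
assert all(before[o] == "s3" for o in moved)
```

Objects not owned by s3 keep their winner: removing a non-winning site from the list cannot change which site has the maximum weight.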
- Low overhead: The hash function used is efficient, so overhead at the clients is very low.
- Load balancing: Since the hash function is randomizing, each of the n sites is equally likely to receive the object O. Loads are uniform across the sites.
- Site capacity: Sites with different capacities can be represented in the site list with multiplicity in proportion to capacity. A site with twice the capacity of the other sites will be represented twice in the list, while every other site is represented once.
- High hit rate: Since all clients agree on placing an object O into the same site SO, each fetch or placement of O into SO yields the maximum utility in terms of hit rate. The object O will always be found unless it is evicted by some replacement algorithm at SO.
- Minimal disruption: When a site fails, only the objects mapped to that site need to be remapped. Disruption is at the minimal possible level.
- Distributed k-agreement: Clients can reach distributed agreement on k sites simply by selecting the top k sites in the ordering.
Comparison with Consistent Hashing
- Rendezvous hashing is much simpler to understand and code.
- Rendezvous hashing provides a very even distribution of keys on each node, even while nodes are being added/removed. Consistent hashing can fail to provide an even distribution for small clusters (though this can be fixed to a large extent by using many virtual replicas for each node). This is the biggest advantage of Rendezvous hashing over consistent hashing.
- Consistent hashing lookups are typically done in O(log N) time using a binary search over the ring. Rendezvous hashing lookups are typically done in O(N) time, though this can be reduced to O(log N) by organizing the sites into a hierarchical "skeleton" structure.
- Consistent hashing requires just one hash computation per key, whereas Rendezvous hashing requires O(N) hash computations per key. This can make a difference if you're using a slow hash function and have a large number of nodes.
- Consistent hashing requires some fixed memory to work well (mapping nodes to virtual nodes and hashes for all the virtual nodes) whereas Rendezvous hashing doesn't require storing any additional data.
- Rendezvous hashing can naturally provide k different servers for any key. This makes it very useful for supporting replication. While consistent hashing can also be modified to do this, it's not a standard part of the consistent hashing algorithm/implementation.
So, in a nutshell, use Rendezvous hashing if:
- Your clusters are very small.
- Your clusters are very large (say thousands of nodes) and you need to keep your memory footprint low.
- You want to support replication, but don't want to implement a slightly modified consistent hashing algorithm yourself.
Apache Ignite uses Rendezvous Hashing to distribute cache data uniformly across the computing grid. Cassandra uses Consistent Hashing for replication and high availability.