Consistent Hashing in Distributed Systems

Table of Contents

Load Balancing is a key concept in distributed systems, it is one of the challenging problem to make sure there is equal distribution, hashing is important to achieve this and consistent hashing is one of the solid ways to do it


What is Hashing?

Hashing is the process of mapping one piece of data. A function is usually used for mapping objects to hash code known as a hash function.

For example, a hash function can be used to map random size strings to some fixed number between 0 … N. Like

“abc” -> 123

Load Balancing

Load balancing is about distributing a set of tasks over a set of resources. For example, we use load balancing to distribute the API requests to web server instances.

Mod-n Hashing

In this technique, each key is hashed using a hashing function to transform an input into an integer. Then, we perform a modulo based on the number of nodes.

mod-n-hashing

The benefit of this approach is its statelessness. We don’t have to keep any state to remind us that request2 was routed to node 1. But things would change we add or remove a node.

mod-n-hashing-conclusion

This results in lots of performance issues, because now requests will move to other nodes and there will be lots of caches miss, to handle this we use consistent hashing.

Consistent Hashing

Its a technique, released in a paper by Karger in 1977 that works on fixed keys-slots ring algorithm to find the node, in the following example its just 8 portions ring, in real world example it could be a lot bigger.

consistent-hashing-ring

 We map our nodes/server on this ring using a hash function, in the following case, there are three nodes.

consistent-hashing-ring-nodes

 When it receives the request, it gets the hash of the requests and checks the ring clock-wise and routes the request to first node it finds in the range. Lets take example of three requests

f(request1) = 1

f(request2) = 4

f(request3) = 6

so these requests will be handled as following

consistent-hashing-ring-nodes-requests

Request1 will be handled by N2, request2 by N3 and request3 by N1, so far so good now lets suppose we need to add a new node between N2 and N3 to handle more requests.

consistent-hashing-ring-nodes-requests-conclusion

 Now request2 will route to N4 instead of N3, rest is all unchanged, which is great but you will analyze that traffic won’t be evenly distributed to all nodes another case would be if N2 is crashed or removed, then all requests will route to N4 which will eventually overload N4, so solve this virtual nodes are created and multiple hashing functions are used to get different indices for nodes as following

consistent-hashing-ring-final-conclusion

 In a excellent consistent hashing algorithm only 1/n keys are mapped again where n is number of nodes. O(1) to add or remove nodes.

Command PATH Security in Go

Command PATH Security in Go

In the realm of software development, security is paramount. Whether you’re building a small utility or a large-scale application, ensuring that your code is robust

Read More »
Undefined vs Null in JavaScript

Undefined vs Null in JavaScript

JavaScript, as a dynamically-typed language, provides two distinct primitive values to represent the absence of a meaningful value: undefined and null. Although they might seem

Read More »