Suppose you are asked to design a unique ID generator in distributed systems. Your first thought might be to use a primary key with the auto_increment attribute in a traditional database.
However, auto_increment does not work in a distributed environment because a single database server is not large enough and generating unique IDs across multiple databases with minimal delay is challenging.
Here are a few examples of unique IDs:
The requirements are listed as follows:
Multiple options can be used to generate unique IDs in distributed systems. The options we considered are:
Let us look at each of them, how they work, and the pros/cons of each option.
As shown in the following figure, the first approach is multi-master replication.
This approach uses the databases’ auto_increment feature. Instead of increasing the next ID by 1, we increase it by k, where k is the number of database servers in use.
As illustrated in the above figure, the following ID to be generated is equal to the previous ID in the same server plus 2. This solves some scalability issues because IDs can scale with the number of database servers.
However, this strategy has some significant drawbacks:
A UUID is another easy way to obtain unique IDs. UUID is a 128-bit number used to identify information in computer systems. UUID has a very low probability of getting collision.
Quoted from Wikipedia, “after generating 1 billion UUIDs every second for approximately 100 years would the probability of creating a single duplicate reach 50%”.
Here is an example of UUID: 09c93e62-50b4-468d-bf8a-c07e1040bfb2. UUIDs can be generated independently without coordination between servers. The following figure presents the UUIDs design.
In this design, each web server contains an ID generator, and a web server is responsible for generating IDs independently.
Pros:
Cons:
Ticket servers are another interesting way to generate unique IDs. Flicker developed ticket servers to generate distributed primary keys. It is worth mentioning how the system works.
The idea is to use a centralized auto_increment feature in a single database server (Ticket Server). To learn more about this, refer to flicker’s engineering blog article.
Pros:
Cons:
The approaches mentioned above give us some ideas about how different ID generation systems work. However, none of them meets our specific requirements; thus, we need another approach. Twitter’s unique ID generation system called “snowflake” is inspiring and can satisfy our requirements.
Divide and conquer is our friend. Instead of generating an ID directly, we divide an ID into different sections. The following figure shows the layout of a 64-bit ID.
Each section is explained below.
In the high-level design, we discussed various options to design a unique ID generator in distributed systems. We settle on an approach that is based on the Twitter snowflake ID generator. Let us dive deep into the design. To refresh our memory, the design diagram is relisted below.
Datacenter IDs and machine IDs are chosen at the startup time and generally fixed once the system is up and running. Any changes in data centre IDs and machine IDs require careful review since an accidental change in those values can lead to ID conflicts.
Timestamps and sequence numbers are generated when the ID generator is running.
The most important 41 bits make up the timestamp section. As timestamps grow with time, IDs are sortable by time. The following figure shows an example of how binary representation is converted to UTC. You can also convert UTC back to binary representation using a similar method.
The maximum timestamp that can be represented in 41 bits is 2 ^ 41 - 1 = 2199023255551 milliseconds (ms), which gives us: ~ 69 years = 2199023255551 ms / 1000 / 365 days / 24 hours/ 3600 seconds.
This means the ID generator will work for 69 years and having a custom epoch time close to today’s date delays the overflow time. After 69 years, we will need a new epoch time or adopt other techniques to migrate IDs.
A sequence number is 12 bits, which gives us 2 ^ 12 = 4096 combinations. This field is 0 unless more than one ID is generated in a millisecond on the same server. In theory, a machine can support a maximum of 4096 new IDs per millisecond.
In this post, we discussed different approaches to design a unique ID generator: multi-master replication, UUID, ticket server, and Twitter snowflake-like unique ID generator. We settle on snowflake as it supports all our use cases and is scalable in a distributed environment.
Here are a few additional talking points:
That's all for this post.
Thanks for reading out, I hope you have a nice day!
Design A News Feed System In this post, we are going to design a news feed system. What is a news feed? According to the Facebook help page, "News feed is the constantly updating list of stories in the middle of your home page. News Feed includes status updates, photos, videos, links, app activity, and likes from people, pages, and groups you follow on Facebook".
Design A Notification System A notification system has already become a very popular feature for many applications in recent years. A notification alerts users with important information like breaking news, product updates, events, offerings, etc. It has become an indispensable part of our daily life.