Aryan Agarwal
Oct 27, 2022

Design A Rate Limiter: Part 2

We'll continue from where we left off in the previous post. This post covers Step 3 and Step 4.

Step 3 - Design deep dive

The high-level design in the following figure does not answer two questions:

  • How are rate limiting rules created? Where are the rules stored?
  • How to handle requests that are rate limited?

[Figure: High-level design of the rate limiter]

In this section, we will first answer the questions about rate limiting rules and then go over strategies for handling rate-limited requests. Finally, we will discuss rate limiting in a distributed environment, a detailed design, performance optimization, and monitoring.

Rate limiting rules

Lyft open-sourced its rate-limiting component. We will peek inside the component and look at some examples of rate limiting rules:

domain: messaging
descriptors:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 5

In the above example, the system is configured to allow a maximum of 5 marketing messages per day.

Here is another example:

domain: auth
descriptors:
  - key: auth_type
    value: login
    rate_limit:
      unit: minute
      requests_per_unit: 5

This rule shows that clients are not allowed to log in more than 5 times in 1 minute. Rules are generally written in configuration files and saved on disk.
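To make the rule format concrete, below is a minimal Python sketch of how a service might parse such a configuration and look up the limit for a request descriptor. The Rule class and helper functions are illustrative assumptions, not part of Lyft's actual component:

from dataclasses import dataclass
from typing import Optional

import yaml  # PyYAML, assumed available


@dataclass
class Rule:
    domain: str
    key: str
    value: str
    unit: str              # e.g. "minute" or "day"
    requests_per_unit: int


def load_rules(config_text: str) -> list[Rule]:
    # Parse a rule config (like the YAML examples above) into Rule objects.
    doc = yaml.safe_load(config_text)
    return [
        Rule(doc["domain"], d["key"], d["value"],
             d["rate_limit"]["unit"], d["rate_limit"]["requests_per_unit"])
        for d in doc["descriptors"]
    ]


def find_rule(rules: list[Rule], domain: str, key: str, value: str) -> Optional[Rule]:
    # Return the first rule matching the request's descriptor, if any.
    for rule in rules:
        if (rule.domain, rule.key, rule.value) == (domain, key, value):
            return rule
    return None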

Exceeding the rate limit

If a request is rate limited, the API returns an HTTP response code 429 (Too Many Requests) to the client. Depending on the use case, we may enqueue rate-limited requests to be processed later. For example, if some orders are rate limited due to system overload, we may keep those orders to be processed later.

Rate limiter headers

How does a client know whether it is being throttled? And how does a client know the number of allowed remaining requests before being throttled? The answer lies in HTTP response headers.

The rate limiter returns the following HTTP headers to clients:

  • X-Ratelimit-Remaining: The remaining number of allowed requests within the window.
  • X-Ratelimit-Limit: How many calls the client can make per time window.
  • X-Ratelimit-Retry-After: The number of seconds to wait before making another request without being throttled.

When a user has sent too many requests, a 429 Too Many Requests error and the X-Ratelimit-Retry-After header are returned to the client.
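As an illustration, here is a rough sketch of an endpoint that returns these headers, written with Flask. The check_rate_limit helper is a hypothetical stand-in for the counter logic described in the next section:

from flask import Flask, jsonify

app = Flask(__name__)


def check_rate_limit(client_id: str):
    # Hypothetical helper: returns (allowed, remaining, limit, retry_after).
    # A real implementation would consult the Redis counters described below;
    # this stub always allows the request.
    return True, 4, 5, 0


@app.route("/api/resource")
def resource():
    allowed, remaining, limit, retry_after = check_rate_limit("client-123")
    headers = {
        "X-Ratelimit-Remaining": str(remaining),
        "X-Ratelimit-Limit": str(limit),
    }
    if not allowed:
        headers["X-Ratelimit-Retry-After"] = str(retry_after)
        return jsonify(error="too many requests"), 429, headers
    return jsonify(data="ok"), 200, headers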

Detailed design

The following figure presents a detailed design of the system.

[Figure: Detailed design of the rate limiter]

  • Rules are stored on the disk. Workers frequently pull rules from the disk and store them in the cache.
  • When a client sends a request to the server, the request is sent to the rate limiter middleware first.
  • The rate limiter middleware loads rules from the cache. It fetches counters and the last request timestamp from the Redis cache. Based on the response, the rate limiter decides:
      • If the request is not rate limited, it is forwarded to the API servers.
      • If the request is rate limited, the rate limiter returns a 429 too many requests error to the client. In the meantime, the request is either dropped or forwarded to the queue.
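Roughly, the middleware's decision logic looks like the following sketch, assuming a fixed-window counter kept in Redis; the key scheme and function names are illustrative. Note that the read-then-increment sequence here is exactly the race-prone pattern discussed in the next section:

import time

import redis  # redis-py, assumed available

r = redis.Redis()
WINDOW_SECONDS = 60


def handle_request(client_id: str, limit: int) -> int:
    # Count requests in the current fixed window for this client.
    window = int(time.time()) // WINDOW_SECONDS
    key = f"ratelimit:{client_id}:{window}"

    count = int(r.get(key) or 0)
    if count >= limit:
        # Rate limited: return 429; the request is dropped or queued.
        return 429

    # Not rate limited: record the request and forward to the API servers.
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, WINDOW_SECONDS)
    pipe.execute()
    return 200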

Rate limiter in a distributed environment

Building a rate limiter that works in a single server environment is not difficult. However, scaling the system to support multiple servers and concurrent threads is a different story. There are two challenges:

  • Race condition
  • Synchronization issue

Race condition

As discussed earlier, at a high level the rate limiter works as follows:

  • Read the counter value from Redis.
  • Check if (counter + 1) exceeds the threshold.
  • If not, increment the counter value by 1 in Redis.

Race conditions can happen in a highly concurrent environment as shown in the following figure.

[Figure: Race condition between two concurrent requests]

Assume the counter value in Redis is 3. If two requests concurrently read the counter value before either of them writes the value back, each will increment the counter by one and write it back without checking the other thread. Both requests (threads) believe they have the correct counter value 4. However, the correct counter value should be 5.

Locks are the most obvious solution for solving race conditions. However, locks significantly slow down the system. Two strategies are commonly used to solve the problem: Lua scripts and the sorted sets data structure in Redis.

For readers interested in these strategies, refer to the corresponding reference materials.
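As a sketch of the Lua-script strategy, the read-check-increment sequence can be sent to Redis as a single script, which Redis executes atomically, so no other client can interleave between the read and the write. The key scheme and script below are illustrative:

import redis  # redis-py, assumed available

r = redis.Redis()

# The script runs atomically inside Redis: the GET, the comparison,
# and the INCR cannot be interleaved with another client's commands.
RATE_LIMIT_LUA = """
local count = tonumber(redis.call('GET', KEYS[1]) or '0')
if count >= tonumber(ARGV[1]) then
    return 0  -- over the limit
end
redis.call('INCR', KEYS[1])
redis.call('EXPIRE', KEYS[1], ARGV[2])
return 1  -- allowed
"""

rate_limit = r.register_script(RATE_LIMIT_LUA)


def allow(client_id: str, limit: int, window_seconds: int) -> bool:
    return rate_limit(keys=[f"ratelimit:{client_id}"],
                      args=[limit, window_seconds]) == 1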

Synchronization issue

Synchronization is another important factor to consider in a distributed environment. To support millions of users, one rate limiter server might not be enough to handle the traffic. When multiple rate limiter servers are used, synchronization is required.

[Figure: Synchronization issue with multiple rate limiters]

For example, on the left side of the above figure, client 1 sends requests to rate limiter 1, and client 2 sends requests to rate limiter 2.

As the web tier is stateless, clients can send requests to a different rate limiter as shown on the right side of the above figure. If no synchronization happens, rate limiter 1 does not contain any data about client 2. Thus, the rate limiter cannot work properly.

One possible solution is to use sticky sessions that allow a client to send traffic to the same rate limiter. This solution is not advisable because it is neither scalable nor flexible. A better approach is to use centralized data stores like Redis. The design is shown in the following figure.

[Figure: Rate limiters sharing a centralized data store]

Performance optimization

Performance optimization is a common topic in system design. Here we will cover one improvement: a multi-data-centre setup.

A multi-data-centre setup is crucial for a rate limiter because latency is high for users located far away from the data centre. Most cloud service providers build many edge server locations around the world.

For example, as of May 2020, Cloudflare had 194 geographically distributed edge servers. Traffic is automatically routed to the closest edge server to reduce latency.

[Figure: Cloudflare's geographically distributed edge servers]

Monitoring

After the rate limiter is put in place, it is important to gather analytics data to check whether the rate limiter is effective. Primarily, we want to make sure:

  • The rate-limiting algorithm is effective.
  • The rate-limiting rules are effective.

For example, if rate-limiting rules are too strict, many valid requests are dropped. In this case, we want to relax the rules a little bit.

In another example, we may notice that our rate limiter becomes ineffective when there is a sudden increase in traffic, such as a flash sale. In this scenario, we may replace the algorithm with one that supports burst traffic. The token bucket is a good fit here.
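For reference, here is a minimal single-process sketch of the token bucket algorithm from Part 1. Bursts up to the bucket capacity are served immediately, while the long-term rate stays bounded; the class and parameter names are illustrative:

import time


class TokenBucket:
    # Tokens refill at `rate` tokens per second, up to `capacity`.
    # A burst can consume up to `capacity` tokens at once.
    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add the tokens accrued since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False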

Step 4 - Wrap up

In these 2 posts, we discussed different rate-limiting algorithms and their pros and cons. The algorithms discussed include:

  1. Token bucket
  2. Leaking bucket
  3. Fixed window counter
  4. Sliding window log
  5. Sliding window counter

Then, we discussed the system architecture, rate limiting in a distributed environment, performance optimization, and monitoring. You can also research these points on your own if you want to learn more:

  • Hard vs soft rate limiting.
      • Hard: The number of requests cannot exceed the threshold.
      • Soft: Requests can exceed the threshold for a short period.
  • Avoid being rate limited. Design your client with best practices:
      • Use a client cache to avoid making frequent API calls.
      • Understand the limit and do not send too many requests in a short time frame.
      • Include code to catch exceptions or errors so your client can gracefully recover from them.
      • Add sufficient back-off time to retry logic (see the sketch below).
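Putting the last two points together, here is a hedged sketch of client-side retry logic that honours the X-Ratelimit-Retry-After header and otherwise falls back to exponential back-off; the URL is a placeholder:

import time

import requests  # assumed available


def get_with_backoff(url: str, max_retries: int = 5):
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        # Prefer the server's hint; otherwise back off exponentially.
        retry_after = resp.headers.get("X-Ratelimit-Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    raise RuntimeError("still rate limited after retries")


# Example usage (placeholder URL):
# resp = get_with_backoff("https://api.example.com/resource")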

That's all for this post. Go ahead and design a rate limiter!

Thanks for reading, hope you have a nice day!

