Why This Problem Matters
A rate limiter is one of the first protection layers in a large system. It prevents abuse, protects backend services, and enforces fair usage.
The hard parts are choosing the algorithm, keeping counter updates atomic across nodes, and staying highly available.
Clarify the Problem
Before designing anything, ask:
- Is the limiter global or service-specific?
- Is it per user, per IP, or per API key?
- Do we need strong consistency or is slight over-counting acceptable?
- What is the traffic scale?
Requirements
Functional requirements:
- Identify a client
- Apply per-client or per-endpoint limits
- Return HTTP 429 when the limit is exceeded (see the sketch after these lists)
Non-functional requirements:
- Very low latency overhead
- High availability
- High request throughput
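As a minimal sketch of how these requirements meet at the HTTP layer, assuming a generic handler shape and a hypothetical allow(client_id) check (both are placeholders, not tied to any specific framework):

```python
def handle(request, allow):
    """Apply a rate-limit decision to an incoming request.

    `request` is assumed to expose .user_id and .headers;
    `allow` is a hypothetical callable returning (ok, retry_after_seconds).
    """
    # Identify the client: user ID first, client IP as a fallback
    client_id = request.user_id or request.headers.get("X-Forwarded-For")
    ok, retry_after = allow(client_id)
    if not ok:
        # 429 Too Many Requests, with a Retry-After hint for well-behaved clients
        return 429, {"Retry-After": str(int(retry_after))}, b"rate limit exceeded"
    return 200, {}, b"ok"
```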
Where to Place It
The usual placement options are:
- API gateway
- Middleware in each service
- Dedicated rate limiter service
The practical answer is often the gateway for coarse global limits plus a dedicated limiter for finer-grained rules.
Client Identification
Common identifiers:
- User ID from JWT
- IP address from X-Forwarded-For
- API key from X-API-Key
You can combine these for user-level, IP-level, endpoint-level, and global limits.
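One way to sketch the combination is a single key scheme per counter; the rl: prefix and scope names below are illustrative assumptions, not a standard:

```python
def limit_key(scope, identifier, endpoint=None):
    """Build a counter key such as 'rl:user:42:/search'.

    The scope names ('user', 'ip', 'key', 'global') are illustrative.
    """
    parts = ["rl", scope, identifier]
    if endpoint:
        parts.append(endpoint)
    return ":".join(parts)

# limit_key("user", "42", "/search")  -> per-user, per-endpoint limit
# limit_key("ip", "203.0.113.7")      -> IP-level limit
# limit_key("global", "all")          -> one shared global counter
```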
Algorithms
Fixed Window Counter
Simple and efficient, but it suffers from boundary bursts: a client can spend a full quota at the end of one window and another full quota at the start of the next, doubling the effective rate.
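A minimal sketch using the redis-py client; key names and default limits are assumptions for illustration:

```python
import time
import redis

r = redis.Redis()

def fixed_window_allow(client_id, limit=100, window=60):
    """Allow the request if this window's counter is still under `limit`."""
    now = int(time.time())
    key = f"rl:fixed:{client_id}:{now // window}"  # one counter per fixed window
    count = r.incr(key)
    if count == 1:
        # First hit in this window: set a TTL so stale counters expire.
        # (INCR + EXPIRE is two round trips; a Lua script closes the
        # small race where the TTL is never set.)
        r.expire(key, window)
    return count <= limit
```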
Sliding Window Log
Accurate, but memory-expensive because every request timestamp is stored.
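A sketch using a Redis sorted set scored by timestamp, again via redis-py; note this simplified version records rejected requests too:

```python
import time
import uuid
import redis

r = redis.Redis()

def sliding_log_allow(client_id, limit=100, window=60):
    """Keep one sorted-set entry per request, scored by timestamp."""
    key = f"rl:log:{client_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # evict timestamps outside the window
    pipe.zadd(key, {str(uuid.uuid4()): now})     # unique member per request
    pipe.zcard(key)                              # requests remaining in the window
    pipe.expire(key, window)
    _, _, count, _ = pipe.execute()
    return count <= limit
```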
Sliding Window Counter
Approximates a true sliding window by weighting the previous fixed window's count by how much of it still overlaps the current window. Good balance of accuracy, memory, and speed.
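A sketch of the estimate, assuming redis-py; the check-then-increment here is not atomic, which the Atomicity section below addresses:

```python
import time
import redis

r = redis.Redis()

def sliding_counter_allow(client_id, limit=100, window=60):
    """Estimate the sliding count from the current and previous fixed windows."""
    now = time.time()
    curr = int(now // window)
    curr_count = int(r.get(f"rl:cnt:{client_id}:{curr}") or 0)
    prev_count = int(r.get(f"rl:cnt:{client_id}:{curr - 1}") or 0)
    # Weight the previous window by how much of it the sliding window still covers
    overlap = 1.0 - (now % window) / window
    if prev_count * overlap + curr_count >= limit:
        return False
    key = f"rl:cnt:{client_id}:{curr}"
    r.incr(key)
    r.expire(key, window * 2)  # keep the previous window around for one more period
    return True
```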
Token Bucket
Usually the best interview answer. Tokens refill at a fixed rate up to a capacity; each request consumes a token, so bursts up to the capacity are allowed while the sustained rate is preserved.
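A minimal single-process sketch of the mechanics (a distributed version needs shared, atomic state, covered next):

```python
import time

class TokenBucket:
    """In-process token bucket: refills `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # Lazy refill: add tokens for the elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# bucket = TokenBucket(rate=5, capacity=10)  # sustained 5 req/s, bursts up to 10
```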
Atomicity
If multiple nodes update the same bucket, the read-modify-write cycle must be atomic. Redis Lua scripts are a common solution because the entire script executes as one atomic operation.
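A sketch of an atomic token bucket as a Lua script registered through redis-py; the hash layout, TTL, and passing the caller's clock are all illustrative choices:

```python
import time
import redis

r = redis.Redis()

# Token-bucket refill + consume executed as one atomic unit on the server.
# KEYS[1] = bucket key; ARGV = rate, capacity, now, cost.
TOKEN_BUCKET = r.register_script("""
local rate     = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])
local cost     = tonumber(ARGV[4])
local tokens   = tonumber(redis.call('HGET', KEYS[1], 'tokens') or capacity)
local last     = tonumber(redis.call('HGET', KEYS[1], 'last') or now)
tokens = math.min(capacity, tokens + (now - last) * rate)
local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'last', now)
redis.call('EXPIRE', KEYS[1], 120)  -- garbage-collect idle buckets
return allowed
""")

def allow(client_id, rate=5, capacity=10, cost=1):
    # Passing the app server's clock keeps the script deterministic,
    # at the cost of tolerating some skew between servers.
    return TOKEN_BUCKET(keys=[f"rl:tb:{client_id}"],
                        args=[rate, capacity, time.time(), cost]) == 1
```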
Final Answer
The best answer is usually: make the policy explicit, choose a low-latency algorithm, store counters in Redis or a similar fast store, and keep the limiter highly available even if some limits are slightly approximate.