Rate Limiting Strategies for Web APIs and Applications
Rate limiting controls how many requests a client can make to your API or application within a given time period. It protects your infrastructure from abuse, ensures fair usage, and prevents a single user from monopolizing resources.
Why Rate Limiting Matters
- DDoS protection: Limits the impact of volumetric attacks
- Resource fairness: Prevents one user from starving others
- Cost control: Limits usage of expensive backend resources (database, third-party APIs)
- API monetization: Enforces tier-based usage limits (free, pro, business)
- Stability: Protects against accidental traffic spikes (broken retry loops, crawler storms)
Rate Limiting Algorithms
Fixed Window
Count requests in fixed time windows (e.g., per minute). Reset the counter at the start of each window.
- Pros: Simple to implement, low memory
- Cons: Burst problem — a user can make 100 requests at 0:59 and 100 more at 1:00, effectively doubling their limit
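A minimal sketch of a fixed-window counter for a single client (class and method names here are illustrative, not from any particular library):

```python
import time

class FixedWindowLimiter:
    """Fixed-window counter: the count resets at each window boundary."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # window start -> request count (keyed per client in practice)

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Align the timestamp to the start of its window
        window_start = int(now // self.window) * self.window
        count = self.counts.get(window_start, 0)
        if count >= self.limit:
            return False
        self.counts[window_start] = count + 1
        return True
```

Note how a fresh window starts with a zero count, which is exactly what enables the burst problem described above.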
Sliding Window Log
Store a timestamp for every request. Count requests within the last N seconds. Precise but memory-intensive.
- Pros: No burst problem, accurate
- Cons: High memory — stores every request timestamp
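A sliding window log can be sketched with a deque of timestamps (an illustrative single-client version; a production system would keep one log per client key):

```python
from collections import deque

class SlidingWindowLog:
    """Stores every request timestamp; counts those inside the window."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        # Evict timestamps that have fallen out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

The memory cost is visible in the code: the deque holds one entry per allowed request, so a high limit means a large log per client.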
Sliding Window Counter
Hybrid approach: use fixed windows but weight the previous window proportionally. If you're 30% into the current window, count 70% of the previous window's requests plus 100% of the current.
- Pros: Smooth, low memory, no burst problem
- Cons: Approximate (but very close in practice)
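The weighting described above can be sketched as follows (names are illustrative; this tracks one client in memory):

```python
class SlidingWindowCounter:
    """Two counters: current window plus a weighted share of the previous one."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_start = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now):
        window_start = int(now // self.window) * self.window
        if window_start != self.current_start:
            # Roll the windows; if more than one window passed, previous is empty
            if window_start - self.current_start == self.window:
                self.previous_count = self.current_count
            else:
                self.previous_count = 0
            self.current_start = window_start
            self.current_count = 0
        elapsed_fraction = (now - window_start) / self.window
        # e.g. 30% into the window -> count 70% of the previous window
        weighted = self.previous_count * (1 - elapsed_fraction) + self.current_count
        if weighted >= self.limit:
            return False
        self.current_count += 1
        return True
```

Only two integers per client are stored, which is why this approach is popular for large-scale deployments.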
Token Bucket
Imagine a bucket that fills with tokens at a steady rate. Each request consumes a token. If the bucket is empty, the request is rejected. Allows controlled bursts (bucket can accumulate tokens).
- Pros: Allows bursts, smooth, widely used
- Cons: Slightly more complex to implement
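A token bucket can be implemented lazily, refilling on each request based on elapsed time rather than with a background timer (a minimal single-client sketch; names are illustrative):

```python
class TokenBucket:
    """Bucket refills at refill_rate tokens/second, up to capacity."""
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full: allows an initial burst
        self.last = 0.0

    def allow(self, now):
        # Refill based on time elapsed since the last request, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Starting with a full bucket is what permits the controlled burst: a quiet client accumulates tokens it can spend all at once.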
Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped. Enforces a perfectly smooth output rate.
- Pros: Smoothest output rate
- Cons: Adds latency (queuing), no burst tolerance
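A leaky bucket can be sketched as a bounded queue that drains at a fixed rate (illustrative names; a real implementation would process the queued requests asynchronously rather than just dropping references):

```python
from collections import deque

class LeakyBucket:
    """Requests queue in the bucket and drain at leak_rate per second."""
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = 0.0

    def offer(self, now):
        # Drain requests that would have been processed since the last check
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
        if len(self.queue) >= self.capacity:
            return False  # bucket full: drop the request
        self.queue.append(now)
        return True
```

The fixed drain rate is what smooths the output, and the bounded queue is where the added latency comes from: an accepted request may still wait for earlier ones to drain.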
Implementation with Redis
-- Sliding window log rate limiter in Redis (Lua script)
-- (stores one sorted-set entry per request, the "log" approach above)
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Remove entries that have fallen out of the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count requests still inside the window
local count = redis.call('ZCARD', key)
if count < limit then
    -- Append a random suffix so concurrent requests at the same timestamp get unique members
    redis.call('ZADD', key, now, now .. math.random())
    redis.call('EXPIRE', key, window)
    return 0 -- allowed
else
    return 1 -- rate limited
end
HTTP Response Headers
Communicate rate limit status to clients. The X-RateLimit-* headers below are a widely used convention rather than an official standard:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1679529600
Retry-After: 30
When rate limited, return 429 Too Many Requests:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json
{"error": "Rate limit exceeded. Try again in 30 seconds."}
What to Rate Limit By
- IP address: Simplest, but shared IPs (NAT, corporate proxies) affect multiple users
- API key: More accurate for authenticated APIs. Each key gets its own limits.
- User account: Per-user limits regardless of IP
- Endpoint: Different limits for different endpoints (login: 5/min, search: 30/min, reads: 100/min)
- Combination: IP + endpoint for unauthenticated, user + endpoint for authenticated
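The combination strategy comes down to how the limiter key is composed. One possible sketch (the key format is an assumption, not a convention from any particular library):

```python
def rate_limit_key(endpoint, user_id=None, ip=None):
    """Compose a limiter key: user + endpoint when authenticated, else IP + endpoint."""
    if user_id is not None:
        return f"rl:user:{user_id}:{endpoint}"
    return f"rl:ip:{ip}:{endpoint}"
```

This key would then be passed as KEYS[1] to a store-backed limiter such as the Redis script above.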
Best Practices
- Return informative headers: Always tell clients their limit, remaining quota, and reset time
- Use 429 status code: Not 403 or 503 — 429 is specifically for rate limiting
- Differentiate by plan: Free users get 50/day, Pro gets 5000/day
- Rate limit login endpoints aggressively: 5-10 attempts per minute to prevent brute force
- Implement graceful degradation: Serve cached or simplified responses instead of hard rejecting
- Log rate limit events: Monitor who gets rate limited and why
- Test with realistic traffic: Verify limits work under load
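Plan-based differentiation can be as simple as a lookup table (the business-tier figure below is a placeholder, not from the text above):

```python
# Requests per day by plan; free and pro values match the example above,
# the business value is an assumed placeholder.
PLAN_LIMITS = {"free": 50, "pro": 5000, "business": 50000}

def daily_limit(plan):
    """Unknown or missing plans fall back to the free tier's limit."""
    return PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])
```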
Conclusion
Rate limiting is essential infrastructure for any web API. Start with the sliding window counter (best balance of simplicity and accuracy), implement per-key/per-user limits, and always communicate limits through headers. Well-implemented rate limiting protects your service while maintaining a good developer experience.