Rate Limiting Strategies for Web APIs and Applications
Rate limiting controls how many requests a client can make to your API or application within a given time period. It protects your infrastructure from abuse, ensures fair usage, and prevents a single user from monopolizing resources.
Why Rate Limiting Matters
- DDoS protection: Limits the impact of volumetric attacks
- Resource fairness: Prevents one user from starving others
- Cost control: Limits usage of expensive backend resources (database, third-party APIs)
- API monetization: Enforces tier-based usage limits (free, pro, business)
- Stability: Protects against accidental traffic spikes (broken retry loops, crawler storms)
Rate Limiting Algorithms
Fixed Window
Count requests in fixed time windows (e.g., per minute). Reset the counter at the start of each window.
- Pros: Simple to implement, low memory
- Cons: Burst problem — a user can make 100 requests at 0:59 and 100 more at 1:00, effectively doubling their limit
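A minimal sketch of a fixed-window counter for a single client (class and method names here are illustrative, not from any particular library):

```python
import time

class FixedWindowLimiter:
    """Fixed-window counter: the count resets at each window boundary."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # window start -> request count (keyed per client in practice)

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Align the timestamp to the start of its window
        window_start = int(now // self.window) * self.window
        count = self.counts.get(window_start, 0)
        if count >= self.limit:
            return False
        self.counts[window_start] = count + 1
        return True
```

Note how a fresh window starts with a zero count, which is exactly what enables the burst problem described above.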
Sliding Window Log
Store a timestamp for every request. Count requests within the last N seconds. Precise but memory-intensive.
- Pros: No burst problem, accurate
- Cons: High memory — stores every request timestamp
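A sliding window log can be sketched with a deque of timestamps (an illustrative single-client version; a production system would keep one log per client key):

```python
from collections import deque

class SlidingWindowLog:
    """Stores every request timestamp; counts those inside the window."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        # Evict timestamps that have fallen out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

The memory cost is visible in the code: the deque holds one entry per allowed request, so a high limit means a large log per client.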
Sliding Window Counter
Hybrid approach: use fixed windows but weight the previous window proportionally. If you're 30% into the current window, count 70% of the previous window's requests plus 100% of the current.
- Pros: Smooth, low memory, no burst problem
- Cons: Approximate (but very close in practice)
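The weighting described above can be sketched as follows (names are illustrative; this tracks one client in memory):

```python
class SlidingWindowCounter:
    """Two counters: current window plus a weighted share of the previous one."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_start = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now):
        window_start = int(now // self.window) * self.window
        if window_start != self.current_start:
            # Roll the windows; if more than one window passed, previous is empty
            if window_start - self.current_start == self.window:
                self.previous_count = self.current_count
            else:
                self.previous_count = 0
            self.current_start = window_start
            self.current_count = 0
        elapsed_fraction = (now - window_start) / self.window
        # e.g. 30% into the window -> count 70% of the previous window
        weighted = self.previous_count * (1 - elapsed_fraction) + self.current_count
        if weighted >= self.limit:
            return False
        self.current_count += 1
        return True
```

Only two integers per client are stored, which is why this approach is popular for large-scale deployments.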
Token Bucket
Imagine a bucket that fills with tokens at a steady rate. Each request consumes a token. If the bucket is empty, the request is rejected. Allows controlled bursts (bucket can accumulate tokens).
- Pros: Allows bursts, smooth, widely used
- Cons: Slightly more complex to implement
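A token bucket can be implemented lazily, refilling on each request based on elapsed time rather than with a background timer (a minimal single-client sketch; names are illustrative):

```python
class TokenBucket:
    """Bucket refills at refill_rate tokens/second, up to capacity."""
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full: allows an initial burst
        self.last = 0.0

    def allow(self, now):
        # Refill based on time elapsed since the last request, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Starting with a full bucket is what permits the controlled burst: a quiet client accumulates tokens it can spend all at once.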
Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped. Enforces a perfectly smooth output rate.
- Pros: Smoothest output rate
- Cons: Adds latency (queuing), no burst tolerance
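A leaky bucket can be sketched as a bounded queue that drains at a fixed rate (illustrative names; a real implementation would process the queued requests asynchronously rather than just dropping references):

```python
from collections import deque

class LeakyBucket:
    """Requests queue in the bucket and drain at leak_rate per second."""
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = 0.0

    def offer(self, now):
        # Drain requests that would have been processed since the last check
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
        if len(self.queue) >= self.capacity:
            return False  # bucket full: drop the request
        self.queue.append(now)
        return True
```

The fixed drain rate is what smooths the output, and the bounded queue is where the added latency comes from: an accepted request may still wait for earlier ones to drain.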
Implementation with Redis
-- Sliding window log rate limiter in Redis (Lua script)
-- (stores one sorted-set entry per request, the "log" approach above)
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Remove entries that have fallen out of the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count requests still inside the window
local count = redis.call('ZCARD', key)
if count < limit then
    -- Append a random suffix so concurrent requests at the same timestamp get unique members
    redis.call('ZADD', key, now, now .. math.random())
    redis.call('EXPIRE', key, window)
    return 0 -- allowed
else
    return 1 -- rate limited
end
HTTP Response Headers
Communicate rate limit status to clients. The X-RateLimit-* headers below are a widely used convention rather than an official standard:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1679529600
Retry-After: 30
When rate limited, return 429 Too Many Requests:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json
{"error": "Rate limit exceeded. Try again in 30 seconds."}
What to Rate Limit By
- IP address: Simplest, but shared IPs (NAT, corporate proxies) affect multiple users
- API key: More accurate for authenticated APIs. Each key gets its own limits.
- User account: Per-user limits regardless of IP
- Endpoint: Different limits for different endpoints (login: 5/min, search: 30/min, reads: 100/min)
- Combination: IP + endpoint for unauthenticated, user + endpoint for authenticated
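The combination strategy comes down to how the limiter key is composed. One possible sketch (the key format is an assumption, not a convention from any particular library):

```python
def rate_limit_key(endpoint, user_id=None, ip=None):
    """Compose a limiter key: user + endpoint when authenticated, else IP + endpoint."""
    if user_id is not None:
        return f"rl:user:{user_id}:{endpoint}"
    return f"rl:ip:{ip}:{endpoint}"
```

This key would then be passed as KEYS[1] to a store-backed limiter such as the Redis script above.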
Best Practices
- Return informative headers: Always tell clients their limit, remaining quota, and reset time
- Use 429 status code: Not 403 or 503 — 429 is specifically for rate limiting
- Differentiate by plan: Free users get 50/day, Pro gets 5000/day
- Rate limit login endpoints aggressively: 5-10 attempts per minute to prevent brute force
- Implement graceful degradation: Serve cached or simplified responses instead of hard rejecting
- Log rate limit events: Monitor who gets rate limited and why
- Test with realistic traffic: Verify limits work under load
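Plan-based differentiation can be as simple as a lookup table (the business-tier figure below is a placeholder, not from the text above):

```python
# Requests per day by plan; free and pro values match the example above,
# the business value is an assumed placeholder.
PLAN_LIMITS = {"free": 50, "pro": 5000, "business": 50000}

def daily_limit(plan):
    """Unknown or missing plans fall back to the free tier's limit."""
    return PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])
```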
Conclusion
Rate limiting is essential infrastructure for any web API. Start with the sliding window counter (best balance of simplicity and accuracy), implement per-key/per-user limits, and always communicate limits through headers. Well-implemented rate limiting protects your service while maintaining a good developer experience.