Skip to content

What is a Bloom Filter

Key idea:

Bloom filter — probabilistic data structure that answers "element is probably in set" or "definitely not in set". Uses bit array + multiple hash functions. Memory-efficient (kilobytes for millions of items), but false positives are possible (no false negatives). Used in Cassandra (skip disk reads for missing keys), CDN (dedupe), DB indices, Chrome URL safety checks.

Below: details, example, related terms, FAQ.

Details

  • False positive rate controllable (typically 1-5%)
  • No false negatives: "not in set" = definitively not
  • Cannot remove elements (counting bloom filters can)
  • Redis: bf.add, bf.exists (RedisBloom module)
  • Typical size: 10 bits per element for 1% FPR

Example

// Redis RedisBloom
> BF.RESERVE users 0.01 100000
> BF.ADD users alice
> BF.EXISTS users alice    → 1 (probably)
> BF.EXISTS users bob      → 0 (definitely not)

Related Terms

Learn more

Frequently Asked Questions

When to use Bloom filter?

Pre-filter before expensive ops: check before DB query, before CDN miss. If >95% queries are "not in set" — saves huge load.

How to choose false positive rate?

Trade-off: lower FPR = bigger filter. 1% FPR = 10 bits/element. 0.1% = 14 bits/element. Choose by downstream cost.

Alternative: HyperLogLog?

HLL counts unique elements (cardinality), not membership. Different use cases.