Ruby Basics: BloomFilters

Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail, check the wikipedia article. Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, ..., k-1). This may or may not give you a good distribution, it all depends on the data.
https://github.com/igrigorik/bloomfilter-rb

It is like a Set but lookup is both O(1) for time and space complexity. But there is a chance for false positive becuase hash has collision probability.

Applications:

  • SpellChecker
  • Filtering large base of weak password
  • Find given url has been processed