From Wikipedia:

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

In other words, given a set of elements, bloom filter can tell you that: A) given element is definitely not in the set, or B) given element is maybe in the set. It can give you a false positive: it can say that an element is in the set when it in fact is not. But it will never give you a false negative: it will never say that an element is not in the set when in fact it is.

In my past life I used bloom filters to check whether or not I should perform an expensive database index search 🙂

In the following example I construct a bloom filter given a set of strings set1, I then verify that each element of set1 is in the set according to the bloom filter. Finally I try a different set of elements set2, and test what the bloom filter says about those elements. Given big enough bloom filter I get 100% correct answers (that non of the elements in set2 are present). Here’s the code (bloom.cpp):

Contains “Martin” : 1
Contains “Vorbrodt” : 1
Contains “C++” : 1
Contains “Blog” : 1
Contains “Not” : 0
Contains “In” : 0
Contains “The” : 0
Contains “Set” : 0

Program output.

My implementation of the bloom filter is primitive: it uses only one hashing function which increases false positives, but it is very short and clean and can be built upon; maybe at some point I’ll write a full blown implementation; though there’s plenty of examples online; I want this post to be an introduction to the topic.

In any case, here’s my implementation of the bloom filter (bloom.hpp):

Leave a Reply