Implement a Bloom Filter in JavaScript
Bloom filters are probabilistic data structures that are used to test whether an element is a member of a set. They are space-efficient and can quickly tell you if an element might be in the set or is definitely not in the set, with a small chance of false positives. This challenge will have you implement a basic Bloom filter in JavaScript.
Problem Description
Your task is to create a JavaScript BloomFilter class. This class should support adding elements to the filter and checking if an element might exist within it.
Key Requirements:
-
Initialization: The constructor should accept two parameters:
size: The size of the bit array.numHashFunctions: The number of hash functions to use.
-
add(element)method: This method takes a stringelementand adds it to the Bloom filter. This involves hashing the elementnumHashFunctionstimes and setting the corresponding bits in the bit array to 1. -
mightContain(element)method: This method takes a stringelementand returnstrueif the element might be in the filter, andfalseif it definitely is not.
Expected Behavior:
- When an element is added, multiple bits in the internal bit array are set to
1based on the hash functions. - When checking for an element, if any of the bits corresponding to its hash values are
0, the element is definitely not in the filter. - If all the bits corresponding to its hash values are
1, the element might be in the filter. This is where false positives can occur.
Edge Cases to Consider:
- Empty strings.
- Elements that hash to the same indices.
- The size of the bit array and number of hash functions affecting performance and false positive rates.
Examples
Example 1:
const bf = new BloomFilter(100, 3);
bf.add("apple");
bf.add("banana");
console.log(bf.mightContain("apple")); // Output: true
console.log(bf.mightContain("orange")); // Output: false (most likely)
Explanation:
"apple" and "banana" are added. When checking "apple", all its corresponding bits are set, so mightContain returns true. When checking "orange", at least one of its corresponding bits is likely not set, so mightContain returns false.
Example 2:
const bf = new BloomFilter(50, 2);
bf.add("cat");
bf.add("dog");
console.log(bf.mightContain("cat")); // Output: true
console.log(bf.mightContain("dog")); // Output: true
console.log(bf.mightContain("bird")); // Output: false (most likely)
Explanation:
"cat" and "dog" are added. The filter correctly identifies them. "bird" is not added, and with a well-sized filter, it's highly probable that at least one of its hashed indices will have a 0, leading to a false result.
Example 3 (Potential False Positive):
Let's assume a simplified scenario with a very small Bloom filter for demonstration.
// NOTE: For actual use, use larger sizes and more robust hash functions.
// This example is illustrative.
const bf = new BloomFilter(10, 2); // Small size, few hash functions
// Assume hash functions produce indices:
// hash1("apple") -> 2, hash2("apple") -> 5
// hash1("banana") -> 7, hash2("banana") -> 2
// hash1("grape") -> 5, hash2("grape") -> 9
bf.add("apple"); // Sets bits at index 2 and 5
bf.add("banana"); // Sets bits at index 7 and 2 (bit 2 already set)
console.log(bf.mightContain("grape")); // Output: true (FALSE POSITIVE)
console.log(bf.mightContain("orange")); // Output: false (most likely)
Explanation:
Even though "grape" was never added, its hash functions might point to indices (5 and 9) where bits happen to be set by other added elements ("apple" set bit 5, and "banana" might have set bit 9 through its hashing). If both bits corresponding to "grape" are set, mightContain returns true, indicating a false positive. "orange" is not added, and it's likely one of its hashed indices will not have a bit set.
Constraints
size: Must be a positive integer between 100 and 100000.numHashFunctions: Must be a positive integer between 1 and 10.- Input
elementtoaddandmightContainmethods will always be a string. - The implementation of hash functions is up to you, but they should distribute values reasonably well across the
sizeof the bit array. You can use simple, non-cryptographic hash functions for this challenge.
Notes
- You'll need to implement at least two hash functions. For simplicity, you can create variations of a basic hash function, perhaps by seeding them differently or using slightly different mathematical operations.
- A common approach for Bloom filters is to use a bit array. In JavaScript, you can simulate this using a
Uint8Arrayor a regular array of 0s and 1s, or even a boolean array. - The goal is to understand the core mechanics of Bloom filters. The choice of hash functions and exact sizing will impact the false positive rate, which is a key trade-off with Bloom filters.