com.bigdata.btree
Class BloomFilter

java.lang.Object
  extended by com.bigdata.btree.BloomFilter
All Implemented Interfaces:
IBloomFilter, Externalizable, Serializable

public class BloomFilter
extends Object
implements IBloomFilter, Externalizable

Encapsulates the actual implementation class and provides the protocol for (de-)serialization.

Version:
$Id: BloomFilter.java 5809 2011-12-19 16:56:48Z thompsonbry $
Author:
Bryan Thompson
See Also:
Serialized Form
TODO:
Compare to the latest DSI code for the bloom filter. Has it been made faster?

Nested Class Summary
static class BloomFilter.BloomFilterCounters
          Counters for bloom filter access and notification of false positives.
 
Field Summary
 BloomFilter.BloomFilterCounters counters
          Counters are not persistent.
 
Constructor Summary
BloomFilter()
          De-serialization ctor.
BloomFilter(int n, double p)
          Ctor specifies maxN := n * 2.
BloomFilter(int n, double p, int maxN)
           
 
Method Summary
 boolean add(byte[] key)
          Adds the key to the filter.
 boolean contains(byte[] key)
          Test the filter for the key a true return DOES NOT guarantee that the key has been added to the filter while a false return guarantees that the key HAS NOT been added to the filter.
 long disable()
          Disables the bloom filter associated with the index.
 void falsePos()
          Notify the bloom filter that a false positive was observed for a key that for which IBloomFilter.add(byte[]) reported true (the key was in fact not in the index).
 long getAddr()
          Address that can be used to read this object from the store.
 long getBitLength()
          The bit length of the filter.
static long getBitLength(int hashFunctionCount, int nentries)
          Return the bit length required to provision a filter having the specified #of hash functions and the target capacity.
static int getEntryCountForErrorRate(int k, long m, double p)
          This returns the #of index entries at which the bloom filter will have the specified expected error rate.
 double getErrorRate()
          The false positive error rate estimated as 2^-d, where d is the #of hash functions.
 int getHashFunctionCount()
          The #of hash functions used by the filter.
static int getHashFunctionCount(double errorRate)
          Return the #of hash functions required to achieve the specified error rate.
 int getMaxN()
          The #of index entries at which the filter will have reached its maximum error rate (from the ctor).
 int getN()
          The expected #of index entries (from the ctor).
 double getP()
          The target error rate when there are getN() index entries (from the ctor).
 boolean isDirty()
          Return true iff the state of the filter has been modified but not yet written onto the store.
 boolean isEnabled()
          Return true unless the bloom filter has been disabled.
static BloomFilter read(IRawStore store, long addr)
          Read a bloom filter record from the store.
 void readExternal(ObjectInput in)
          Note: On read, the addr is set to 0L, the dirty flag is cleared, and enabled flag is set.
 String toString()
           
 long write(IRawStore store)
          Writes the bloom filter on the store and clears the isDirty() flag.
 void writeExternal(ObjectOutput out)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

counters

public transient BloomFilter.BloomFilterCounters counters
Counters are not persistent.

Constructor Detail

BloomFilter

public BloomFilter()
De-serialization ctor.


BloomFilter

public BloomFilter(int n,
                   double p)
Ctor specifies maxN := n * 2.

Parameters:
n - The expected #of index entries.
p - The target error rate.

BloomFilter

public BloomFilter(int n,
                   double p,
                   int maxN)
Parameters:
n - The expected #of index entries.
p - The target error rate.
maxN - The #of index entries at which the filter will have reached its maximum error rate. (The value is normally calculated by the BloomFilterFactory.)
Throws:
IllegalArgumentException - if n is non-positive.
IllegalArgumentException - unless p lies in (0:1).
IllegalArgumentException - if maxN is LT n.
Method Detail

getN

public final int getN()
The expected #of index entries (from the ctor).


getP

public final double getP()
The target error rate when there are getN() index entries (from the ctor).

Note: This class does not know the actual false positive error rate. However, that is tracked by the AbstractBTree.btreeCounters.


getErrorRate

public double getErrorRate()
The false positive error rate estimated as 2^-d, where d is the #of hash functions. This will be close to but typically not exactly the same as the value of getP() specified to the ctor.

Note: This class does not know the actual false positive error rate. However, that is tracked by the AbstractBTree.btreeCounters.


getMaxN

public final int getMaxN()
The #of index entries at which the filter will have reached its maximum error rate (from the ctor).


getHashFunctionCount

public final int getHashFunctionCount()
The #of hash functions used by the filter.


getBitLength

public final long getBitLength()
The bit length of the filter.


getHashFunctionCount

public static int getHashFunctionCount(double errorRate)
Return the #of hash functions required to achieve the specified error rate. If p is the probability of a false positive and d is the #of hash functions, then
 p = pow(2, -d)
 
and
 d = ceil(-ln(p) / ln(2))
 


getBitLength

public static long getBitLength(int hashFunctionCount,
                                int nentries)
Return the bit length required to provision a filter having the specified #of hash functions and the target capacity. This is
 
 bitLength = ceil(nentries * (hashFunctionCount / ln(2)))
 
 

Parameters:
hashFunctionCount - The #of hash functions.
nentries - The target capacity.
Returns:
The required bit length.

getEntryCountForErrorRate

public static int getEntryCountForErrorRate(int k,
                                            long m,
                                            double p)
This returns the #of index entries at which the bloom filter will have the specified expected error rate. The probability of a false positive is
 p = (1 - (1 - 1 / m) ˆ kn) ˆ k
 
where m is the bit length of the filter, k is the #of hash functions, and n is the #of items that have been inserted into the filter. or approximately
 p = (1 - e ˆ -kn / m) ˆ k
 
solving for n we obtain
 n = -m ln( 1 - pˆ(1/k) ) / k
 

Parameters:
k - The #of hash functions.
m - The bit length of the filter.
p - The expected error rate.
Returns:
The #of index entries at which the expected bloom filter error rate will be the specified value.
See Also:
http://en.wikipedia.org/wiki/Bloom_filter

add

public boolean add(byte[] key)
Description copied from interface: IBloomFilter
Adds the key to the filter.

Specified by:
add in interface IBloomFilter
Parameters:
key - The key.
Returns:
true iff the filter state was unchanged by the incorporation of this key into the filter.
Throws:
IllegalStateException - if the filter has been disable()d

contains

public boolean contains(byte[] key)
Description copied from interface: IBloomFilter
Test the filter for the key a true return DOES NOT guarantee that the key has been added to the filter while a false return guarantees that the key HAS NOT been added to the filter.

Specified by:
contains in interface IBloomFilter
Parameters:
key - The key.
Returns:
true if the filter has either that key or some key that is hash equivalent to that key using the hashing function imposed by the filter; false iff the filter can guarantee that the key has not been added to the filter.
Throws:
IllegalStateException - if the filter has been disable()d

toString

public String toString()
Overrides:
toString in class Object

getAddr

public final long getAddr()
Address that can be used to read this object from the store.

Note: This is not a persistent property. However the value is set when the record is read from, or written on, the store.


read

public static BloomFilter read(IRawStore store,
                               long addr)
Read a bloom filter record from the store.

Parameters:
store - the store.
addr - the address of the bloom filter record.
Returns:
the de-serialized bloom filter record. The address from which it was loaded is set on the bloom filter as a side-effect.

isDirty

public final boolean isDirty()
Return true iff the state of the filter has been modified but not yet written onto the store. The filter is presumed clean when created or when it is read from the store. The filter will remain clean until add(byte[]) returns true, indicating that the state of the filter has been changed.


write

public long write(IRawStore store)
Writes the bloom filter on the store and clears the isDirty() flag.

Note: This also sets the address on addr as a side-effect, but the address is NOT written into the store since it is not available until after the record has been serialized.

Note: This method DOES NOT test isDirty().

Parameters:
store - The store.
Returns:
The address on which it was written.
Throws:
IllegalStateException - if the filter is not dirty.
IllegalStateException - if the filter is not enabled.

disable

public final long disable()
Disables the bloom filter associated with the index. A disabled bloom filter can not be persisted and will not respond to queries or permit mutations.

Note: This method is invoked by AbstractBTree.insert(byte[], byte[]) when the #of index entries exceeds the maximum allowed for the bloom filter. At that point the BTree is dirty. Checkpoint will notice that the bloom filter is disabled will write its address as 0L so the bloom filter is no longer reachable from the post-checkpoint record.

Returns:
the current address for recycling

isEnabled

public final boolean isEnabled()
Return true unless the bloom filter has been disabled.

Note: A bloom filter may be disabled is the #of index entries has exceeded the maximum desired error rate for the bloom filter.

Returns:
iff the bloom filter is enabled.

readExternal

public void readExternal(ObjectInput in)
                  throws IOException,
                         ClassNotFoundException
Note: On read, the addr is set to 0L, the dirty flag is cleared, and enabled flag is set. It's necessary to override serialization otherwise java will default the enabled flag to false when an object is de-serialized - whoops!

Specified by:
readExternal in interface Externalizable
Parameters:
in -
Throws:
IOException
ClassNotFoundException

writeExternal

public void writeExternal(ObjectOutput out)
                   throws IOException
Specified by:
writeExternal in interface Externalizable
Parameters:
out -
Throws:
IOException

falsePos

public void falsePos()
Description copied from interface: IBloomFilter
Notify the bloom filter that a false positive was observed for a key that for which IBloomFilter.add(byte[]) reported true (the key was in fact not in the index). This method exists solely for reporting and tracking the actual error rate of the bloom filter.

Specified by:
falsePos in interface IBloomFilter


Copyright © 2006-2011 SYSTAP, LLC. All Rights Reserved.