|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||
java.lang.Objectcom.bigdata.btree.BloomFilter
public class BloomFilter
Encapsulates the actual implementation class and provides the protocol for (de-)serialization.
| Nested Class Summary | |
|---|---|
static class |
BloomFilter.BloomFilterCounters
Counters for bloom filter access and notification of false positives. |
| Field Summary | |
|---|---|
BloomFilter.BloomFilterCounters |
counters
Counters are not persistent. |
| Constructor Summary | |
|---|---|
BloomFilter()
De-serialization ctor. |
|
BloomFilter(int n,
double p)
Ctor specifies maxN := n * 2. |
|
BloomFilter(int n,
double p,
int maxN)
|
|
| Method Summary | |
|---|---|
boolean |
add(byte[] key)
Adds the key to the filter. |
boolean |
contains(byte[] key)
Test the filter for the key a true return DOES NOT
guarantee that the key has been added to the filter while a
false return guarantees that the key HAS NOT been added to
the filter. |
long |
disable()
Disables the bloom filter associated with the index. |
void |
falsePos()
Notify the bloom filter that a false positive was observed for a key that for which IBloomFilter.add(byte[]) reported true (the key was
in fact not in the index). |
long |
getAddr()
Address that can be used to read this object from the store. |
long |
getBitLength()
The bit length of the filter. |
static long |
getBitLength(int hashFunctionCount,
int nentries)
Return the bit length required to provision a filter having the specified #of hash functions and the target capacity. |
static int |
getEntryCountForErrorRate(int k,
long m,
double p)
This returns the #of index entries at which the bloom filter will have the specified expected error rate. |
double |
getErrorRate()
The false positive error rate estimated as 2^-d, where
d is the #of hash functions. |
int |
getHashFunctionCount()
The #of hash functions used by the filter. |
static int |
getHashFunctionCount(double errorRate)
Return the #of hash functions required to achieve the specified error rate. |
int |
getMaxN()
The #of index entries at which the filter will have reached its maximum error rate (from the ctor). |
int |
getN()
The expected #of index entries (from the ctor). |
double |
getP()
The target error rate when there are getN() index entries (from
the ctor). |
boolean |
isDirty()
Return true iff the state of the filter has been modified
but not yet written onto the store. |
boolean |
isEnabled()
Return true unless the bloom filter has been disabled. |
static BloomFilter |
read(IRawStore store,
long addr)
Read a bloom filter record from the store. |
void |
readExternal(ObjectInput in)
Note: On read, the addr is set to 0L, the
dirty flag is cleared, and enabled flag is set. |
String |
toString()
|
long |
write(IRawStore store)
Writes the bloom filter on the store and clears the isDirty()
flag. |
void |
writeExternal(ObjectOutput out)
|
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public transient BloomFilter.BloomFilterCounters counters
| Constructor Detail |
|---|
public BloomFilter()
public BloomFilter(int n,
double p)
maxN := n * 2.
n - The expected #of index entries.p - The target error rate.
public BloomFilter(int n,
double p,
int maxN)
n - The expected #of index entries.p - The target error rate.maxN - The #of index entries at which the filter will have reached
its maximum error rate. (The value is normally calculated by
the BloomFilterFactory.)
IllegalArgumentException - if n is non-positive.
IllegalArgumentException - unless p lies in (0:1).
IllegalArgumentException - if maxN is LT n.| Method Detail |
|---|
public final int getN()
public final double getP()
getN() index entries (from
the ctor).
Note: This class does not know the actual false positive error rate.
However, that is tracked by the AbstractBTree.btreeCounters.
public double getErrorRate()
2^-d, where
d is the #of hash functions. This will be close to but typically
not exactly the same as the value of getP() specified to the
ctor.
Note: This class does not know the actual false positive error rate.
However, that is tracked by the AbstractBTree.btreeCounters.
public final int getMaxN()
public final int getHashFunctionCount()
public final long getBitLength()
public static int getHashFunctionCount(double errorRate)
p is the probability of a false positive and
d is the #of hash functions, then
p = pow(2, -d)and
d = ceil(-ln(p) / ln(2))
public static long getBitLength(int hashFunctionCount,
int nentries)
bitLength = ceil(nentries * (hashFunctionCount / ln(2)))
hashFunctionCount - The #of hash functions.nentries - The target capacity.
public static int getEntryCountForErrorRate(int k,
long m,
double p)
p = (1 - (1 - 1 / m) ˆ kn) ˆ kwhere m is the bit length of the filter, k is the #of hash functions, and n is the #of items that have been inserted into the filter. or approximately
p = (1 - e ˆ -kn / m) ˆ ksolving for
n we obtain
n = -m ln( 1 - pˆ(1/k) ) / k
k - The #of hash functions.m - The bit length of the filter.p - The expected error rate.
http://en.wikipedia.org/wiki/Bloom_filterpublic boolean add(byte[] key)
IBloomFilter
add in interface IBloomFilterkey - The key.
true iff the filter state was unchanged by the
incorporation of this key into the filter.
IllegalStateException - if the filter has been disable()dpublic boolean contains(byte[] key)
IBloomFiltertrue return DOES NOT
guarantee that the key has been added to the filter while a
false return guarantees that the key HAS NOT been added to
the filter.
contains in interface IBloomFilterkey - The key.
true if the filter has either that key or some key
that is hash equivalent to that key using the hashing function
imposed by the filter; false iff the filter can
guarantee that the key has not been added to the filter.
IllegalStateException - if the filter has been disable()dpublic String toString()
toString in class Objectpublic final long getAddr()
Note: This is not a persistent property. However the value is set when the record is read from, or written on, the store.
public static BloomFilter read(IRawStore store,
long addr)
store - the store.addr - the address of the bloom filter record.
public final boolean isDirty()
true iff the state of the filter has been modified
but not yet written onto the store. The filter is presumed clean when
created or when it is read from the store. The filter will remain clean
until add(byte[]) returns true, indicating that
the state of the filter has been changed.
public long write(IRawStore store)
isDirty()
flag.
Note: This also sets the address on addr as a side-effect, but
the address is NOT written into the store since it is not available until
after the record has been serialized.
Note: This method DOES NOT test isDirty().
store - The store.
IllegalStateException - if the filter is not dirty.
IllegalStateException - if the filter is not enabled.public final long disable()
Note: This method is invoked by AbstractBTree.insert(byte[], byte[]) when
the #of index entries exceeds the maximum allowed for the bloom filter.
At that point the BTree is dirty. Checkpoint will notice
that the bloom filter is disabled will write its address as 0L so the
bloom filter is no longer reachable from the post-checkpoint record.
public final boolean isEnabled()
true unless the bloom filter has been disabled.
Note: A bloom filter may be disabled is the #of index entries has exceeded the maximum desired error rate for the bloom filter.
public void readExternal(ObjectInput in)
throws IOException,
ClassNotFoundException
addr is set to 0L, the
dirty flag is cleared, and enabled flag is set. It's
necessary to override serialization otherwise java will default the
enabled flag to false when an object is
de-serialized - whoops!
readExternal in interface Externalizablein -
IOException
ClassNotFoundException
public void writeExternal(ObjectOutput out)
throws IOException
writeExternal in interface Externalizableout -
IOExceptionpublic void falsePos()
IBloomFilterIBloomFilter.add(byte[]) reported true (the key was
in fact not in the index). This method exists solely for reporting and
tracking the actual error rate of the bloom filter.
falsePos in interface IBloomFilter
|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||