java.lang.Object
  com.bigdata.rawstore.AbstractRawStore
    com.bigdata.rawstore.AbstractRawWormStore
      com.bigdata.journal.AbstractBufferStrategy
        com.bigdata.journal.BufferedDiskStrategy
public class BufferedDiskStrategy
A disk-based strategy where a large buffer is used to minimize the chance
that a read will read through to the disk (under normal circumstances the
on-disk file will be fully buffered). This is especially important during
asynchronous overflow processing: the data written onto the BTrees have
been appended onto the store, so more or less random reads are required to
traverse the BTree tuples in index order.
This strategy is designed for use with StoreManager. The expectation
is that the store will be fully buffered MOST of the time.
Typically Options#INITIAL_EXTENT will be set equal to
Options#MAXIMUM_EXTENT, using a value on the order of 200M. Normally,
overflow will be triggered before the user extent is saturated and the disk
file will remain fully buffered. In these cases there will be NO reads
through to the disk. Note that neither the DirectBufferStrategy nor
the MappedBufferStrategy is suitable for asynchronous overflow,
precisely because the JVM handles neither the extension of a mapped file nor
the correct release of direct ByteBuffers.
There are a variety of reasons why overflow processing might not be initiated
before the user extent overflows (asynchronous overflow may still be running
on the old journal, the last set of tasks executing may have written more
data than remains in the user extent, etc.). Regardless, in any of these
situations the backing file on the disk will be extended BUT NOT the buffer.
The buffer itself IS NOT extended for reasons that mostly have to do with
memory leaks in the JVM for direct ByteBuffers (in fact, the caller
must provide the buffer via the ctor, in a manner very similar to how the
write cache is managed for the DiskOnlyStrategy).
The buffer provides both a write cache and a read cache, but only until it is full. On commit, all bytes written since the last byte flushed to the backing file are transferred from the buffer to the backing file. On restart, as much of the user extent as will fit is read from the backing file into the buffer.
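The commit and restart behavior in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the class's implementation: the names `CommitFlushSketch`, `flushOnCommit`, `loadOnRestart`, `headerSize` and `lastFlushed` are hypothetical, and the sketch assumes that byte `i` of the buffer corresponds to file offset `headerSize + i`. It relies only on the positional `FileChannel.write(ByteBuffer, long)` and `FileChannel.read(ByteBuffer, long)` calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch: buffer byte i is assumed to live at file offset headerSize + i. */
final class CommitFlushSketch {

    /** On commit: write buffer bytes [lastFlushed, nextOffset) and return the new watermark. */
    static long flushOnCommit(ByteBuffer buffer, FileChannel channel,
                              long headerSize, int lastFlushed, int nextOffset) {
        ByteBuffer view = buffer.duplicate(); // private position/limit; shared buffer untouched
        view.limit(nextOffset);
        view.position(lastFlushed);
        try {
            long pos = headerSize + lastFlushed;
            while (view.hasRemaining()) {
                pos += channel.write(view, pos); // positional write, channel position unchanged
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return nextOffset;
    }

    /** On restart: read as much of the user extent as fits into the buffer. */
    static int loadOnRestart(ByteBuffer buffer, FileChannel channel,
                             long headerSize, long userExtent) {
        int n = (int) Math.min(userExtent, buffer.capacity());
        ByteBuffer view = buffer.duplicate();
        view.clear();
        view.limit(n);
        try {
            long pos = headerSize;
            while (view.hasRemaining()) {
                int r = channel.read(view, pos);
                if (r < 0) {
                    break; // EOF: the file is shorter than expected
                }
                pos += r;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return view.position(); // number of bytes loaded
    }
}
```

Working on `duplicate()` views keeps the shared buffer's own position and limit stable, which matters once concurrent readers are served from it.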
Once the buffer is full, a small WriteCache is allocated from
DirectBufferPool.INSTANCE and reads beyond the extent covered by the
buffer go straight through to the disk. Writes beyond that extent are
buffered in the write cache, which is flushed when it would otherwise
overflow. As a result, only large sequential writes are performed on the
store. Reads check the write cache so that they see records which have not
yet been flushed.
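The layering described above (buffer first, then a small write cache flushed on overflow, then the disk) can be modelled in memory as below. This is a simplified sketch, not the class's code: `LayeredStoreSketch` and its members are hypothetical, the "disk" is a growable `byte[]`, the buffered region is never persisted (commit is out of scope), and it follows the to-do note that records extending across the end of the buffer are not stored in the buffer.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical in-memory model: buffer, then write cache, then a byte[] standing in for the disk. */
final class LayeredStoreSketch {
    private final ByteBuffer buffer;      // fixed size, never extended
    private final ByteBuffer writeCache;  // small cache used once the buffer is full
    private long writeCacheOffset = -1;   // store offset of writeCache[0]; -1 when empty
    private byte[] disk = new byte[0];
    private long nextOffset = 0;

    LayeredStoreSketch(int bufferCapacity, int writeCacheCapacity) {
        this.buffer = ByteBuffer.allocate(bufferCapacity);
        this.writeCache = ByteBuffer.allocate(writeCacheCapacity);
    }

    /** Appends a record and returns its offset in the user extent. */
    long write(byte[] record) {
        long offset = nextOffset;
        if (offset + record.length <= buffer.capacity()) {
            // The whole record fits, so it lives in the buffer (no record straddles the boundary).
            ByteBuffer view = buffer.duplicate();
            view.position((int) offset);
            view.put(record);
        } else {
            if (writeCache.remaining() < record.length) {
                flushWriteCache(); // flush before the cache would overflow
            }
            if (record.length > writeCache.capacity()) {
                writeToDisk(offset, record); // too large to cache: write straight through
            } else {
                if (writeCache.position() == 0) {
                    writeCacheOffset = offset;
                }
                writeCache.put(record);
            }
        }
        nextOffset += record.length;
        return offset;
    }

    /** Reads check the buffer, then the write cache, then the disk. */
    byte[] read(long offset, int nbytes) {
        byte[] dst = new byte[nbytes];
        if (offset + nbytes <= buffer.capacity()) {
            ByteBuffer view = buffer.duplicate();
            view.position((int) offset);
            view.get(dst);
        } else if (writeCacheOffset >= 0 && offset >= writeCacheOffset
                && offset + nbytes <= writeCacheOffset + writeCache.position()) {
            ByteBuffer view = writeCache.duplicate();
            view.position((int) (offset - writeCacheOffset));
            view.get(dst);
        } else {
            System.arraycopy(disk, (int) offset, dst, 0, nbytes);
        }
        return dst;
    }

    /** Writes the cached bytes as one sequential write and empties the cache. */
    void flushWriteCache() {
        if (writeCache.position() == 0) {
            return;
        }
        writeToDisk(writeCacheOffset, Arrays.copyOf(writeCache.array(), writeCache.position()));
        writeCache.clear();
        writeCacheOffset = -1;
    }

    int diskLength() {
        return disk.length;
    }

    private void writeToDisk(long offset, byte[] data) {
        int end = (int) (offset + data.length);
        if (disk.length < end) {
            disk = Arrays.copyOf(disk, end);
        }
        System.arraycopy(data, 0, disk, (int) offset, data.length);
    }
}
```

Because every write past the buffer is an append, the cached bytes are always contiguous, so a flush is a single sequential write.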
One other advantage of this strategy is that interrupts of NIO operations are
much less likely to cause the backing FileChannel to be closed
asynchronously since most reads will be serviced by the buffer rather than
touching the disk. Also, for reads which are serviced by the buffer, we can
offer higher concurrency (reads through to the disk are serialized).
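One way to serve reads from the buffer without serializing them (and to return the immutable view called for in the to-do list below) is to give each reader its own read-only view. The sketch assumes the shared buffer's own position and limit are never moved after setup; `ReadViewSketch` and `readView` are hypothetical names. It uses only `ByteBuffer.asReadOnlyBuffer()` and `slice()`, which create views with independent position and limit.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: lock-free reads against a shared buffer via per-caller views. */
final class ReadViewSketch {

    /** Returns a read-only view of bytes [offset, offset + nbytes) of the shared buffer. */
    static ByteBuffer readView(ByteBuffer shared, int offset, int nbytes) {
        ByteBuffer view = shared.asReadOnlyBuffer(); // independent position/limit, no writes allowed
        view.limit(offset + nbytes);
        view.position(offset);
        return view.slice(); // position 0, limit nbytes, still read-only
    }
}
```

Since readers only touch their own view, many threads can read the buffered region at once; only reads that fall through to the disk need to be serialized. Visibility of newly written records still depends on the writer publishing the address after the bytes are in the buffer.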
See Also:
BufferMode.BufferedDisk, TestBufferedDiskJournal

To Do:
- … DirectBufferPool instance to be used as a ctor parameter. That keeps the (de-)allocation local and allows us to configure the size of the backing buffer when we set up the StoreManager and to use small buffers for the unit tests (there can be a distinct DirectBufferPool for TestBufferedDiskJournal). The AbstractJournal will need another parameter for the DirectBufferPool. That argument should be optional and default to null for all of the other buffer modes. We would still pass in the writeCache.
- Add a test suite for this variant.
- Test out on a cluster. Examine the behavior of a series of overflow operations and see how much this does to improve throughput.
- Modify asynchronous overflow to always "finish up" (handle each index before it ends). Explicitly track the indices that have to be processed and those not yet finished. What to do about indices that will not overflow? This is a pretty critical issue, as we could otherwise wind up keeping a number of historical journals on hand. Normally I would expect such issues mainly with new applications that are being tested, in which case (a) test on a test federation and (b) the indices can be dropped.
- Test correct force(boolean) (must transfer bytes written since the last force).
- Verify that reads return an immutable view of the buffer and that high concurrency for reads is allowed when the record to be read lies within the buffered region.
- Test the correct transition from the buffered extent onto the unbuffered extent.
- Verify that records which extend across the end of the buffer are NOT stored in the buffer (no split reads, for sanity's sake).
- Report whether or not the on-disk write cache is enabled for each platform in AbstractStatisticsCollector. Offer guidance on how to disable that write cache.
- Test verifying that the write cache comes online atomically.
- Test verifying that the writeCache is restored iff necessary on restart.
- Test verifying that writeCacheOffset is restored correctly on restart (i.e., you can continue to append to the store after restart and the result is valid).
- Test verifying that the buffer position and limit are updated correctly by write(ByteBuffer) regardless of the code path.
- If possible, refactor to share a common base class with the DiskOnlyStrategy. The main points of departure are the lack of an option for a read cache in this class and the differences in how the buffer is layered in.
- Due to the high memory burden, this variant might not be the default for the unit tests of the services. However, it should be the default for deployed distributed federations.
FIXME Examine behavior when write caching is enabled/disabled for the OS.
This has a profound impact. Asynchronous writes of multiple buffers, and the
use of smaller buffers, may be absolutely necessary when the write cache is disabled.
It may be that swapping sets in because the Windows write cache is being
overworked, in which case doing incremental and asynchronous IO would help.
Compare with behavior on server platforms. See:
http://support.microsoft.com/kb/259716,
http://www.accucadd.com/TechNotes/Cache/WriteBehindCache.htm,
http://msdn2.microsoft.com/en-us/library/aa365165.aspx,
http://www.jasonbrome.com/blog/archives/2004/04/03/writecache_enabled.html,
http://support.microsoft.com/kb/811392,
http://mail-archives.apache.org/mod_mbox/db-derby-dev/200609.mbox/%3C44F820A8.6000000@sun.com%3E
/sbin/hdparm -W 0 /dev/hda : disable write caching
/sbin/hdparm -W 1 /dev/hda : enable write caching
| Nested Class Summary | |
|---|---|
| static interface | BufferedDiskStrategy.Options: Options for the BufferedDiskStrategy. |
| Field Summary | |
|---|---|
| DiskOnlyStrategy.StoreCounters | storeCounters: Counters on IRawStore and disk access. |
| Fields inherited from class com.bigdata.journal.AbstractBufferStrategy |
|---|
bufferMode, ERR_ADDRESS_IS_NULL, ERR_ADDRESS_NOT_WRITTEN, ERR_BAD_RECORD_SIZE, ERR_BUFFER_EMPTY, ERR_BUFFER_NULL, ERR_INT32, ERR_NOT_OPEN, ERR_READ_ONLY, ERR_RECORD_LENGTH_ZERO, ERR_TRUNCATE, initialExtent, log, maximumExtent, nextOffset, WARN |
| Fields inherited from class com.bigdata.rawstore.AbstractRawWormStore |
|---|
am |
| Fields inherited from class com.bigdata.rawstore.AbstractRawStore |
|---|
serializer |
| Fields inherited from interface com.bigdata.rawstore.IAddressManager |
|---|
NULL |
| Method Summary | |
|---|---|
| long | allocate(int nbytes): Allocate a record without writing it on the store. |
| void | close(): Closes the file immediately (without flushing any pending writes). |
| void | closeForWrites(): Extended to discard the write cache. |
| void | deleteResources(): Deletes the backing file(s) (if any) and clears any records for the store from the IGlobalLRU. |
| void | force(boolean metadata): Flushes the optional writeCache before syncing the disk. |
| FileChannel | getChannel(): The channel used to read and write on the file. |
| CounterSet | getCounters(): Return interesting information about the write cache and file operations. |
| long | getExtent(): The current size of the journal in bytes. |
| File | getFile(): The backing file. |
| int | getHeaderSize(): The size of the file header in bytes. |
| RandomAccessFile | getRandomAccessFile(): The object used to read and write on that file. |
| long | getUserExtent(): The size of the user data extent in bytes. |
| boolean | isFullyBuffered(): True iff the store is fully buffered (all reads are against memory). |
| boolean | isStable(): True iff backed by stable storage. |
| ByteBuffer | read(long addr): Note: ClosedChannelException and AsynchronousCloseException can get thrown out of this method (wrapped as RuntimeExceptions) if a reader task is interrupted. |
| ByteBuffer | readRootBlock(boolean rootBlock0): Read the specified root block from the backing file. |
| long | transferTo(RandomAccessFile out): A block operation that transfers the serialized records (aka the written-on portion of the user extent) en masse from the buffer onto an output file. |
| void | truncate(long newExtent): Either truncates or extends the journal. |
| void | update(long addr, int off, ByteBuffer data): Updates a region of a record. |
| long | write(ByteBuffer data): Write the data (unisolated). |
| void | writeRootBlock(IRootBlockView rootBlock, ForceEnum forceOnCommit): Write the root block onto stable storage (i.e., flush it through to disk). |
| Methods inherited from class com.bigdata.journal.AbstractBufferStrategy |
|---|
assertOpen, destroy, getBufferMode, getInitialExtent, getMaximumExtent, getNextOffset, getResourceMetadata, getUUID, isOpen, isReadOnly, overflow, size, transferFromDiskTo |
| Methods inherited from class com.bigdata.rawstore.AbstractRawWormStore |
|---|
getAddressManager, getByteCount, getOffset, getOffsetBits, packAddr, toAddr, toString, unpackAddr |
| Methods inherited from class com.bigdata.rawstore.AbstractRawStore |
|---|
deserialize, deserialize, deserialize, serialize |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface com.bigdata.journal.IBufferStrategy |
|---|
getBufferMode, getInitialExtent, getMaximumExtent, getNextOffset |
| Methods inherited from interface com.bigdata.rawstore.IRawStore |
|---|
destroy, getResourceMetadata, getUUID, isOpen, isReadOnly, size |
| Methods inherited from interface com.bigdata.rawstore.IAddressManager |
|---|
getByteCount, getOffset, packAddr, toAddr, toString, unpackAddr |
| Methods inherited from interface com.bigdata.rawstore.IStoreSerializer |
|---|
deserialize, deserialize, deserialize, serialize |
| Field Detail |
|---|
public final DiskOnlyStrategy.StoreCounters storeCounters
Counters on IRawStore and disk access.
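| Method Detail |
|---|

public final int getHeaderSize()
Description copied from interface: IDiskBasedStrategy
The size of the file header in bytes.
Specified by: getHeaderSize in interface IBufferStrategy; getHeaderSize in interface IDiskBasedStrategy

public final File getFile()
Description copied from interface: IDiskBasedStrategy
The backing file.
Specified by: getFile in interface IDiskBasedStrategy; getFile in interface IRawStore

public final RandomAccessFile getRandomAccessFile()
Description copied from interface: IDiskBasedStrategy
The object used to read and write on that file.
Specified by: getRandomAccessFile in interface IDiskBasedStrategy

public final FileChannel getChannel()
Description copied from interface: IDiskBasedStrategy
The channel used to read and write on the file.
Specified by: getChannel in interface IDiskBasedStrategy

public CounterSet getCounters()
Return interesting information about the write cache and file operations.
Specified by: getCounters in interface IBufferStrategy; getCounters in interface IRawStore

public final boolean isStable()
Description copied from interface: IRawStore
True iff backed by stable storage.
Specified by: isStable in interface IRawStore

public boolean isFullyBuffered()
Description copied from interface: IRawStore
True iff the store is fully buffered (all reads are against memory).
Note: This does not guarantee that the OS will not swap the buffer onto disk.
Specified by: isFullyBuffered in interface IRawStore

public void force(boolean metadata)
Flushes the optional writeCache before syncing the disk.
Specified by: force in interface IRawStore
Parameters: metadata - If true, then force both the file contents and the file metadata to disk.

public void close()
Closes the file immediately (without flushing any pending writes).
Specified by: close in interface IRawStore
Overrides: close in class AbstractBufferStrategy

public void deleteResources()
Description copied from interface: IRawStore
Deletes the backing file(s) (if any) and clears any records for the store from the IGlobalLRU.
Specified by: deleteResources in interface IRawStore

public final long getExtent()
Description copied from interface: IBufferStrategy
The current size of the journal in bytes. … Options.INITIAL_EXTENT.
Specified by: getExtent in interface IBufferStrategy

public final long getUserExtent()
Description copied from interface: IBufferStrategy
The size of the user data extent in bytes.
Note: The size of the user extent is smaller than the value reported by IBufferStrategy.getExtent(), since the latter also includes the space allocated to the journal header and root blocks.
Specified by: getUserExtent in interface IBufferStrategy

public ByteBuffer read(long addr)
Note: ClosedChannelException and AsynchronousCloseException can get thrown out of this method (wrapped as RuntimeExceptions) if a reader task is interrupted.
Specified by: read in interface IRawStore
Parameters: addr - A long integer that encodes both the offset from which the data will be read and the #of bytes to be read. See IAddressManager.toAddr(int, long).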
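To illustrate how a single `long` can carry both the offset and the byte count of a record, here is a hypothetical encoding. The layout (offset in the high `offsetBits` bits, byte count in the low `64 - offsetBits` bits) and the class `AddrSketch` are assumptions for illustration only; the argument order mirrors `IAddressManager.toAddr(int, long)`, but the real address manager's layout may differ.

```java
/** Hypothetical layout: offset in the high offsetBits bits, byte count in the remaining low bits. */
final class AddrSketch {
    private final int byteCountBits;
    private final long maxByteCount;
    private final long maxOffset;

    AddrSketch(int offsetBits) {
        // Keep the byte count field between 2 and 32 bits so any int length can be range-checked.
        if (offsetBits < 32 || offsetBits > 62) {
            throw new IllegalArgumentException("offsetBits=" + offsetBits);
        }
        this.byteCountBits = 64 - offsetBits;
        this.maxByteCount = (1L << byteCountBits) - 1;
        this.maxOffset = (1L << offsetBits) - 1;
    }

    long toAddr(int nbytes, long offset) {
        if (nbytes < 0 || nbytes > maxByteCount) {
            throw new IllegalArgumentException("nbytes=" + nbytes);
        }
        if (offset < 0 || offset > maxOffset) {
            throw new IllegalArgumentException("offset=" + offset);
        }
        return (offset << byteCountBits) | nbytes;
    }

    long getOffset(long addr) {
        return addr >>> byteCountBits; // unsigned shift: the top bit may be set for large offsets
    }

    int getByteCount(long addr) {
        return (int) (addr & maxByteCount);
    }
}
```

The trade-off is fixed: every bit given to the offset (maximum store size) is taken from the byte count (maximum record size).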

public long allocate(int nbytes)
Description copied from interface: IUpdateStore
Allocate a record without writing it on the store.
Note: The contents of the record having that address are undefined unless and until data is written onto the record using IUpdateStore.update(long, int, ByteBuffer), and only those bytes actually written will be defined.
Specified by: allocate in interface IUpdateStore
Parameters: nbytes - The #of bytes in the record.
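The allocate/update contract can be sketched against an in-memory `ByteBuffer`. This is a hypothetical illustration (`AllocateUpdateSketch` and its methods are not part of the API): offsets stand in for addresses and the record length is passed separately, whereas the real store packs both into one `long`. Leaving the caller's buffer position unchanged in `update` is an assumption; the documentation only says which bytes are written.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of allocate(nbytes) followed by update(addr, off, data). */
final class AllocateUpdateSketch {
    private final ByteBuffer store;
    private int nextOffset = 0;

    AllocateUpdateSketch(int capacity) {
        this.store = ByteBuffer.allocate(capacity);
    }

    /** Reserves nbytes and returns the record's offset; its contents stay undefined. */
    int allocate(int nbytes) {
        if (nbytes <= 0) {
            throw new IllegalArgumentException("nbytes=" + nbytes);
        }
        if (nextOffset + nbytes > store.capacity()) {
            throw new IllegalStateException("store is full");
        }
        int offset = nextOffset;
        nextOffset += nbytes; // advance the append pointer without writing anything
        return offset;
    }

    /** Writes data[position, limit) at byte off within the record; other bytes are unchanged. */
    void update(int offset, int recordLength, int off, ByteBuffer data) {
        int n = data.remaining();
        if (off < 0 || off + n > recordLength) {
            throw new IllegalArgumentException("update would overrun the record");
        }
        ByteBuffer target = store.duplicate();
        target.position(offset + off);
        target.put(data.duplicate()); // copy without moving the caller's position (an assumption)
    }

    byte get(int index) {
        return store.get(index);
    }
}
```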

public void update(long addr, int off, ByteBuffer data)
Description copied from interface: IUpdateStore
Updates a region of a record. The bytes from Buffer.position() to Buffer.limit() will be written starting at off bytes into the record identified by addr. The state of other bytes in the record is unchanged. If their state was undefined (e.g., the record was IUpdateStore.allocate(int)'d but not written), then their state will remain undefined.
Specified by: update in interface IUpdateStore
Parameters:
addr - The address of an existing record.
off - The offset into that record at which the data will be written.
data - The data to be written.

public long write(ByteBuffer data)
Description copied from interface: IRawStore
Write the data (unisolated).
Specified by: write in interface IRawStore
Parameters: data - The data. The bytes from the current Buffer.position() to the Buffer.limit() will be written and the Buffer.position() will be advanced to the Buffer.limit(). The caller may subsequently modify the contents of the buffer without changing the state of the store (i.e., the data are copied into the store).
Returns: … IAddressManager.

public ByteBuffer readRootBlock(boolean rootBlock0)
Description copied from interface: IBufferStrategy
Read the specified root block from the backing file.
Specified by: readRootBlock in interface IBufferStrategy
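The `write(ByteBuffer data)` contract documented above (copy the bytes from position to limit, advance the caller's position to its limit, leave the store unaffected by later changes to the caller's buffer) can be sketched as follows. `WriteContractSketch` is hypothetical, returns a plain offset rather than a packed address, and is not thread-safe; the rejection of empty records mirrors the inherited `ERR_RECORD_LENGTH_ZERO` constant, but the exact checks are assumptions.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the write(ByteBuffer) copy semantics. */
final class WriteContractSketch {
    private final ByteBuffer store;

    WriteContractSketch(int capacity) {
        this.store = ByteBuffer.allocate(capacity);
    }

    /** Returns the offset of the new record (a real store would return a packed address). */
    int write(ByteBuffer data) {
        int nbytes = data.remaining();
        if (nbytes == 0) {
            throw new IllegalArgumentException("record length zero");
        }
        if (store.remaining() < nbytes) {
            throw new IllegalStateException("store is full");
        }
        int offset = store.position();
        store.put(data); // copies the bytes and advances data.position() to data.limit()
        return offset;
    }

    /** Read-only view of a stored record. */
    ByteBuffer read(int offset, int nbytes) {
        ByteBuffer view = store.asReadOnlyBuffer();
        view.limit(offset + nbytes);
        view.position(offset);
        return view.slice();
    }
}
```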

public void writeRootBlock(IRootBlockView rootBlock, ForceEnum forceOnCommit)
Description copied from interface: IBufferStrategy
Write the root block onto stable storage (i.e., flush it through to disk).
Specified by: writeRootBlock in interface IBufferStrategy
Parameters: rootBlock - The root block. Which root block is indicated by IRootBlockView.isRootBlock0().

public void truncate(long newExtent)
Description copied from interface: IBufferStrategy
Either truncates or extends the journal.
Note: Implementations of this method MUST be synchronized so that the operation is atomic with respect to concurrent writers.
Specified by: truncate in interface IBufferStrategy
Parameters: newExtent - The new extent of the journal. This value represents the total extent of the journal, including any root blocks together with the user extent.
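A minimal sketch of `truncate(long newExtent)` for this strategy, in which the backing file is resized while the in-memory buffer is left alone, as the class description states. `TruncateSketch` is hypothetical; the guard against discarding written records is an assumption (loosely suggested by the inherited `ERR_TRUNCATE` constant), and the method is `synchronized` per the note above. It uses only `RandomAccessFile.setLength` and `length`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical sketch: resize the backing file, never the buffer. */
final class TruncateSketch {
    private final RandomAccessFile raf;
    private final long headerSize;
    private final long nextOffset; // bytes of user data already written

    TruncateSketch(RandomAccessFile raf, long headerSize, long nextOffset) {
        this.raf = raf;
        this.headerSize = headerSize;
        this.nextOffset = nextOffset;
    }

    /** newExtent is the total extent: header and root blocks plus the user extent. */
    synchronized void truncate(long newExtent) {
        if (newExtent - headerSize < nextOffset) {
            throw new IllegalArgumentException("newExtent=" + newExtent + " would discard written records");
        }
        try {
            raf.setLength(newExtent); // grows or shrinks the file; any in-memory buffer is untouched
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** User extent = total file length minus the header. */
    synchronized long getUserExtent() {
        try {
            return raf.length() - headerSize;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```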
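
public long transferTo(RandomAccessFile out) throws IOException
Description copied from interface: IBufferStrategy
A block operation that transfers the serialized records (aka the written-on portion of the user extent) en masse from the buffer onto an output file.
Note: Implementations of this method MUST be synchronized so that the operation is atomic with respect to concurrent writers.
Specified by: transferTo in interface IBufferStrategy
Parameters: out - The file to which the buffer contents will be transferred.
Throws: IOException

public void closeForWrites()
Extended to discard the write cache.
Note: The file is NOT closed and re-opened in a read-only mode in order to avoid causing difficulties for concurrent readers.
Specified by: closeForWrites in interface IBufferStrategy
Overrides: closeForWrites in class AbstractBufferStrategy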
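For `transferTo(RandomAccessFile out)` above, a hypothetical sketch of how the layering could be exploited: the bytes covered by the buffer are written from memory, and anything beyond it is copied file-to-file with `FileChannel.transferTo(long, long, WritableByteChannel)`. `TransferToSketch` is not the class's implementation; it assumes the write cache has already been flushed, that buffer byte `i` lives at file offset `headerSize + i`, and that the caller holds whatever lock makes the operation atomic.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch: copy user-extent bytes [0, nextOffset) onto out at its file pointer. */
final class TransferToSketch {

    static long transferTo(ByteBuffer buffer, FileChannel src, long headerSize,
                           long nextOffset, RandomAccessFile out) {
        try {
            FileChannel dst = out.getChannel(); // shares out's file pointer
            long fromBuffer = Math.min(nextOffset, buffer.capacity());
            ByteBuffer view = buffer.duplicate();
            view.limit((int) fromBuffer);
            view.position(0);
            while (view.hasRemaining()) {
                dst.write(view); // buffered prefix comes straight from memory
            }
            long pos = headerSize + fromBuffer;
            long end = headerSize + nextOffset;
            while (pos < end) { // the unbuffered remainder is copied file-to-file
                long n = src.transferTo(pos, end - pos, dst);
                if (n <= 0) {
                    throw new IOException("transferTo made no progress at " + pos);
                }
                pos += n;
            }
            return nextOffset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```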