com.bigdata.journal
Class DiskOnlyStrategy

java.lang.Object
  extended by com.bigdata.rawstore.AbstractRawStore
      extended by com.bigdata.rawstore.AbstractRawWormStore
          extended by com.bigdata.journal.AbstractBufferStrategy
              extended by com.bigdata.journal.DiskOnlyStrategy
All Implemented Interfaces:
IBufferStrategy, IDiskBasedStrategy, IAddressManager, IMRMW, IMROW, IRawStore, IStoreSerializer, IUpdateStore, IWORM

public class DiskOnlyStrategy
extends AbstractBufferStrategy
implements IDiskBasedStrategy, IUpdateStore

Disk-based journal strategy.

Writes are buffered in a write cache. The cache is flushed when it would overflow. As a result only large sequential writes are performed on the store. Reads read through the write cache for consistency.
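
The buffering behavior described above can be sketched with JDK classes only. This is a minimal illustration under stated assumptions, not the DiskOnlyStrategy implementation: the class WriteCacheSketch, its fields, and its demo() helper are hypothetical names introduced here. It shows a cache that is flushed only when a write would overflow it, large records written directly after a flush, and reads that are served from the cache when the record has not yet reached the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only store with a write cache (not the bigdata code). */
final class WriteCacheSketch implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer cache; // bytes accepted but not yet on disk
    private long cacheOffset = 0L;  // file offset of the first cached byte
    int flushCount = 0;

    WriteCacheSketch(Path file, int cacheSize) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        cache = ByteBuffer.allocate(cacheSize);
    }

    /** Appends a record and returns the file offset it will occupy. */
    long write(byte[] record) throws IOException {
        if (record.length > cache.remaining()) flush(); // would overflow
        long offset = cacheOffset + cache.position();
        if (record.length > cache.capacity()) {
            // Too large to cache: write directly (the cache is empty here).
            writeFully(ByteBuffer.wrap(record), offset);
            cacheOffset = offset + record.length;
        } else {
            cache.put(record);
        }
        return offset;
    }

    /** Writes the cached bytes to the file as one sequential write. */
    void flush() throws IOException {
        if (cache.position() == 0) return;
        cache.flip();
        int n = cache.remaining();
        writeFully(cache, cacheOffset);
        cacheOffset += n;
        cache.clear();
        flushCount++;
    }

    /** Reads through the cache: cached records never touch the disk. */
    byte[] read(long offset, int nbytes) throws IOException {
        byte[] out = new byte[nbytes];
        if (offset >= cacheOffset) {
            ByteBuffer view = cache.duplicate(); // leave the cache state alone
            view.position((int) (offset - cacheOffset));
            view.get(out);
        } else {
            ByteBuffer dst = ByteBuffer.wrap(out);
            long pos = offset;
            while (dst.hasRemaining()) {
                int n = channel.read(dst, pos);
                if (n < 0) throw new IOException("read past end of file");
                pos += n;
            }
        }
        return out;
    }

    private void writeFully(ByteBuffer src, long pos) throws IOException {
        while (src.hasRemaining()) pos += channel.write(src, pos);
    }

    /** Closes the file without flushing pending writes. */
    @Override public void close() throws IOException { channel.close(); }

    /** Two 5-byte records against an 8-byte cache: the second forces a flush. */
    static String demo() {
        try {
            Path f = Files.createTempFile("wcache", ".bin");
            try (WriteCacheSketch s = new WriteCacheSketch(f, 8)) {
                long a = s.write("abcde".getBytes(StandardCharsets.US_ASCII));
                long b = s.write("fghij".getBytes(StandardCharsets.US_ASCII));
                return a + "," + b + "," + s.flushCount + ","
                        + new String(s.read(a, 5), StandardCharsets.US_ASCII)
                        + new String(s.read(b, 5), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(f);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In demo() the first record is read back from the file and the second from the cache, which is the read-through path the description refers to.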

Note: This is used to realize both the BufferMode.Disk and the BufferMode.Temporary BufferModes. When configured for the BufferMode.Temporary mode: the root blocks will not be written onto the disk, writes will not be forced, and the backing file will be created the first time the DiskOnlyStrategy attempts to write through to the disk. For many scenarios, the backing file will never be created unless the write cache overflows. This provides very low latency on start-up, the same MRMW capability, and allows very large temporary stores.

FIXME Examine behavior when write caching is enabled/disabled for the OS. This has a profound impact. Asynchronous writes of multiple buffers, and the use of smaller buffers, may be absolutely necessary when the write cache is disabled. It may be that swapping sets in because the Windows write cache is being overworked, in which case doing incremental and async IO would help. Compare with behavior on server platforms. See:

http://support.microsoft.com/kb/259716
http://www.accucadd.com/TechNotes/Cache/WriteBehindCache.htm
http://msdn2.microsoft.com/en-us/library/aa365165.aspx
http://www.jasonbrome.com/blog/archives/2004/04/03/writecache_enabled.html
http://support.microsoft.com/kb/811392
http://mail-archives.apache.org/mod_mbox/db-derby-dev/200609.mbox/%3C44F820A8.6000000@sun.com%3E

                /sbin/hdparm -W 0 /dev/hda 0 Disable write caching
                /sbin/hdparm -W 1 /dev/hda 1 Enable write caching
 

Version:
$Id: DiskOnlyStrategy.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
See Also:
BufferMode.Disk, BufferMode.Temporary
TODO:
Report whether or not the on-disk write cache is enabled for each platform in AbstractStatisticsCollector. Offer guidance on how to disable that write cache.

The flush of the write cache could be made asynchronous if we had two write buffers, but that increases the complexity significantly. It would have to be synchronous if invoked from force(boolean) in any case (or rather force would have to flush all buffers).

Reconsider a 2nd buffer so that we can avoid waiting on the writes to disk. Use Executors#newSingleThreadExecutor(java.util.concurrent.ThreadFactory) to obtain the 2nd (daemon) thread and an Exchanger.

Consider the generalization where a WriteCache encapsulates the logic that exists in this class and where we have a BlockingQueue of available write caches. There is one "writable" writeCache object at any given time, unless we are blocked waiting for one to show up on the availableQueue. When a WriteCache is full it is placed onto a writeQueue. A thread reads from the writeQueue and performs writes, placing empty WriteCache objects onto the availableQueue. Sync places the current writeCache on the writeQueue and then waits on the writeQueue to be empty. Large objects could be wrapped and written out using the same mechanisms but should not become "available" again after they are written.

Consider that a WriteCache also doubles as a read cache IF we create write cache objects encapsulating reads that we read directly from the disk rather than from a WriteCache. In this case we might do a larger read so as to populate more of the WriteCache object in the hope that we will have more hits in that part of the journal.

modify force to use an atomic handoff of the write cache so that the net result is atomic from the perspective of the caller. This may require locking on the write cache so that we wait until concurrent writes have finished before flushing to the disk or I may be able to use nextOffset to make an atomic determination of the range of the buffer to be forced, create a view of that range, and use the view to force to disk so that the position and limits are not changed by force nor by concurrent writers - this may also be a problem for the Direct mode and the Mapped mode, at least if they use a write cache.

Async cache writes are also useful if the disk cache is turned off and could gain importance in offering tighter control over IO guarantees.

Test verifying that large records are written directly and that the write cache is properly flushed beforehand.

Test verifying that the write cache can be disabled.

Test verifying that writeCacheOffset is restored correctly on restart (i.e., you can continue to append to the store after restart and the result is valid).

Test verifying that the buffer position and limit are updated correctly by write(ByteBuffer) regardless of the code path.

Retrofit the concept of a write cache into the DirectBufferStrategy so that we defer writes onto the disk until (a) a threshold of data has been buffered; or (b) force(boolean) is invoked. Note that the implementation will be a bit different since the Direct mode is already fully buffered so we do not need to allocate a separate writeCache. However, we will still need to track the writeCacheOffset and maintain a writeCacheIndex.
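
The availableQueue / writeQueue generalization proposed in the TODO notes above might look roughly like the following. This is a hedged sketch, not a design taken from the bigdata code base: WriteCacheQueueSketch, the POISON marker, and the in-memory ByteArrayOutputStream standing in for the disk are all assumptions made for illustration. It also simplifies "sync" by joining the writer thread rather than waiting for the writeQueue to drain, and it does not handle records larger than one cache.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical pool of write caches drained by a single writer thread. */
final class WriteCacheQueueSketch {
    private static final ByteBuffer POISON = ByteBuffer.allocate(0); // shutdown marker
    private final BlockingQueue<ByteBuffer> availableQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<ByteBuffer> writeQueue = new LinkedBlockingQueue<>();
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stand-in for the file
    private final Thread writer;
    private ByteBuffer current; // the one "writable" cache

    WriteCacheQueueSketch(int nbuffers, int bufferSize) throws InterruptedException {
        for (int i = 0; i < nbuffers; i++) {
            availableQueue.put(ByteBuffer.allocate(bufferSize));
        }
        current = availableQueue.take();
        writer = new Thread(this::drain, "write-cache-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Writer thread: writes full caches in FIFO order, then recycles them. */
    private void drain() {
        try {
            for (ByteBuffer buf = writeQueue.take(); buf != POISON; buf = writeQueue.take()) {
                buf.flip();
                disk.write(buf.array(), 0, buf.remaining());
                buf.clear();
                availableQueue.put(buf);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Buffers a record, handing the current cache to the writer when it is full. */
    void write(byte[] record) throws InterruptedException {
        if (record.length > current.capacity()) {
            throw new IllegalArgumentException("record larger than a cache");
        }
        if (record.length > current.remaining()) {
            writeQueue.put(current);
            current = availableQueue.take(); // blocks until a cache is free
        }
        current.put(record);
    }

    /** Hands off the current cache, stops the writer, and returns what reached "disk". */
    byte[] syncAndClose() throws InterruptedException {
        writeQueue.put(current);
        writeQueue.put(POISON);
        writer.join();
        return disk.toByteArray();
    }

    static String demo() {
        try {
            WriteCacheQueueSketch s = new WriteCacheQueueSketch(2, 4);
            for (String r : new String[] {"ab", "cd", "ef", "gh", "i"}) {
                s.write(r.getBytes(StandardCharsets.US_ASCII));
            }
            return new String(s.syncAndClose(), StandardCharsets.US_ASCII);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because a single thread drains a FIFO queue, records reach the stand-in disk in the order they were written, while the caller only blocks when no empty cache is available.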


Nested Class Summary
static class DiskOnlyStrategy.StoreCounters
          Counters for IRawStore access, including operations that read or write through to the underlying media.
 
Field Summary
 
Fields inherited from class com.bigdata.journal.AbstractBufferStrategy
bufferMode, ERR_ADDRESS_IS_NULL, ERR_ADDRESS_NOT_WRITTEN, ERR_BAD_RECORD_SIZE, ERR_BUFFER_EMPTY, ERR_BUFFER_NULL, ERR_INT32, ERR_NOT_OPEN, ERR_READ_ONLY, ERR_RECORD_LENGTH_ZERO, ERR_TRUNCATE, initialExtent, log, maximumExtent, nextOffset, WARN
 
Fields inherited from class com.bigdata.rawstore.AbstractRawWormStore
am
 
Fields inherited from class com.bigdata.rawstore.AbstractRawStore
serializer
 
Fields inherited from interface com.bigdata.rawstore.IAddressManager
NULL
 
Method Summary
 long allocate(int nbytes)
          Allocate a record without writing it on the store.
 void close()
          Closes the file immediately (without flushing any pending writes).
 void closeForWrites()
          Extended to discard the write cache.
 void deleteResources()
          Deletes the backing file(s) (if any) and clears any records for the store from the IGlobalLRU.
 void force(boolean metadata)
          Flushes the writeCache before syncing the disk.
 FileChannel getChannel()
          Note: This MAY be null.
 CounterSet getCounters()
          Return interesting information about the write cache and file operations.
 long getExtent()
          The current size of the journal in bytes.
 File getFile()
          The backing file.
 int getHeaderSize()
          The size of the file header in bytes.
 RandomAccessFile getRandomAccessFile()
          Note: This MAY be null.
 DiskOnlyStrategy.StoreCounters getStoreCounters()
          Returns the performance counters for the store.
 long getUserExtent()
          The size of the user data extent in bytes.
 boolean isFullyBuffered()
          True iff the store is fully buffered (all reads are against memory).
 boolean isStable()
          True iff backed by stable storage.
 ByteBuffer read(long addr)
          Note: ClosedChannelException and AsynchronousCloseException can get thrown out of this method (wrapped as RuntimeExceptions) if a reader task is interrupted.
 ByteBuffer readRootBlock(boolean rootBlock0)
          Read the specified root block from the backing file.
 void setStoreCounters(DiskOnlyStrategy.StoreCounters storeCounters)
          Replaces the DiskOnlyStrategy.StoreCounters object.
 long transferTo(RandomAccessFile out)
          A block operation that transfers the serialized records (aka the written on portion of the user extent) en masse from the buffer onto an output file.
 void truncate(long newExtent)
          Either truncates or extends the journal.
 void update(long addr, int off, ByteBuffer data)
          Updates a region of a record.
 long write(ByteBuffer data)
          Write the data (unisolated).
 void writeRootBlock(IRootBlockView rootBlock, ForceEnum forceOnCommit)
          Write the root block onto stable storage (i.e., flush it through to disk).
 
Methods inherited from class com.bigdata.journal.AbstractBufferStrategy
assertOpen, destroy, getBufferMode, getInitialExtent, getMaximumExtent, getNextOffset, getResourceMetadata, getUUID, isOpen, isReadOnly, overflow, size, transferFromDiskTo
 
Methods inherited from class com.bigdata.rawstore.AbstractRawWormStore
getAddressManager, getByteCount, getOffset, getOffsetBits, packAddr, toAddr, toString, unpackAddr
 
Methods inherited from class com.bigdata.rawstore.AbstractRawStore
deserialize, deserialize, deserialize, serialize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.bigdata.journal.IBufferStrategy
getBufferMode, getInitialExtent, getMaximumExtent, getNextOffset
 
Methods inherited from interface com.bigdata.rawstore.IRawStore
destroy, getResourceMetadata, getUUID, isOpen, isReadOnly, size
 
Methods inherited from interface com.bigdata.rawstore.IAddressManager
getByteCount, getOffset, packAddr, toAddr, toString, unpackAddr
 
Methods inherited from interface com.bigdata.rawstore.IStoreSerializer
deserialize, deserialize, deserialize, serialize
 

Method Detail

getHeaderSize

public final int getHeaderSize()
Description copied from interface: IDiskBasedStrategy
The size of the file header in bytes.

Specified by:
getHeaderSize in interface IBufferStrategy
Specified by:
getHeaderSize in interface IDiskBasedStrategy

getFile

public final File getFile()
Description copied from interface: IDiskBasedStrategy
The backing file.

Specified by:
getFile in interface IDiskBasedStrategy
Specified by:
getFile in interface IRawStore

getRandomAccessFile

public final RandomAccessFile getRandomAccessFile()
Note: This MAY be null. If BufferMode.Temporary is used then it WILL be null until the writeCache is flushed to disk for the first time.

Specified by:
getRandomAccessFile in interface IDiskBasedStrategy

getChannel

public final FileChannel getChannel()
Note: This MAY be null. If BufferMode.Temporary is used then it WILL be null until the writeCache is flushed to disk for the first time.

Specified by:
getChannel in interface IDiskBasedStrategy
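
Since getRandomAccessFile() and getChannel() may return null until the first write-through, callers need a null check. The lazy creation of the backing file can be pictured as below. This is only a sketch: LazyBackingFileSketch and writeThrough are hypothetical names, and the real class may create its file differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical store whose backing file is created on first write-through. */
final class LazyBackingFileSketch implements AutoCloseable {
    private final Path path;
    private FileChannel channel; // null until the first write-through

    LazyBackingFileSketch(Path path) { this.path = path; }

    /** MAY be null: nothing has been written through to disk yet. */
    FileChannel getChannel() { return channel; }

    void writeThrough(ByteBuffer data, long pos) throws IOException {
        if (channel == null) {
            // First write-through: create and open the backing file now.
            channel = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
        }
        while (data.hasRemaining()) pos += channel.write(data, pos);
    }

    @Override public void close() throws IOException {
        if (channel != null) channel.close();
    }

    static String demo() {
        try {
            Path dir = Files.createTempDirectory("lazy");
            Path file = dir.resolve("store.tmp");
            try (LazyBackingFileSketch s = new LazyBackingFileSketch(file)) {
                boolean before = s.getChannel() == null && !Files.exists(file);
                s.writeThrough(ByteBuffer.wrap(new byte[] {1, 2, 3}), 0L);
                boolean after = s.getChannel() != null && Files.size(file) == 3;
                return before + "," + after;
            } finally {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```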

getStoreCounters

public DiskOnlyStrategy.StoreCounters getStoreCounters()
Returns the performance counters for the store.


setStoreCounters

public void setStoreCounters(DiskOnlyStrategy.StoreCounters storeCounters)
Replaces the DiskOnlyStrategy.StoreCounters object.

Parameters:
storeCounters - The new DiskOnlyStrategy.StoreCounters.
Throws:
IllegalArgumentException - if the argument is null.

getCounters

public CounterSet getCounters()
Return interesting information about the write cache and file operations.

Specified by:
getCounters in interface IBufferStrategy
Specified by:
getCounters in interface IRawStore

isStable

public final boolean isStable()
Description copied from interface: IRawStore
True iff backed by stable storage.

Specified by:
isStable in interface IRawStore

isFullyBuffered

public boolean isFullyBuffered()
Description copied from interface: IRawStore
True iff the store is fully buffered (all reads are against memory). Implementations MAY change the value returned by this method over the life cycle of the store, e.g., to conserve memory a store may drop or decrease its buffer if it is backed by disk.

Note: This does not guarantee that the OS will not swap the buffer onto disk.

Specified by:
isFullyBuffered in interface IRawStore

force

public void force(boolean metadata)
Flushes the writeCache before syncing the disk.

Specified by:
force in interface IRawStore
Parameters:
metadata - If true, then force both the file contents and the file metadata to disk.

close

public void close()
Closes the file immediately (without flushing any pending writes).

Specified by:
close in interface IRawStore
Overrides:
close in class AbstractBufferStrategy

deleteResources

public void deleteResources()
Description copied from interface: IRawStore
Deletes the backing file(s) (if any) and clears any records for the store from the IGlobalLRU.

Specified by:
deleteResources in interface IRawStore

getExtent

public final long getExtent()
Description copied from interface: IBufferStrategy
The current size of the journal in bytes. When the journal is backed by a disk file this is the actual size on disk of that file. The initial value for this property is set by Options.INITIAL_EXTENT.

Specified by:
getExtent in interface IBufferStrategy

getUserExtent

public final long getUserExtent()
Description copied from interface: IBufferStrategy
The size of the user data extent in bytes.

Note: The size of the user extent is generally smaller than the value reported by IBufferStrategy.getExtent() since the latter also reports the space allocated to the journal header and root blocks.

Specified by:
getUserExtent in interface IBufferStrategy

read

public ByteBuffer read(long addr)
Note: ClosedChannelException and AsynchronousCloseException can get thrown out of this method (wrapped as RuntimeExceptions) if a reader task is interrupted.

Specified by:
read in interface IRawStore
Parameters:
addr - A long integer that encodes both the offset from which the data will be read and the #of bytes to be read. See IAddressManager.toAddr(int, long).
Returns:
The data read. The buffer will be flipped to prepare for reading (the position will be zero and the limit will be the #of bytes read).
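
To make the idea of a single long that encodes both an offset and a byte count concrete, here is one possible bit layout. It is an assumption for illustration only: the 42/22 bit split and the class AddrSketch are hypothetical, and the actual encoding is whatever the store's IAddressManager defines.

```java
/** Hypothetical address packing: high bits hold the offset, low bits the byte count. */
final class AddrSketch {
    static final int OFFSET_BITS = 42;                   // assumed split, not confirmed by this page
    static final int BYTE_COUNT_BITS = 64 - OFFSET_BITS; // 22 bits for the record length
    static final long BYTE_COUNT_MASK = (1L << BYTE_COUNT_BITS) - 1;

    static long toAddr(int nbytes, long offset) {
        if (nbytes < 0 || nbytes > BYTE_COUNT_MASK) {
            throw new IllegalArgumentException("byte count out of range: " + nbytes);
        }
        if (offset < 0 || offset >= (1L << OFFSET_BITS)) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return (offset << BYTE_COUNT_BITS) | nbytes;
    }

    static long getOffset(long addr) {
        return addr >>> BYTE_COUNT_BITS; // unsigned shift: the top bit may be set
    }

    static int getByteCount(long addr) {
        return (int) (addr & BYTE_COUNT_MASK);
    }
}
```

With this layout a reader can recover where to seek and how many bytes to read from the address alone, without a separate index lookup.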

allocate

public long allocate(int nbytes)
Description copied from interface: IUpdateStore
Allocate a record without writing it on the store.

Note: The contents of the record having that address are undefined until data is written onto the record using IUpdateStore.update(long, int, ByteBuffer) and only those bytes actually written will be defined.

Specified by:
allocate in interface IUpdateStore
Parameters:
nbytes - The #of bytes in the record.
Returns:
The address of the record.

update

public void update(long addr,
                   int off,
                   ByteBuffer data)
Description copied from interface: IUpdateStore
Updates a region of a record. The record may have been written or simply allocated. The bytes in data from the Buffer.position() to the Buffer.limit() will be written starting at off bytes into the record identified by the addr. The state of other bytes in the record is unchanged. If their state was undefined (e.g., the record was IUpdateStore.allocate(int)'d but not written) then their state will remain undefined.

Specified by:
update in interface IUpdateStore
Parameters:
addr - The address of an existing record.
off - The offset into that record at which the data will be written.
data - The data to be written.
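
The allocate-then-update contract can be illustrated with an in-memory stand-in. UpdateSketch is a hypothetical class that uses list indices in place of real store addresses. In this sketch unwritten bytes happen to be zero, but per the contract above callers should treat them as undefined.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of allocate(int) and update(addr, off, data). */
final class UpdateSketch {
    private final List<byte[]> records = new ArrayList<>();

    /** Reserves space for a record without writing any data. */
    int allocate(int nbytes) {
        if (nbytes <= 0) throw new IllegalArgumentException("nbytes must be positive");
        records.add(new byte[nbytes]);
        return records.size() - 1; // stands in for the record's address
    }

    /** Writes data[position..limit) starting at off bytes into the record. */
    void update(int id, int off, ByteBuffer data) {
        byte[] record = records.get(id);
        int len = data.remaining();
        if (off < 0 || off + len > record.length) {
            throw new IllegalArgumentException("region exceeds record bounds");
        }
        data.get(record, off, len); // other bytes of the record are untouched
    }

    byte[] read(int id) {
        return records.get(id).clone();
    }
}
```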

write

public long write(ByteBuffer data)
Description copied from interface: IRawStore
Write the data (unisolated).

Specified by:
write in interface IRawStore
Parameters:
data - The data. The bytes from the current Buffer.position() to the Buffer.limit() will be written and the Buffer.position() will be advanced to the Buffer.limit() . The caller may subsequently modify the contents of the buffer without changing the state of the store (i.e., the data are copied into the store).
Returns:
A long integer that encodes both the offset from which the data may be read and the #of bytes to be read. See IAddressManager.
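
The buffer contract for write(ByteBuffer), where the position is advanced to the limit and the data are copied so the caller may reuse the buffer, is sketched below with a hypothetical in-memory store. RecordCopySketch is not part of the API, and it returns a list index rather than an encoded address.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory store illustrating the write(ByteBuffer) contract. */
final class RecordCopySketch {
    private final List<byte[]> records = new ArrayList<>();

    /** Copies data[position..limit) into the store and advances the position to the limit. */
    int write(ByteBuffer data) {
        if (!data.hasRemaining()) throw new IllegalArgumentException("empty record");
        byte[] copy = new byte[data.remaining()];
        data.get(copy); // relative bulk get: position now equals limit
        records.add(copy);
        return records.size() - 1; // stands in for the encoded address
    }

    /** Returns a buffer with position zero and limit equal to the #of bytes read. */
    ByteBuffer read(int id) {
        return ByteBuffer.wrap(records.get(id).clone());
    }
}
```

Because the bytes are copied, a caller can overwrite or recycle its buffer immediately after write returns without affecting what is later read back.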

readRootBlock

public ByteBuffer readRootBlock(boolean rootBlock0)
Description copied from interface: IBufferStrategy
Read the specified root block from the backing file.

Specified by:
readRootBlock in interface IBufferStrategy

writeRootBlock

public void writeRootBlock(IRootBlockView rootBlock,
                           ForceEnum forceOnCommit)
Description copied from interface: IBufferStrategy
Write the root block onto stable storage (i.e., flush it through to disk).

Specified by:
writeRootBlock in interface IBufferStrategy
Parameters:
rootBlock - The root block. Which root block is indicated by IRootBlockView.isRootBlock0().

truncate

public void truncate(long newExtent)
Description copied from interface: IBufferStrategy
Either truncates or extends the journal.

Note: Implementations of this method MUST be synchronized so that the operation is atomic with respect to concurrent writers.

Specified by:
truncate in interface IBufferStrategy
Parameters:
newExtent - The new extent of the journal. This value represents the total extent of the journal, including any root blocks together with the user extent.

transferTo

public long transferTo(RandomAccessFile out)
                throws IOException
Description copied from interface: IBufferStrategy
A block operation that transfers the serialized records (aka the written on portion of the user extent) en masse from the buffer onto an output file. The buffered records are written "in order" starting at the current position on the output file. The file is grown if necessary. The file position is advanced to the last byte written on the file.

Note: Implementations of this method MUST be synchronized so that the operation is atomic with respect to concurrent writers.

Specified by:
transferTo in interface IBufferStrategy
Parameters:
out - The file to which the buffer contents will be transferred.
Returns:
The #of bytes written.
Throws:
IOException
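
For a purely in-memory buffer, the semantics described for transferTo could be approximated as follows. This is a sketch under assumptions: TransferToSketch is a hypothetical helper, it does not address the synchronization requirement noted above, and a disk-backed strategy would transfer from its file rather than from a ByteBuffer.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper that appends the written portion of a buffer to a file. */
final class TransferToSketch {

    /** Writes buffer[0..position) at the file's current position; returns the #of bytes written. */
    static long transferTo(ByteBuffer buffer, RandomAccessFile out) throws IOException {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position and limit alone
        view.flip();                          // position = 0, limit = #of bytes written
        FileChannel ch = out.getChannel();    // shares the file pointer with 'out'
        long count = 0L;
        while (view.hasRemaining()) {
            count += ch.write(view);          // grows the file if necessary
        }
        return count;
    }

    static String demo() {
        try {
            File f = File.createTempFile("transfer", ".bin");
            try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
                out.write(new byte[] {9, 9, 9}); // existing content: file pointer is now 3
                ByteBuffer buf = ByteBuffer.allocate(16);
                buf.put(new byte[] {1, 2, 3, 4});
                long n = transferTo(buf, out);
                return n + "," + out.getFilePointer() + "," + out.length() + "," + buf.position();
            } finally {
                f.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```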

closeForWrites

public void closeForWrites()
Extended to discard the write cache.

Note: The file is NOT closed and re-opened in a read-only mode in order to avoid causing difficulties for concurrent readers.

Specified by:
closeForWrites in interface IBufferStrategy
Overrides:
closeForWrites in class AbstractBufferStrategy


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.