com.bigdata.bfs
Class BigdataFileSystem

java.lang.Object
  extended by com.bigdata.relation.AbstractResource<IDatabase<BigdataFileSystem>>
      extended by com.bigdata.bfs.BigdataFileSystem
All Implemented Interfaces:
IContentRepository, IMutableResource<IDatabase<BigdataFileSystem>>, ILocatableResource<IDatabase<BigdataFileSystem>>, IRowStoreConstants

public class BigdataFileSystem
extends AbstractResource<IDatabase<BigdataFileSystem>>
implements IContentRepository, IRowStoreConstants

A distributed file system with extensible metadata and atomic append implemented using the bigdata scale-out architecture. Files have a client assigned identifier, which is a Unicode string. The file identifier MAY be structured so as to look like a hierarchical file system using any desired convention. Files are versioned and historical versions MAY be accessed until the next compacting merge discards their data. File data is stored in large blockSize blocks. Partial and even empty blocks are allowed and only the data written will be stored. 2^63-1 distinct blocks may be written per file version, making the maximum possible file size 536,870,912 exabytes. Files may be used as queues, in which case blocks containing new records are atomically appended while a map/reduce style master consumes the head block of the file.
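
The size figure quoted above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a block size of 64 MiB (2^26 bytes), chosen because it reproduces the stated figure, whereas the real block size is whatever getBlockSize() reports. MaxFileSize and maxExabytes are hypothetical names, not part of the bigdata API.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

/** Hypothetical helper; not part of the bigdata API. */
final class MaxFileSize {

    /** Bytes in one binary exabyte (2^60). */
    static final BigInteger EXABYTE = BigInteger.ONE.shiftLeft(60);

    /**
     * Upper bound on the size of one file version, in exabytes, rounded to
     * the nearest whole number.
     *
     * @param blockSize the block size in bytes (ASSUMED by the caller)
     */
    static BigInteger maxExabytes(int blockSize) {
        // 2^63 - 1 distinct blocks may be written per file version.
        BigInteger maxBlocks = BigInteger.ONE.shiftLeft(63).subtract(BigInteger.ONE);
        BigInteger maxBytes = maxBlocks.multiply(BigInteger.valueOf(blockSize));
        return new BigDecimal(maxBytes)
                .divide(new BigDecimal(EXABYTE), 0, RoundingMode.HALF_UP)
                .toBigIntegerExact();
    }
}
```

Under that assumption, MaxFileSize.maxExabytes(1 << 26) yields 536870912, because (2^63 - 1) * 2^26 bytes is just under 2^89 bytes, i.e. 2^29 exabytes.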

Efficient methods are offered for streaming and block-oriented IO. All block read and write operations are atomic, including block append. Files may be easily written such that records never cross a block boundary by the expedient of flushing the output stream if a record would overflow the current block. A flush forces the atomic write of a partial block. Partial blocks are stored efficiently - only the bytes actually written are stored. Blocks are large enough that most applications can safely store a large number of logical records in each block. Files comprised of application defined logical records organized into a sequence of blocks are well-suited to map/reduce processing. They may be efficiently split at block boundaries and references to the blocks distributed to clients. Likewise, reduce clients can aggregate data into large files suitable for further map/reduce processing.
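
The "flush before a record would overflow" rule can be sketched as follows. RecordAlignedWriter is a hypothetical in-memory stand-in: it collects blocks in a list, where a real client would flush the OutputStream returned by outputStream(String, int). It is a sketch of the idea, not the library's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a block-oriented output stream. */
final class RecordAlignedWriter {

    private final int blockSize;
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();

    /** Blocks emitted so far; a real client would append them to a file version. */
    final List<byte[]> blocks = new ArrayList<>();

    RecordAlignedWriter(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Buffers a record, flushing first if it would overflow the current block. */
    void writeRecord(byte[] record) {
        if (record.length > blockSize) {
            throw new IllegalArgumentException("record larger than a block");
        }
        if (current.size() + record.length > blockSize) {
            flush();
        }
        current.write(record, 0, record.length);
    }

    /** Emits the buffered bytes as one (possibly partial) block. */
    void flush() {
        if (current.size() > 0) {
            blocks.add(current.toByteArray());
            current.reset();
        }
    }
}
```

With a block size of 10, writing records of 6, 6, and 4 bytes and then flushing produces two blocks: a partial block of 6 bytes followed by a full block of 10 bytes. No record spans a block boundary.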

The distributed file system uses two scale-out indices to support ACID operations on file metadata and atomic file append. These ACID guarantees arise from the use of unisolated operations on the respective indices and therefore apply only to the individual file metadata or file block operations. In particular, file metadata read and write are atomic and all individual file block IO (read, write, and append) operations are atomic. Atomicity is NOT guaranteed when performing more than a single file block IO operation, e.g., multiple appends MIGHT NOT write sequential blocks since other block operations could have intervened.

The content length of the file is not stored as file metadata. Instead it MAY be estimated by a range count of the index entries spanned by the file's data. The exact file size may be readily determined when reading small files by reading the entire file into a buffer - all reads are at least one block. Streaming processing is advised in all cases when handling large files, including when the file is to be delivered via HTTP.

The metadata index uses a SparseRowStore design, similar to Google's bigtable or Hadoop's HBase. All updates to file version metadata are atomic. The primary key in the metadata index for every file is its FileMetadataSchema.ID. In addition, each version of a file has a distinct FileMetadataSchema.VERSION property. File creation time, version creation time, and file version metadata update timestamps may be recovered from the timestamps associated with the properties in the metadata index. The use of the FileMetadataSchema.CONTENT_TYPE and FileMetadataSchema.CONTENT_ENCODING properties is enforced by the high-level Document interface. Applications are free to define additional properties.

Each time a file is created a new version number is assigned. The data index uses the FileMetadataSchema.ID as the first field in a compound key. The second field is the FileMetadataSchema.VERSION - a 32-bit integer. The remainder of the key is a 64-bit signed block identifier (2^63-1 distinct block identifiers). The block identifiers are strictly increasing (e.g., incrementing by one) and their sequence orders the blocks into the logical byte order of the file.
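
To make the compound key concrete, here is a minimal sketch of one way to encode (file identifier, version, block) so that an unsigned byte-wise comparison yields the described order. The encoding is an assumption made for illustration (UTF-8 identifier, a 0x00 terminator, sign-flipped big-endian integers); bigdata's actual key builder may encode Unicode identifiers differently. BlockKeySketch is a hypothetical name, and Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical key encoder; bigdata's real key format may differ. */
final class BlockKeySketch {

    /** Builds id | 0x00 | version (4 bytes) | block (8 bytes). Assumes id has no NUL. */
    static byte[] key(String id, int version, long block) {
        byte[] idBytes = id.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer is big-endian by default, which is what ordered keys need.
        ByteBuffer buf = ByteBuffer.allocate(idBytes.length + 1 + 4 + 8);
        buf.put(idBytes);
        buf.put((byte) 0); // terminator so that keys for "a" sort before keys for "ab"
        // Flipping the sign bit makes signed values sort correctly as unsigned bytes.
        buf.putInt(version ^ Integer.MIN_VALUE);
        buf.putLong(block ^ Long.MIN_VALUE);
        return buf.array();
    }

    /** Compares two keys as unsigned byte strings, as an ordered index would. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

With this layout all blocks of one file version are contiguous in key order, and every block of version 0 sorts before any block of version 1 of the same file.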

Operations that create a new file actually create a new file version. The old file version will eventually be garbage collected depending on the policy in effect for compacting merges. Likewise, operations that delete a file simply mark the metadata for the file version as deleted and the file version will be eventually reclaimed. The high-level update(Document) operation in fact simply creates a new file version.

Use cases

Use case: A REST-ful repository. Documents may be stored, updated, read, deleted, and searched using a full text index.

Use case: A map/reduce master reads document metadata using an index scan. It examines the data index's MetadataIndex (that is, the index that knows where each partition of the scale-out data index is stored) and determines which map clients are going to be "close" to each document and then hands off the document to one of those map clients.

Use case: The same as the use case above, but large files are being processed and there is a requirement to "break" the files into splits and hand off the splits. This can be achieved by estimating the file size using a range count and multiplying through by the block size. Blocks may be handed off to the clients in parallel. Clients must cope with records that cross split boundaries unless writers always pad out with unused bytes to the next blockSize boundary.
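
A minimal sketch of the split planning described above. SplitPlanner is a hypothetical helper: in practice the range count would come from the data index and the block identifiers from blocks(String, int), while here both are plain arguments so the example stands alone.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper for planning splits; not part of the bigdata API. */
final class SplitPlanner {

    /** Upper bound on the file size: every block is assumed to be full. */
    static long estimateBytes(long rangeCount, int blockSize) {
        return Math.multiplyExact(rangeCount, (long) blockSize);
    }

    /** Groups block identifiers, in order, into splits of at most blocksPerSplit. */
    static List<List<Long>> plan(Iterator<Long> blockIds, int blocksPerSplit) {
        if (blocksPerSplit <= 0) {
            throw new IllegalArgumentException("blocksPerSplit must be positive");
        }
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        while (blockIds.hasNext()) {
            current.add(blockIds.next());
            if (current.size() == blocksPerSplit) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            splits.add(current); // final, possibly short, split
        }
        return splits;
    }
}
```

The estimate is an upper bound because partial and empty blocks store fewer than blockSize bytes. Each split is a list of block identifiers that a map client could read one block at a time.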

Use case: A reduce client wants to write a very large file so it creates a metadata record for the file and then does a series of atomic appends to the file. The file may grow arbitrarily large. Clients may begin to read from the file as soon as the first block has been flushed.

Use case: Queues MAY be built from the operations to atomically read or delete the first block for the file version. The "design pattern" is to have clients append blocks to the file version, taking care that logical rows never cross a block boundary (e.g., by flushing partial blocks). A master then reads the head block from the file version, distributing the logical records therein to consumers and providing fail safe processing in case consumers die or take too long. Once all records for the head block have been processed the master simply deletes the head block. This "pattern" is quite similar to map/reduce and, like map/reduce, requires that the consumer operations may be safely re-run.
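
The queue pattern can be sketched with an in-memory stand-in. BlockQueueSketch is hypothetical: its method names mirror appendBlock, readHead, and deleteHead, but it holds a single file version in a TreeMap rather than talking to a federation, and it assumes the first appended block gets identifier 0.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

/** In-memory stand-in for one file version used as a queue (hypothetical). */
final class BlockQueueSketch {

    private final TreeMap<Long, byte[]> blocks = new TreeMap<>();

    /** Appends a block after the largest identifier written so far. */
    synchronized long appendBlock(byte[] b) {
        long next = blocks.isEmpty() ? 0L : blocks.lastKey() + 1;
        blocks.put(next, b.clone());
        return next;
    }

    /** Returns the first block, or null if there are no blocks. */
    synchronized byte[] readHead() {
        Map.Entry<Long, byte[]> e = blocks.firstEntry();
        return e == null ? null : e.getValue().clone();
    }

    /** Deletes the first block; returns its identifier, or -1L if none. */
    synchronized long deleteHead() {
        Map.Entry<Long, byte[]> e = blocks.pollFirstEntry();
        return e == null ? -1L : e.getKey();
    }

    /** Master loop: process the head block, then delete it. */
    int drain(Consumer<byte[]> consumer) {
        int processed = 0;
        byte[] head;
        while ((head = readHead()) != null) {
            consumer.accept(head); // must be safe to re-run if the master fails here
            deleteHead();          // only after every record in the block is done
            processed++;
        }
        return processed;
    }
}
```

Because the head block is deleted only after it has been processed, a master that dies mid-block leaves that block in place and its records are simply processed again on restart. This is why consumer operations must be safe to re-run.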

Use case: File replication, retention of deleted versions, and media indexing are administered by creating "zones" comprising one or more index partitions with a shared file identifier prefix, e.g., /tmp or /highly-available, or /deployment-text-index. All files in a given zone share the same policy for file replication, compacting merges (determining when a deleted or even a non-deleted file version will be discarded), and media indexing.

Use case: File rename is NOT a cheap operation. It essentially creates a new file version with the desired name and copies the data from the old file version to the new file version. Finally the old file version is "deleted". This approach is necessary since files may be moved from one "zone" to another and since the file data must reside on the index partition(s) identified by its file version. FIXME write a JSON API that interoperates to the extent possible with GAE and HBASE.

Version:
$Id: BigdataFileSystem.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
TODO:
implement "zones" and their various policies (replication, retention, and media indexing). access control could also be part of the zones., should compression be applied? applications are obviously free to apply their own compression, but it could be convenient to store compressed blocks. the caller could specify the compression method on a per block basis (we don't want to lookup the file metadata for this). the compression method would be written into a block header. blocks can always be decompressed by examining the header., there should be some constraints on the file identifier but in general it represents a client determined absolute file path name. It is certainly possible to use a flat file namespace, but you can just as readily use a hierarchical one. Unicode characters are supported in the file identifiers., do we need a global lock mechanism to prevent concurrent high-level create/update/delete of the same file? a distributed lease-based lock system derived from jini or built ourselves? Can this be supported with the historical and not yet purged timestamped metadata for the file?

Nested Class Summary
static interface BigdataFileSystem.Options
          Configuration options.
 
Field Summary
protected static boolean DEBUG
          True iff the log level is DEBUG or less.
static String FILE_DATA_INDEX_BASENAME
          The basename of the index in which the file data blocks are stored.
static String FILE_METADATA_INDEX_BASENAME
          The basename of the index in which the file metadata are stored.
protected static boolean INFO
          True iff the log level is INFO or less.
protected static org.apache.log4j.Logger log
           
protected static long MAX_BLOCK
          The maximum block identifier that can be assigned to a file version.
static FileMetadataSchema metadataSchema
           
 
Fields inherited from interface com.bigdata.sparse.IRowStoreConstants
AUTO_TIMESTAMP, AUTO_TIMESTAMP_UNIQUE, CURRENT_ROW, MAX_TIMESTAMP, MIN_TIMESTAMP
 
Constructor Summary
BigdataFileSystem(IIndexManager indexManager, String namespace, Long timestamp, Properties properties)
          Ctor specified by DefaultResourceLocator.
 
Method Summary
 long appendBlock(String id, int version, byte[] b, int off, int len)
          Atomic append of a block to a file version.
protected static void assertLong(Map<String,Object> properties, String name)
           
protected static void assertString(Map<String,Object> properties, String name)
           
protected  void assertWritable()
           
 Iterator<Long> blocks(String id, int version)
          Returns an iterator that visits all block identifiers for the file version in sequence.
 long copyBlocks(String fromId, int fromVersion, String toId, int toVersion)
          Copies blocks from one file version to another.
 long copyStream(String id, int version, InputStream is)
          Copies data from the input stream to the file version.
 void create()
          Note: A commit is required in order for a read-committed view to have access to the registered indices.
 int create(Document doc)
          Create a new persistent document in this repository based on the metadata and content in the supplied document object.
 int create(Map<String,Object> metadata)
          Creates a new file version from the specified metadata.
 long delete(String id)
          Note: A new file version is marked as deleted and then the file blocks for the old version are deleted from the data index.
 long deleteAll(String fromId, String toId)
          Efficient delete of file metadata and file data for all files and file versions spanned by the specified file identifiers.
 boolean deleteBlock(String id, int version, long block)
          Atomic delete of a block for a file version.
 long deleteHead(String id, int version)
          Atomic delete of the first block of the file version.
 void destroy()
          Destroy any logically contained resources (relations, indices).
 ITPV[] getAllVersionInfo(String id)
          Return an array describing all non-eradicated versions of a file.
 long getBlockCount(String id, int version)
          Return the maximum #of blocks in the file version.
 int getBlockSize()
          The size of a file block.
 Iterator<? extends DocumentHeader> getDocumentHeaders(String fromId, String toId)
          Return a listing of the documents and metadata about them in this repository.
 IIndex getFileDataIndex()
          The index in which the file blocks are stored (the index must exist).
 SparseRowStore getFilleMetadataIndex()
          The index in which the file metadata is stored (the index must exist).
 int getOffsetBits()
          The #of bits in a 64-bit long integer identifier that are used to encode the byte offset of a record in the store as an unsigned integer.
 FileVersionInputStream inputStream(String id, int version)
          Read data from a file version.
 FileVersionInputStream inputStream(String id, int version, long tx)
          Read data from a file version.
 boolean isReadOnly()
          true unless AbstractResource.getTimestamp() is ITx.UNISOLATED.
 OutputStream outputStream(String id, int version)
          Return an output stream that will append on the file version.
 Document read(String id)
          Reads the document metadata for the current version of the specified file.
 byte[] readBlock(String id, int version, long block)
          Atomic read of a block for a file version.
 Reader reader(String id, int version, String encoding)
          Read character data from a file version.
 byte[] readHead(String id, int version)
          Atomic read of the first block of the file version.
 ITPS readMetadata(String id, long timestamp)
          Return the file metadata for the version of the file associated with the specified timestamp.
 Iterator<String> search(String query)
          FIXME Integrate with FullTextIndex to provide indexing and search of file versions.
 int update(Document doc)
          Create a new file version using the supplied file metadata.
 Map<String,Object> updateMetadata(String id, Map<String,Object> metadata)
          Update the metadata for the current file version.
 boolean writeBlock(String id, int version, long block, byte[] b, int off, int len)
          Atomic write of a block for a file version.
 Writer writer(String id, int version, String encoding)
          Return a Writer that will append character data on the file version.
 
Methods inherited from class com.bigdata.relation.AbstractResource
acquireExclusiveLock, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getContainer, getContainerNamespace, getExecutorService, getFullyBufferedReadThreshold, getIndexManager, getMaxParallelSubqueries, getNamespace, getProperties, getProperty, getProperty, getTimestamp, isForceSerialExecution, isNestedSubquery, toString, unlock
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

log

protected static final org.apache.log4j.Logger log

INFO

protected static final boolean INFO
True iff the log level is INFO or less.


DEBUG

protected static final boolean DEBUG
True iff the log level is DEBUG or less.


MAX_BLOCK

protected static final long MAX_BLOCK
The maximum block identifier that can be assigned to a file version.

Note: This is limited to Long.MAX_VALUE - 1 so that we can always form the key greater than any valid key for a file version. This is required by the atomic append logic when it seeks the next block identifier. See AtomicBlockAppendProc.

See Also:
Constant Field Values

FILE_METADATA_INDEX_BASENAME

public static final String FILE_METADATA_INDEX_BASENAME
The basename of the index in which the file metadata are stored. The fully qualified name of the index uses AbstractResource.getNamespace() as a prefix.

Note: This is a SparseRowStore governed by the FileMetadataSchema.

See Also:
Constant Field Values

FILE_DATA_INDEX_BASENAME

public static final String FILE_DATA_INDEX_BASENAME
The basename of the index in which the file data blocks are stored. The fully qualified name of the index uses AbstractResource.getNamespace() as a prefix.

Note: The entries in this index are a series of blocks for a file. Blocks are blockSize bytes each and are assigned monotonically increasing block numbers by the atomic append operation. The final block may be smaller (there is no need to pad out the data with nulls). The keys are formed from two fields - a field containing the content identifier followed by an integer field containing the sequential block number. A range scan with a fromKey of the file identifier and a toKey computed using the successor of the file identifier will naturally visit all blocks in a file in sequence.

See Also:
Constant Field Values

metadataSchema

public static final FileMetadataSchema metadataSchema
Constructor Detail

BigdataFileSystem

public BigdataFileSystem(IIndexManager indexManager,
                         String namespace,
                         Long timestamp,
                         Properties properties)
Ctor specified by DefaultResourceLocator.

See Also:
BigdataFileSystem.Options
Method Detail

getOffsetBits

public final int getOffsetBits()
The #of bits in a 64-bit long integer identifier that are used to encode the byte offset of a record in the store as an unsigned integer.

See Also:
Options.OFFSET_BITS, getBlockSize()

getBlockSize

public final int getBlockSize()
The size of a file block. Block identifiers are 64-bit signed integers. The maximum file length is 2^63 - 1 blocks (536,870,912 exabytes).

Note: The BigdataFileSystem makes the assumption that the Options.OFFSET_BITS is the #of offset bits configured for the IDataServices in the connected IBigdataFederation and computes the getBlockSize() based on that assumption. It is NOT possible to write blocks on the BigdataFileSystem whose size is greater than the maximum block size actually configured for the IDataServices in the connected IBigdataFederation.

See Also:
Options.OFFSET_BITS, getOffsetBits()

assertString

protected static void assertString(Map<String,Object> properties,
                                   String name)

assertLong

protected static void assertLong(Map<String,Object> properties,
                                 String name)

getFilleMetadataIndex

public SparseRowStore getFilleMetadataIndex()
The index in which the file metadata is stored (the index must exist).


getFileDataIndex

public IIndex getFileDataIndex()
The index in which the file blocks are stored (the index must exist).


isReadOnly

public boolean isReadOnly()
true unless AbstractResource.getTimestamp() is ITx.UNISOLATED.


assertWritable

protected final void assertWritable()

create

public void create()
Note: A commit is required in order for a read-committed view to have access to the registered indices. When running against an IBigdataFederation, ITx.UNISOLATED operations will take care of this for you. Otherwise you must do this yourself.

Specified by:
create in interface IMutableResource<IDatabase<BigdataFileSystem>>
Overrides:
create in class AbstractResource<IDatabase<BigdataFileSystem>>

destroy

public void destroy()
Description copied from interface: IMutableResource
Destroy any logically contained resources (relations, indices).

Specified by:
destroy in interface IMutableResource<IDatabase<BigdataFileSystem>>
Overrides:
destroy in class AbstractResource<IDatabase<BigdataFileSystem>>

create

public int create(Map<String,Object> metadata)
Creates a new file version from the specified metadata. The new file version will not have any blocks. You can use either stream-oriented or block oriented IO to write data on the newly created file version.

Parameters:
metadata - The file metadata.
Returns:
The new version identifier.

create

public int create(Document doc)
Description copied from interface: IContentRepository
Create a new persistent document in this repository based on the metadata and content in the supplied document object.

Specified by:
create in interface IContentRepository
Parameters:
doc - an object containing the content and metadata to persist
Returns:
The new version.

read

public Document read(String id)
Reads the document metadata for the current version of the specified file.

Specified by:
read in interface IContentRepository
Parameters:
id - The file identifier.
Returns:
A read-only view of the file version that is capable of reading the content from the repository -or- null iff there is no current version for that file identifier.

readMetadata

public ITPS readMetadata(String id,
                         long timestamp)
Return the file metadata for the version of the file associated with the specified timestamp.

Parameters:
id - The file identifier.
timestamp - The timestamp.
Returns:
A read-only view of the logical row of metadata for that file as of that timestamp.
See Also:
ITPS, SparseRowStore#read(Schema, Object, long, com.bigdata.sparse.INameFilter)

updateMetadata

public Map<String,Object> updateMetadata(String id,
                                         Map<String,Object> metadata)
Update the metadata for the current file version.

Parameters:
id - The file identifier.
metadata - The properties to be written. A null value for a property will cause the corresponding property to be deleted. Properties not present in this map will NOT be modified.
Returns:
The complete metadata for the current file version.
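
The update rules for the metadata map (null deletes a property, a non-null value writes it, absent properties are untouched) can be sketched as a plain map merge. MetadataMergeSketch is a hypothetical illustration of those rules only. The real operation is an atomic row update on the metadata index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the update rules; not the bigdata implementation. */
final class MetadataMergeSketch {

    /** Applies updates to a copy of current and returns the merged view. */
    static Map<String, Object> merge(Map<String, Object> current,
                                     Map<String, Object> updates) {
        Map<String, Object> merged = new HashMap<>(current);
        for (Map.Entry<String, Object> e : updates.entrySet()) {
            if (e.getValue() == null) {
                merged.remove(e.getKey()); // null value: delete the property
            } else {
                merged.put(e.getKey(), e.getValue()); // non-null: write the property
            }
        }
        return merged; // properties absent from updates are left unmodified
    }
}
```

Note that a map passed as the update must be able to hold null values (for example a HashMap), since null is the signal to delete a property.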

update

public int update(Document doc)
Create a new file version using the supplied file metadata.

Note: This is essentially a delete + create operation. Since the combined operation is NOT atomic it is possible that conflicts can arise when more than one client attempts to update a file concurrently.

Specified by:
update in interface IContentRepository
Parameters:
doc - The file metadata.
Returns:
The new version.

delete

public long delete(String id)
Note: A new file version is marked as deleted and then the file blocks for the old version are deleted from the data index. This sequence means (a) that clients attempting to read on the file using the high level API will not see the file as soon as its metadata is updated; (b) that the timestamp on the deleted version will be strictly LESS THAN the commit time(s) when the file blocks are deleted, so reading from the timestamp of the deleted version will let you see the deleted file blocks. This is a deliberate convenience - if we were to delete the file blocks first then we would not have ready access to a timestamp that would be before the first file block delete and hence sufficient to perform a historical read on the last state of the file before it was deleted.

Specified by:
delete in interface IContentRepository
Parameters:
id - the identifier of the document to delete
Returns:
The #of blocks that were deleted for that file.

getAllVersionInfo

public ITPV[] getAllVersionInfo(String id)
Return an array describing all non-eradicated versions of a file.

This method returns all known version identifiers together with their timestamps, thereby making it possible to read either the metadata or the data for historical file versions - as long as the metadata and/or data has not yet been eradicated.

The file metadata and data blocks for historical version(s) of a file remain available until they are eradicated from their respective indices by a compacting merge in which the history policies no longer preserve those data.

In order to read the historical file metadata you need to know the timestamp associated with the version identifier which you wish to read. This should be the timestamp when that version was deleted MINUS ONE in order to read the last valid metadata for the file version before that file version was deleted.

Likewise, in order to read the historical version data you need to know the version identifier which you wish to read as well as the timestamp. In this case, use the timestamp when that version was deleted in order to read the last committed state for the file version.

Historical file version metadata is eradicated atomically since the entire logical row will be hosted on the same index partition. Either the file version metadata is available or it is not.

Historical file version data is eradicated one index partition at a time. If the file version spans more than one index partition then it may be possible to read some blocks from the file but not others.

Historical file version metadata and data will remain available until their governing history policy is no longer satisfied. Therefore, when in doubt, you can consult the history policy in force for the file to determine whether or not its data may have been eradicated.

Parameters:
id - The file identifier.
Returns:
An array containing (timestamp,version) tuples. Tuples where the ITPV.getValue() returns null give the timestamp at which a file version was deleted. Tuples where the ITPV.getValue() returns non-null give the timestamp at which a file version was created.
See Also:
#readMetadata(String, long), to read the file version metadata based on a timestamp., #inputStream(String, int, long), to read the file data as of a specific timestamp.
TODO:
expose history policy for a file (from its zone metadata, which is replicated onto the index partition metadata). Make sure that the zone metadata is consistent for the file version metadata and file version data. This means looking up the IndexMetadata for the index partition in which the file data is stored.
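
The timestamp rules for getAllVersionInfo can be sketched as follows. VersionHistorySketch and its Tuple class are hypothetical stand-ins for the (timestamp, version) tuples. The sketch assumes the tuples are ordered by timestamp and that a deletion tuple applies to the version created by the nearest preceding creation tuple.

```java
import java.util.List;

/** Hypothetical stand-ins for interpreting (timestamp, version) tuples. */
final class VersionHistorySketch {

    /** Minimal stand-in for a tuple; value == null marks a deletion. */
    static final class Tuple {
        final long timestamp;
        final Integer value;

        Tuple(long timestamp, Integer value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Returns the timestamp to use when reading the last valid metadata of
     * the given version, or -1L if that version has no recorded deletion.
     */
    static long metadataReadTime(List<Tuple> history, int version) {
        boolean seen = false;
        for (Tuple t : history) {
            if (t.value != null) {
                seen = (t.value == version); // creation tuple, maybe for another version
            } else if (seen) {
                return t.timestamp - 1; // deletion time MINUS ONE
            }
        }
        return -1L;
    }
}
```

For the file data, as opposed to the metadata, the description above says to use the deletion timestamp itself rather than the deletion timestamp minus one.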

getDocumentHeaders

public Iterator<? extends DocumentHeader> getDocumentHeaders(String fromId,
                                                             String toId)
Description copied from interface: IContentRepository
Return a listing of the documents and metadata about them in this repository.

Note: If you assign identifiers using a namespace then you can use this method to efficiently visit all documents within that namespace.

Specified by:
getDocumentHeaders in interface IContentRepository
Parameters:
fromId - The identifier of the first document to be visited or null if there is no lower bound.
toId - The identifier of the first document that will NOT be visited or null if there is no upper bound.
Returns:
an iterator of DocumentHeaders.
TODO:
write tests.
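
Visiting every document in a namespace amounts to a half-open range scan [fromId, toId) where toId is the successor of the namespace prefix. The sketch below shows the idea with a TreeMap standing in for the metadata index. PrefixRangeSketch is a hypothetical name, and plain Java string ordering is assumed, which may differ from the collation used for Unicode keys in the real index.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical illustration of a half-open identifier range scan. */
final class PrefixRangeSketch {

    /** An exclusive upper bound for all strings that start with prefix. */
    static String successor(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("empty prefix");
        }
        int last = prefix.length() - 1;
        char c = prefix.charAt(last);
        if (c == Character.MAX_VALUE) {
            throw new IllegalArgumentException("cannot increment last character");
        }
        return prefix.substring(0, last) + (char) (c + 1);
    }

    /** Visits the entries with prefix <= id < successor(prefix). */
    static <V> SortedMap<String, V> scan(TreeMap<String, V> index, String prefix) {
        return index.subMap(prefix, successor(prefix));
    }
}
```

For the prefix "/tmp/" the computed upper bound is "/tmp0", so identifiers such as "/tmp/a" are visited while "/tmq" is not.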

deleteAll

public long deleteAll(String fromId,
                      String toId)
Efficient delete of file metadata and file data for all files and file versions spanned by the specified file identifiers. File versions are marked "deleted" before the file blocks are deleted so that you can read historical file versions with exactly the same semantics as delete(String).

Specified by:
deleteAll in interface IContentRepository
Parameters:
fromId - The identifier of the first document to be deleted or null if there is no lower bound.
toId - The identifier of the first document that will NOT be deleted or null if there is no upper bound.
Returns:
The #of files that were deleted.

search

public Iterator<String> search(String query)
FIXME Integrate with FullTextIndex to provide indexing and search of file versions. Deleted file versions should be removed from the text index. There should be explicit metadata on the file version in order for it to be indexed. The text indexer will require content type and encoding information in order to handle indexing. Low-level output stream, writer, block write and block append operations will not trigger the indexer since it depends on the metadata index to know whether or not a file version should be indexed. However you could explicitly submit a file version for indexing.

Perhaps the best way to handle this is to queue document metadata up for a distributed full text indexing service. The service accepts metadata for documents from the queue and decides whether or not the document should be indexed based on its metadata and how the document should be processed if it is to be indexed. Those business rules would be registered with the full text indexing service. (Alternatively, they can be configured with the BigdataFileSystem and applied locally as the blocks of the file are written into the repository. That's certainly easier right off the bat.)

Specified by:
search in interface IContentRepository
Parameters:
query - A query.
Returns:
An iterator visiting the identifiers of the documents in order of decreasing relevance to the query.
TODO:
crawl or query job obtains a set of URLs, writing them onto a file.

m/r job downloads documents based on set of URLs, writing all documents into a single file version. text-based downloads can be record compressed and decompressed after the record is read. binary downloads will be truncated at 64M and might be skipped altogether if they exceed the block size (get images, but not wildly large files).

m/r job extracts a simplified html format from the source image, writing the result onto another file. this job will optionally split documents into "pages" by breaking where necessary at paragraph boundaries.

m/r job builds text index from simplified html format.

m/r job runs extractors on simplified html format, producing rdf/xml which is written onto another file. The rdf/xml for each harvested document is written as its own logical record, perhaps one record per block.

concurrent batch load of rdf/xml into scale-out knowledge base. the input is a single file comprised of blocks, each of which is an rdf/xml file.


blocks

public Iterator<Long> blocks(String id,
                             int version)
Returns an iterator that visits all block identifiers for the file version in sequence.

Note: This may be used to efficiently distribute blocks among a population of clients, e.g., in a map/reduce paradigm.


copyBlocks

public long copyBlocks(String fromId,
                       int fromVersion,
                       String toId,
                       int toVersion)
Copies blocks from one file version to another. The data in each block of the source file version is copied into a new block that is appended to the target file version. Empty blocks are copied. Partial blocks are NOT combined. The block identifiers are NOT preserved since atomic append is used to add blocks to the target file version.

Parameters:
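fromId - The identifier of the source file.
fromVersion - The version of the source file.
toId - The identifier of the target file.
toVersion - The version of the target file.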
Returns:
The #of blocks copied. FIXME This could be made more efficient by sending the copy operation to each index partition in turn. That would avoid having to copy the data first to the client and thence to the target index partition.

writeBlock

public boolean writeBlock(String id,
                          int version,
                          long block,
                          byte[] b,
                          int off,
                          int len)
Atomic write of a block for a file version.

Note: You can write any valid block identifier at any time. If the block exists then its data will be replaced.

Note: Writing blocks out of sequence can create "holes". Those holes may be filled by later writing the "missing" blocks. copyBlocks(String, int, String, int) will renumber the blocks and produce a dense sequence of blocks.

Note: Atomic append will always write the successor of the largest block identifier already written on the file version. If you write block MAX_BLOCK then it will no longer be possible to append blocks to that file version, but you can still write blocks using writeBlock(String, int, long, byte[], int, int).
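
The interaction between out-of-sequence writes, append, and copying can be sketched with an in-memory stand-in. SparseBlocksSketch is hypothetical, and its MAX_BLOCK value of Long.MAX_VALUE - 1 is an assumption for this sketch. The real class exposes its own MAX_BLOCK constant.

```java
import java.util.TreeMap;

/** In-memory stand-in for the blocks of one file version (hypothetical). */
final class SparseBlocksSketch {

    /** ASSUMED maximum block identifier for this sketch. */
    static final long MAX_BLOCK = Long.MAX_VALUE - 1;

    final TreeMap<Long, byte[]> blocks = new TreeMap<>();

    /** Writes a block at any identifier; returns true iff it was overwritten. */
    boolean writeBlock(long block, byte[] b) {
        if (block < 0 || block > MAX_BLOCK) {
            throw new IllegalArgumentException("block out of range");
        }
        return blocks.put(block, b.clone()) != null;
    }

    /** Appends at the successor of the largest identifier written so far. */
    long appendBlock(byte[] b) {
        long next = blocks.isEmpty() ? 0L : blocks.lastKey() + 1;
        if (next > MAX_BLOCK) {
            throw new IllegalStateException("no successor of MAX_BLOCK");
        }
        blocks.put(next, b.clone());
        return next;
    }

    /** Copies every block into target by append, which closes any holes. */
    long copyBlocksTo(SparseBlocksSketch target) {
        long n = 0;
        for (byte[] b : blocks.values()) {
            target.appendBlock(b);
            n++;
        }
        return n;
    }
}
```

Writing blocks 0 and 5 leaves a hole at 1 through 4, and a subsequent append lands on block 6. Copying that file version into an empty target yields blocks 0, 1, and 2, a dense sequence in the same logical order.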

Parameters:
id - The file identifier.
version - The file version.
block - The block identifier in [0:MAX_BLOCK].
b - The buffer containing the bytes to be written. When the buffer contains more than blockSize bytes it will be broken up into multiple blocks.
off - The offset of the 1st byte to be written.
len - The #of bytes to be written.
Returns:
true iff the block was overwritten (i.e., if the block already existed, in which case its contents were replaced).
Throws:
IllegalArgumentException - if id is null or an empty string.
IllegalArgumentException - if version is negative.
IllegalArgumentException - if block is negative.
IllegalArgumentException - if b is null.
IllegalArgumentException - if off is negative or greater than the length of the byte[].
IllegalArgumentException - if len is negative or off+len is greater than the length of the byte[].
IllegalArgumentException - if len is greater than blockSize.
TODO:
return the data for the old block instead in the case of an overwrite?
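
The argument checks listed under Throws can be sketched as a standalone helper. The class BlockArgs and its check method are hypothetical and exist only for illustration; blockSize is passed as a parameter here because its actual value is not stated in this section.

```java
// Hypothetical helper mirroring the documented IllegalArgumentException cases.
// It is not part of BigdataFileSystem.
final class BlockArgs {

    static void check(String id, int version, long block,
                      byte[] b, int off, int len, int blockSize) {
        if (id == null || id.isEmpty())
            throw new IllegalArgumentException("id is null or empty");
        if (version < 0)
            throw new IllegalArgumentException("version is negative");
        if (block < 0)
            throw new IllegalArgumentException("block is negative");
        if (b == null)
            throw new IllegalArgumentException("b is null");
        if (off < 0 || off > b.length)
            throw new IllegalArgumentException("off out of range");
        // Written as a subtraction so that off + len cannot overflow an int.
        if (len < 0 || len > b.length - off)
            throw new IllegalArgumentException("len out of range");
        if (len > blockSize)
            throw new IllegalArgumentException("len exceeds blockSize");
    }

    public static void main(String[] args) {
        check("/tmp/example", 0, 0L, new byte[16], 0, 16, 64); // accepted
        try {
            check("/tmp/example", 0, 0L, new byte[128], 0, 128, 64);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```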

deleteHead

public long deleteHead(String id,
                       int version)
Atomic delete of the first block of the file version.

Parameters:
id - The file identifier.
version - The version identifier.
Returns:
The block identifier of the deleted block -or- -1L if nothing was deleted.

deleteBlock

public boolean deleteBlock(String id,
                           int version,
                           long block)
Atomic delete of a block for a file version.

Parameters:
id - The file identifier.
version - The version identifier.
block - The block identifier -or- -1L to delete the first block in the file version regardless of its block identifier.
Returns:
true iff the block was deleted.

readHead

public byte[] readHead(String id,
                       int version)
Atomic read of the first block of the file version.

Parameters:
id - The file identifier.
version - The version identifier.
Returns:
The contents of the block -or- null iff there are no blocks for that file version. Note that an empty block will return an empty byte[] rather than null.
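
Taken together, readHead and deleteHead support a simple queue-consumer loop: read the head block, process it, then delete it. The sketch below models that loop against a hypothetical in-memory stand-in (HeadQueueSketch); it assumes a single consumer, since the read and the delete are two separate atomic operations.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one file version used as a queue.
final class HeadQueueSketch {
    private final TreeMap<Long, byte[]> blocks = new TreeMap<>();
    private long nextId = 0L; // assumption: identifiers start at 0

    long appendBlock(byte[] b) {
        blocks.put(nextId, b.clone());
        return nextId++;
    }

    // Returns null iff there are no blocks; an empty block yields byte[0].
    byte[] readHead() {
        return blocks.isEmpty() ? null : blocks.firstEntry().getValue().clone();
    }

    // Returns the identifier of the deleted block, or -1L if nothing was deleted.
    long deleteHead() {
        return blocks.isEmpty() ? -1L : blocks.pollFirstEntry().getKey();
    }

    // Consumer loop: process the head block, then remove it.
    static List<String> drain(HeadQueueSketch q) {
        List<String> out = new ArrayList<>();
        byte[] b;
        while ((b = q.readHead()) != null) {
            out.add(new String(b, StandardCharsets.UTF_8));
            q.deleteHead();
        }
        return out;
    }

    public static void main(String[] args) {
        HeadQueueSketch q = new HeadQueueSketch();
        q.appendBlock("first".getBytes(StandardCharsets.UTF_8));
        q.appendBlock(new byte[0]);
        q.appendBlock("third".getBytes(StandardCharsets.UTF_8));
        System.out.println(drain(q));
    }
}
```

Note that the loop tests for null rather than for a zero-length array, so an empty block is consumed like any other block instead of ending the loop early.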

readBlock

public byte[] readBlock(String id,
                        int version,
                        long block)
Atomic read of a block for a file version.

Parameters:
id - The file identifier.
version - The version identifier.
block - The block identifier.
Returns:
The contents of the block -or- null iff the block does not exist. Note that an empty block will return an empty byte[] rather than null.
TODO:
offer a variant that returns an InputStream?

appendBlock

public long appendBlock(String id,
                        int version,
                        byte[] b,
                        int off,
                        int len)
Atomic append of a block to a file version.

Parameters:
id - The file identifier.
version - The file version.
b - The buffer containing the data to be written.
off - The offset of the 1st byte to be written.
len - The #of bytes to be written in [0:blockSize].
Returns:
The block identifier for the written block.
Throws:
IllegalArgumentException - if id is null or an empty string.
IllegalArgumentException - if version is negative.
IllegalArgumentException - if b is null.
IllegalArgumentException - if off is negative or greater than the length of the byte[].
IllegalArgumentException - if len is negative or off+len is greater than the length of the byte[].
IllegalArgumentException - if len is greater than blockSize.
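
Because len may not exceed blockSize, a caller holding a larger buffer has to split it into several appends. A minimal sketch of that loop follows; the Appender interface is a hypothetical callback standing in for appendBlock(id, version, b, off, len), and blockSize is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;

final class AppendChunksSketch {

    // Hypothetical callback standing in for appendBlock(id, version, b, off, len).
    interface Appender {
        long appendBlock(byte[] b, int off, int len);
    }

    // Appends data as a sequence of blocks of at most blockSize bytes each.
    // Returns the block identifiers reported by the appender, in order.
    static List<Long> appendAll(Appender a, byte[] data, int blockSize) {
        if (blockSize <= 0)
            throw new IllegalArgumentException("blockSize must be positive");
        List<Long> ids = new ArrayList<>();
        int off = 0;
        while (off < data.length) {
            int len = Math.min(blockSize, data.length - off);
            ids.add(a.appendBlock(data, off, len));
            off += len; // never exceeds data.length, so no int overflow
        }
        return ids;
    }

    public static void main(String[] args) {
        long[] next = {0L};
        // Stand-in appender that just hands out sequential identifiers.
        Appender fake = (b, off, len) -> next[0]++;
        System.out.println(appendAll(fake, new byte[10], 4));
    }
}
```

Each append is atomic on its own, but the sequence as a whole is not: another writer could append a block between two of these calls.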

getBlockCount

public long getBlockCount(String id,
                          int version)
Return the maximum #of blocks in the file version. The return value includes any deleted but not yet eradicated blocks for the specified file version, so it represents an upper bound on the #of blocks that could be read for that file version.

Note: the block count only decreases when a compacting merge eradicates deleted blocks from an index partition. It will increase any time there is a write on a block for the file version for which neither a delete nor an undeleted entry exists. The only way to count the #of non-deleted blocks for a file version is to traverse the blocks(String, int) iterator.

Parameters:
id - The file identifier.
version - The file version identifier.
Returns:
An upper bound on the #of blocks in that file version, including any deleted but not yet eradicated blocks.
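
The difference between the reported block count and the number of readable blocks can be modeled with delete markers. The sketch below is a hypothetical in-memory model (BlockCountSketch), not the actual index structure: a null value stands for a deleted but not yet eradicated entry, and compactingMerge() stands for the merge that purges such entries.

```java
import java.util.TreeMap;

// Hypothetical model: a null value marks a deleted, not yet eradicated block.
final class BlockCountSketch {
    private final TreeMap<Long, byte[]> index = new TreeMap<>();

    void write(long block, byte[] data) {
        index.put(block, data.clone());
    }

    // Replaces a live entry with a delete marker; true iff a block was deleted.
    boolean delete(long block) {
        if (index.get(block) == null) return false; // absent or already deleted
        index.put(block, null);
        return true;
    }

    // Upper bound: counts live entries and delete markers alike.
    long getBlockCount() {
        return index.size();
    }

    // Exact count, obtained only by traversing the entries.
    long liveBlocks() {
        long n = 0;
        for (byte[] v : index.values()) if (v != null) n++;
        return n;
    }

    // Purges delete markers, which is the only way the count decreases.
    void compactingMerge() {
        index.values().removeIf(v -> v == null);
    }

    public static void main(String[] args) {
        BlockCountSketch f = new BlockCountSketch();
        f.write(0L, new byte[] {1});
        f.write(1L, new byte[] {2});
        f.delete(1L);
        System.out.println(f.getBlockCount() + " counted, " + f.liveBlocks() + " live");
        f.compactingMerge();
        System.out.println(f.getBlockCount() + " counted after merge");
    }
}
```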

writer

public Writer writer(String id,
                     int version,
                     String encoding)
              throws UnsupportedEncodingException
Return a Writer that will append character data on the file version. Characters written on the Writer will be converted to bytes using the specified encoding. Bytes will be buffered until the block is full and then written on the file version using an atomic append. Calling Writer.flush() will force a non-empty partial block to be written immediately.

Note: Map/Reduce processing of a file version MAY be facilitated greatly by ensuring that "records" never cross a block boundary - this means that file versions can be split into blocks and blocks distributed to clients without any regard for the record structure within those blocks. The caller can prevent records from crossing block boundaries by the simple expediency of invoking Writer.flush() to force the atomic append of a (partial but non-empty) block to the file.

Since the characters are being converted to bytes, the caller MUST make Writer.flush() decisions with an awareness of the expansion rate of the specified encoding. The simplest choice is UTF-16, in which case you can count two bytes written for each Java char written.

Parameters:
id - The file identifier.
version - The version identifier.
encoding - The character set encoding.
Returns:
The writer on which to write the character data.
Throws:
UnsupportedEncodingException
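
To decide when to flush, the caller needs the encoded size of each record. The sketch below measures it with the JDK alone; the class EncodingSizeSketch is hypothetical. One detail worth knowing: the JDK's "UTF-16" charset prepends a two-byte byte-order mark when it encodes, whereas UTF-16BE and UTF-16LE do not. Whether the Writer returned by this method emits such a mark is not stated here, so treat the two-bytes-per-char rule as an estimate to verify.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSizeSketch {

    // Number of bytes the record occupies once encoded with the given charset.
    static int encodedLength(String record, Charset cs) {
        return record.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String record = "h\u00e9llo"; // 5 chars, one of them non-ASCII
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16 // encoding adds a byte-order mark
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + encodedLength(record, cs) + " bytes");
        }
    }
}
```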

reader

public Reader reader(String id,
                     int version,
                     String encoding)
              throws UnsupportedEncodingException
Read character data from a file version.

Parameters:
id - The file identifier.
version - The version identifier.
encoding - The character set encoding.
Returns:
The reader from which you can read the character data.
Throws:
UnsupportedEncodingException

inputStream

public FileVersionInputStream inputStream(String id,
                                          int version)
Read data from a file version.

Note: The input stream will remain coherent for the file version as of the time that the view on the file version is formed. Additional atomic appends MAY be read, but that is NOT guaranteed. If the file is deleted and its data is expunged by a compacting merge during the read then the read MAY be truncated.

Parameters:
id - The file identifier.
version - The version identifier.
Returns:
An input stream from which the caller may read the data in the file -or- null if there is no data for that file version, including no deleted blocks pending garbage collection. An empty input stream MAY be returned since empty blocks are allowed. An empty stream will also be returned after a file version is deleted until the deleted blocks are eradicated from the file data index.

inputStream

public FileVersionInputStream inputStream(String id,
                                          int version,
                                          long tx)
Read data from a file version.

Some points about consistency and transaction identifiers.

  1. When using an ITx.UNISOLATED read, additional atomic writes and atomic appends issued after the input stream view was formed MAY be read, but that is NOT guaranteed - it depends on the buffering of the range iterator used to read blocks for the file version. Likewise, if the file is deleted and its data is expunged by a compacting merge during the read then the read MAY be truncated.
  2. It is possible to re-create historical states of a file version corresponding to a commit point for the data index provided that the relevant data has not been eradicated by a compacting merge. It is not possible to recover all states - merely committed states - since unisolated writes may be grouped together by group commit and therefore have the same commit point.
  3. It is possible to issue transactional read requests, but you must first open a transaction with an ITransactionManagerService. In general the use of full transactions is discouraged as the BigdataFileSystem is designed for high throughput and high concurrency with weaker isolation levels suitable for scale-out processing techniques including map/reduce.

Parameters:
id - The file identifier.
version - The version identifier.
tx - The transaction identifier. This is generally either ITx.UNISOLATED to use an unisolated read -or- -timestamp to use a historical read for the most recent consistent state of the file data not later than timestamp.
Returns:
An input stream from which the caller may read the data in the file -or- null if there is no data for that file version, including no deleted blocks pending garbage collection. An empty input stream MAY be returned since empty blocks are allowed. An empty stream will also be returned after a file version is deleted until the deleted blocks are eradicated from the file data index.

outputStream

public OutputStream outputStream(String id,
                                 int version)
Return an output stream that will append on the file version. Bytes written on the output stream will be buffered until a full block has accumulated and then written on the file version using an atomic append. Calling OutputStream.flush() will force a non-empty partial block to be written immediately.

Note: Map/Reduce processing of a file version MAY be facilitated greatly by ensuring that "records" never cross a block boundary - this means that files can be split into blocks and blocks distributed to clients without any regard for the record structure within those blocks. The caller can prevent records from crossing block boundaries by the simple expediency of invoking OutputStream.flush() to force the atomic append of a (partial but non-empty) block to the file.

Parameters:
id - The file identifier.
version - The version identifier.
Returns:
The output stream.
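
The flush-before-overflow technique from the note above can be sketched as a thin wrapper around any OutputStream. RecordAlignedWriter is a hypothetical helper, not part of this API. It assumes that the wrapped stream starts a new block after each flush and writes a block automatically once exactly blockSize bytes have been buffered.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper that keeps records from crossing block boundaries.
final class RecordAlignedWriter {
    private final OutputStream out;
    private final int blockSize;
    private int used; // bytes written into the current block so far

    RecordAlignedWriter(OutputStream out, int blockSize) {
        if (blockSize <= 0)
            throw new IllegalArgumentException("blockSize must be positive");
        this.out = out;
        this.blockSize = blockSize;
    }

    void writeRecord(byte[] rec) throws IOException {
        if (rec.length > blockSize)
            throw new IllegalArgumentException("record larger than a block");
        // Flush first if the record would overflow the current block.
        if (used > 0 && rec.length > blockSize - used) {
            out.flush();
            used = 0;
        }
        out.write(rec);
        used += rec.length;
        if (used == blockSize) used = 0; // assumed: a full block is written by the stream
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in stream
        RecordAlignedWriter w = new RecordAlignedWriter(sink, 8);
        for (int i = 0; i < 3; i++) w.writeRecord(new byte[3]);
        System.out.println(sink.size() + " bytes written");
    }
}
```

With the real stream, each flush in this sketch would correspond to the atomic append of a partial block, so every block would hold only whole records.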

copyStream

public long copyStream(String id,
                       int version,
                       InputStream is)
Copies data from the input stream to the file version. The data is buffered into blocks. Each block is written on the file version using an atomic append. Writing an empty stream will cause an empty block to be appended (this ensures that read back will read an empty stream).

Parameters:
id - The file identifier.
version - The version identifier.
is - The input stream (closed iff it is fully consumed).
Returns:
The #of bytes copied.
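
The buffering behavior described for copyStream, including the empty block written for an empty stream, can be modeled as follows. This is a sketch under stated assumptions: CopyStreamSketch is hypothetical, the target is a plain list of blocks rather than a file version, and blockSize is a caller-supplied parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CopyStreamSketch {

    // Reads until the buffer is full or the stream ends; returns bytes read.
    private static int fill(InputStream is, byte[] buf) throws IOException {
        int n = 0;
        while (n < buf.length) {
            int r = is.read(buf, n, buf.length - n);
            if (r < 0) break;
            n += r;
        }
        return n;
    }

    // Buffers the stream into blocks and appends each one to the target list.
    // An empty stream still appends one empty block. Returns the #of bytes copied.
    static long copyStream(InputStream is, int blockSize, List<byte[]> target)
            throws IOException {
        if (blockSize <= 0)
            throw new IllegalArgumentException("blockSize must be positive");
        byte[] buf = new byte[blockSize];
        long total = 0;
        boolean wroteAny = false;
        while (true) {
            int n = fill(is, buf);
            if (n == 0 && wroteAny) break;   // stream ended on a block boundary
            target.add(Arrays.copyOf(buf, n));
            wroteAny = true;
            total += n;
            if (n < blockSize) break;        // short block means end of stream
        }
        is.close(); // the stream was fully consumed
        return total;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        long n = copyStream(new ByteArrayInputStream(new byte[10]), 4, blocks);
        System.out.println(n + " bytes in " + blocks.size() + " blocks");
    }
}
```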


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.