com.bigdata.service.mapred
Class AbstractMapTask

java.lang.Object
  extended by com.bigdata.service.mapred.AbstractMapTask
All Implemented Interfaces:
IMapTask, ITask, Serializable
Direct Known Subclasses:
AbstractFileInputMapTask

public abstract class AbstractMapTask
extends Object
implements IMapTask

Abstract base class for IMapTasks.

Note: The presumption is that there is a distinct instance of the map task for each task executed and that each task is executed within a single-threaded environment.

Note: Any declared fields are materialized on both the master and the service. Declare a field transient unless it must be sent to the service, and do not initialize anything large on the master unless the field is transient. Lazy initialization works well here because the work is then done only on the service.
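The note above can be illustrated with a minimal sketch using plain JDK serialization. The class LazyTransientSketch, its fields, and the roundTrip helper are hypothetical and are not part of com.bigdata.service.mapred; the round trip only simulates shipping a task from the master to the service.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical task-like class; not part of the bigdata API.
public class LazyTransientSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int BUFFER_SIZE = 64 * 1024;

    // Small and needed on the service, so it is serialized with the task.
    private final String source;

    // Large working state: transient, so it is never shipped from the master.
    private transient byte[] buffer;

    public LazyTransientSketch(String source) {
        this.source = source;
    }

    public String getSource() {
        return source;
    }

    public boolean isBufferAllocated() {
        return buffer != null;
    }

    // Lazily allocated on first use, which would normally happen on the service.
    public byte[] getBuffer() {
        if (buffer == null) {
            buffer = new byte[BUFFER_SIZE];
        }
        return buffer;
    }

    // Serializes and deserializes the task to simulate the master-to-service transfer.
    public static LazyTransientSketch roundTrip(LazyTransientSketch task) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(task);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (LazyTransientSketch) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After the round trip the transient buffer is absent on the copy and is allocated again only when getBuffer() is first called.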

Version:
$Id: AbstractMapTask.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
See Also:
Serialized Form

Field Summary
protected  IHashFunction hashFunction
           
protected  int nreduce
           
protected  Object source
           
protected  UUID uuid
           
 
Fields inherited from interface com.bigdata.service.mapred.IMapTask
log
 
Constructor Summary
protected AbstractMapTask(UUID uuid, Object source, Integer nreduce, IHashFunction hashFunction)
           
 
Method Summary
protected  DataOutputBuffer getDataOutputBuffer()
          The values may be formatted using this utility class.
 int[] getHistogram()
          Return the histogram of the #of tuples in each output partition.
protected  IKeyBuilder getKeyBuilder()
          The KeyBuilder MUST be used by the IMapTask so that the generated keys will have a total ordering determined by their interpretation as an unsigned byte[].
 Object getSource()
          The source from which the map task will read its data.
 int getTupleCount()
          The #of tuples written by the task.
 com.bigdata.service.mapred.Tuple[] getTuples()
          Return the tuples.
 UUID getUUID()
          The unique identifier for the task.
 void output(byte[] val)
          Hash partitions the tuple based on the key already in keyBuilder into one of nreduce output buckets.
protected  void output(int partition, byte[] key, byte[] val)
          Output a key-value pair (tuple) to the appropriate reduce task.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

uuid

protected final UUID uuid

source

protected final Object source

nreduce

protected final int nreduce

hashFunction

protected final IHashFunction hashFunction
Constructor Detail

AbstractMapTask

protected AbstractMapTask(UUID uuid,
                          Object source,
                          Integer nreduce,
                          IHashFunction hashFunction)
Parameters:
uuid - The UUID of the map task. This MUST be the same UUID each time if a map task is re-executed for a given input. The UUID (together with the tuple counter) is used to generate a key that makes the map operation "retry safe". That is, the operation may be executed one or more times and the result will be the same. This guarantee arises because the values for identical keys are overwritten during the reduce operation.
source - The source from which the map task will read its data. This is commonly a File in a networked file system but other kinds of sources may be supported.
nreduce - The #of reduce tasks that are being fed by this map task.
hashFunction - The hash function used to hash partition the tuples generated by the map task into the input sink for each of the reduce tasks.
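The "retry safe" property described for the uuid parameter can be sketched as follows. This is an illustration only: RetrySafeSketch, runMapTask, and the string key layout are assumptions made for readability, and the sorted map merely stands in for a reduce-side store in which values for identical keys are overwritten.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration; not part of the bigdata API.
public class RetrySafeSketch {

    // Simulates one execution of a map task writing into a reduce-side store.
    public static void runMapTask(UUID taskUuid, List<String> tokens,
                                  Map<String, Integer> store) {
        int tupleCounter = 0;
        for (String token : tokens) {
            // Same UUID + same counter on a retry yields the identical key,
            // so put() overwrites instead of adding a duplicate entry.
            String key = token + "/" + taskUuid + "/" + tupleCounter++;
            store.put(key, 1);
        }
    }
}
```

Running the task twice with the same UUID leaves the store unchanged after the first run, whereas a retry under a different UUID would add a second set of entries.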
Method Detail

getKeyBuilder

protected IKeyBuilder getKeyBuilder()
The KeyBuilder MUST be used by the IMapTask so that the generated keys will have a total ordering determined by their interpretation as an unsigned byte[].

TODO:
Does not always have to support Unicode; could configure the buffer size for some tasks; could choose the collation sequence for Unicode.
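For reference, ordering keys "by their interpretation as an unsigned byte[]" can be sketched as below. UnsignedOrderSketch is a hypothetical helper written for this illustration and is not the comparator used by the library; it shows why each byte must be masked before comparing, since Java bytes are signed.

```java
// Hypothetical helper; not part of the bigdata API.
public class UnsignedOrderSketch {

    // Compares two keys byte by byte, treating each byte as a value in 0..255.
    // When one key is a prefix of the other, the shorter key orders first.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With a signed comparison the byte 0x80 would sort before 0x7f; the masked comparison places it after, which is the order the keys are expected to have.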

getDataOutputBuffer

protected DataOutputBuffer getDataOutputBuffer()
The values may be formatted using this utility class. The basic pattern is:
 valBuilder.reset().append(foo).toByteArray();
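
The same reset, write, copy pattern can be sketched with JDK classes only. ValueBufferSketch and encodeCount are hypothetical names, and java.io.DataOutputStream over a ByteArrayOutputStream is used here as a stand-in; this is not the DataOutputBuffer class itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in built on JDK streams; not the bigdata DataOutputBuffer.
public class ValueBufferSketch {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    // Reuses one buffer per task: reset it, format the value, copy out the bytes.
    public byte[] encodeCount(int count) {
        try {
            bytes.reset();
            out.writeInt(count); // big-endian int32
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reusing a single buffer fits the single-threaded, one-instance-per-task presumption stated for this class.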
 


getUUID

public UUID getUUID()
Description copied from interface: ITask
The unique identifier for the task.

Note: if a task is retried then the new instance of that task MUST have the same identifier.

Specified by:
getUUID in interface ITask

getSource

public Object getSource()
The source from which the map task will read its data. This is commonly a File in a networked file system but other kinds of sources may be supported.


getTuples

public com.bigdata.service.mapred.Tuple[] getTuples()
Return the tuples.

Returns:
The tuples written by the task.

getTupleCount

public int getTupleCount()
The #of tuples written by the task.


output

public void output(byte[] val)
Hash partitions the tuple based on the key already in keyBuilder into one of nreduce output buckets. Forms a unique key using the data already in keyBuilder and appending the task UUID and the int32 tuple counter. Finally, invokes output(int, byte[], byte[]) to output the key-value pair. The resulting key preserves the key order, groups all keys with the same value for the same map task, and finally distinguishes individual key-value pairs using the tuple counter.

Parameters:
val - The value for the tuple.
See Also:
output(int, byte[], byte[])
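
The key layout described for output(byte[]) (application key, then task UUID, then int32 tuple counter) might look like the sketch below. UniqueKeySketch and uniqueKey are hypothetical, and the exact byte encoding of the UUID and counter used by the library is an assumption here (16 big-endian UUID bytes followed by a 4-byte big-endian counter).

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical illustration of the key layout; the real encoding may differ.
public class UniqueKeySketch {

    // Appends the task UUID (16 bytes) and the tuple counter (4 bytes)
    // to the application key so that every emitted tuple has a unique key.
    public static byte[] uniqueKey(byte[] keyPrefix, UUID taskUuid, int tupleCounter) {
        return ByteBuffer.allocate(keyPrefix.length + 16 + 4)
                .put(keyPrefix)
                .putLong(taskUuid.getMostSignificantBits())
                .putLong(taskUuid.getLeastSignificantBits())
                .putInt(tupleCounter)
                .array();
    }
}
```

Because the application key comes first, the unsigned byte[] order of the full keys follows the order of the application keys; the UUID then groups tuples from the same map task, and the counter separates individual tuples.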

output

protected void output(int partition,
                      byte[] key,
                      byte[] val)
Output a key-value pair (tuple) to the appropriate reduce task. For example, the key could be a token and the value could be the #of times that the token was identified in the input. All tuples will be buffered until the map task completes successfully and are then written onto the appropriate reduce partitions.

Parameters:
partition - The output partition in [0:nreduce).
key - The complete key. The key MUST be encoded such that the keys may be placed into a total order by interpreting them as an unsigned byte[]. See KeyBuilder.
val - The value. The value encoding is essentially arbitrary but the DataOutputBuffer may be helpful here.
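
As an illustration of how a partition index in [0:nreduce) can be derived from a key, the sketch below uses a simple hash of the key bytes. HashPartitionSketch is hypothetical, and Arrays.hashCode is only a stand-in for whatever IHashFunction the task was constructed with.

```java
import java.util.Arrays;

// Hypothetical stand-in for an IHashFunction-based partitioner.
public class HashPartitionSketch {

    // Maps a key onto one of nreduce partitions. Math.floorMod keeps the
    // result non-negative even when the hash code is negative.
    public static int partition(byte[] key, int nreduce) {
        return Math.floorMod(Arrays.hashCode(key), nreduce);
    }
}
```

The same key always maps to the same partition, so all tuples for a given key reach the same reduce task.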

getHistogram

public int[] getHistogram()
Return the histogram of the #of tuples in each output partition.
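
A minimal sketch of the bookkeeping behind such a histogram is shown below. HistogramSketch and its record method are hypothetical and only mirror the documented behavior of getHistogram() and getTupleCount(); they are not the class's actual implementation.

```java
// Hypothetical bookkeeping; not the implementation in AbstractMapTask.
public class HistogramSketch {
    private final int[] histogram;
    private int tupleCount;

    public HistogramSketch(int nreduce) {
        this.histogram = new int[nreduce];
    }

    // Called once per emitted tuple with its output partition in [0:nreduce).
    public void record(int partition) {
        if (partition < 0 || partition >= histogram.length) {
            throw new IllegalArgumentException("partition out of range: " + partition);
        }
        histogram[partition]++;
        tupleCount++;
    }

    // The #of tuples in each output partition (defensive copy).
    public int[] getHistogram() {
        return histogram.clone();
    }

    // The total #of tuples recorded.
    public int getTupleCount() {
        return tupleCount;
    }
}
```

The entries of the histogram always sum to the tuple count, which gives a quick check on how evenly the hash function spreads tuples over the reduce tasks.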



Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.