com.bigdata.util
Class CSVReader

java.lang.Object
  extended by com.bigdata.util.CSVReader
All Implemented Interfaces:
Iterator<Map<String,Object>>

public class CSVReader
extends Object
implements Iterator<Map<String,Object>>

A helper class for reading CSV (comma-separated values) and similar kinds of delimited data. Columns may be delimited by commas or tabs. To parse other kinds of delimited data, override split(String).

Note: The default parsing of column values will provide Long integers and Double precision floating point values rather than Integer or Float. If you want to change this you need to customize the CSVReader.Header class since that is responsible for interpreting column values.
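The Long-then-Double preference described in this note can be illustrated with a small stand-alone sketch. ValueInterpreterSketch and interpret are hypothetical names used only for illustration; this is not the CSVReader.Header implementation, and it omits the date and time interpretation that the Header class is documented to attempt.

```java
// Hypothetical stand-in for illustration only; not part of com.bigdata.util.
class ValueInterpreterSketch {

    /**
     * Interpret the text as a Long if possible, otherwise as a Double,
     * otherwise return it unchanged as a String.
     */
    static Object interpret(String text) {
        try {
            return Long.valueOf(text);
        } catch (NumberFormatException notAnInteger) {
            // Not an integer; try floating point next.
        }
        try {
            return Double.valueOf(text);
        } catch (NumberFormatException notANumber) {
            // Neither integer nor floating point; keep the raw text.
            return text;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "3.5", "abc"}) {
            Object v = interpret(s);
            System.out.println(s + " -> " + v.getClass().getSimpleName());
        }
    }
}
```

A caller that needs Integer or Float values would narrow the result itself, or, per the note above, customize CSVReader.Header.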

Note: If the caller neither defines headers via setHeaders(CSVReader.Header[]) nor reads them from the file via readHeaders(), then default headers named by the origin ONE column indices {1,2,3,4,...} will be used.

Version:
$Id: CSVReader.java 2265 2009-10-26 12:51:06Z thompsonbry $
Author:
Bryan Thompson
TODO:
replace with flatpack? It uses an Apache 2 license.

Nested Class Summary
static class CSVReader.Header
          A header for a column that examines its values and interprets them as floating point numbers, integers, dates, or times when possible and as uninterpreted character data otherwise.
 
Field Summary
protected static int BUF_SIZE
          The #of characters to buffer in the reader.
protected  CSVReader.Header[] headers
          The header definitions (initially null).
protected static boolean INFO
           
protected static org.apache.log4j.Logger log
           
protected  BufferedReader r
          The source.
 
Constructor Summary
CSVReader(InputStream is, String charSet)
           
CSVReader(Reader r)
           
 
Method Summary
 CSVReader.Header[] getHeaders()
          Return the current headers (by reference).
 boolean getSkipBlankLines()
           
 boolean getSkipCommentLines()
           
 long getTailDelayMillis()
          The #of milliseconds that the CSVReader should wait before attempting to read another line from the source (when reading from a pipe) -or- 0L if the CSVReader should NOT continue reading once it has reached the end of the input (default 0L).
 boolean getTrimWhitespace()
           
 boolean hasNext()
           
 int lineNo()
          The current line number (origin one).
 Map<String,Object> next()
           
protected  Map<String,Object> parse(String[] values)
          Parse the line into column values.
protected  CSVReader.Header[] parseHeaders(String line)
          Parse a line containing headers.
 void readHeaders()
          Interpret the next row as containing headers.
 void remove()
          Unsupported operation.
protected  void setDefaultHeaders(int ncols)
          Creates default headers named by the origin ONE column indices {1,2,3,4,...}.
 void setHeader(int index, CSVReader.Header header)
          Re-define the CSVReader.Header at the specified index.
 void setHeaders(CSVReader.Header[] headers)
          Explicitly set the headers.
 boolean setSkipBlankLines(boolean skipBlankLines)
           
 boolean setSkipCommentLines(boolean skipCommentLines)
           
 long setTailDelayMillis(long tailDelayMillis)
           
 boolean setTrimWhitespace(boolean trimWhitespace)
           
protected  String[] split(String line)
          Split the line into columns based on tabs or commas.
protected  String[] trim(String[] cols)
          Trim whitespace and optional quotes from each value iff getTrimWhitespace() is true.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static final org.apache.log4j.Logger log

INFO

protected static final boolean INFO

BUF_SIZE

protected static final int BUF_SIZE
The #of characters to buffer in the reader.

See Also:
Constant Field Values

r

protected final BufferedReader r
The source.


headers

protected CSVReader.Header[] headers
The header definitions (initially null).

See Also:
readHeaders(), setHeaders(CSVReader.Header[])
Constructor Detail

CSVReader

public CSVReader(InputStream is,
                 String charSet)
          throws IOException
Throws:
IOException

CSVReader

public CSVReader(Reader r)
          throws IOException
Throws:
IOException
Method Detail

lineNo

public int lineNo()
The current line number (origin one).


setSkipCommentLines

public boolean setSkipCommentLines(boolean skipCommentLines)

getSkipCommentLines

public boolean getSkipCommentLines()

setSkipBlankLines

public boolean setSkipBlankLines(boolean skipBlankLines)

getSkipBlankLines

public boolean getSkipBlankLines()

setTrimWhitespace

public boolean setTrimWhitespace(boolean trimWhitespace)

getTrimWhitespace

public boolean getTrimWhitespace()

getTailDelayMillis

public long getTailDelayMillis()
The #of milliseconds that the CSVReader should wait before attempting to read another line from the source (when reading from a pipe) -or- 0L if the CSVReader should NOT continue reading once it has reached the end of the input (default 0L).


setTailDelayMillis

public long setTailDelayMillis(long tailDelayMillis)

hasNext

public boolean hasNext()
Specified by:
hasNext in interface Iterator<Map<String,Object>>

next

public Map<String,Object> next()
Specified by:
next in interface Iterator<Map<String,Object>>
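
The hasNext()/next() contract can be illustrated with a self-contained stand-in. RowIteratorSketch is a hypothetical class, not the CSVReader implementation: it splits on commas only, keeps every value as a String, and takes its column names from the caller. It shows the read-ahead pattern an Iterator over lines typically needs and the loop a caller would write.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for illustration only; not part of com.bigdata.util.
class RowIteratorSketch implements Iterator<Map<String, Object>> {

    private final BufferedReader reader;
    private final String[] headerNames;
    private String pending; // line read ahead by hasNext(), or null

    RowIteratorSketch(BufferedReader reader, String[] headerNames) {
        this.reader = reader;
        this.headerNames = headerNames;
    }

    @Override
    public boolean hasNext() {
        if (pending != null) {
            return true;
        }
        try {
            pending = reader.readLine(); // null at end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pending != null;
    }

    @Override
    public Map<String, Object> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String[] values = pending.split(",", -1);
        pending = null;
        Map<String, Object> row = new LinkedHashMap<>();
        // Assumption: values beyond the known headers are ignored.
        for (int i = 0; i < headerNames.length && i < values.length; i++) {
            row.put(headerNames[i], values[i]);
        }
        return row;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("1,2\n3,4\n"));
        Iterator<Map<String, Object>> it =
                new RowIteratorSketch(in, new String[] {"a", "b"});
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```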

split

protected String[] split(String line)
Split the line into columns based on tabs or commas.

Parameters:
line - The line.
Returns:
The columns. There will be one value for each column identified in the line.
TODO:
allow quoted values that contain commas.
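
As a rough illustration of the behavior described here, the sketch below splits on tabs or commas with a regular expression and shows what a replacement for another delimiter might look like. SplitSketch, split and splitOn are hypothetical names; the actual CSVReader code may differ, in particular in how it treats trailing empty columns. Consistent with the TODO above, quoted values that contain commas are not handled.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of com.bigdata.util.
class SplitSketch {

    // Matches a single comma or a single tab character.
    private static final Pattern TAB_OR_COMMA = Pattern.compile("[,\t]");

    /** Split on tabs or commas, keeping empty columns (limit -1). */
    static String[] split(String line) {
        return TAB_OR_COMMA.split(line, -1);
    }

    /** The kind of logic an overriding split(String) might use for another delimiter. */
    static String[] splitOn(String line, char delimiter) {
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,b\tc")));
        System.out.println(Arrays.toString(splitOn("a|b|c", '|')));
    }
}
```

The limit of -1 keeps trailing empty columns, so a line such as "a,," yields three values rather than one.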

trim

protected String[] trim(String[] cols)
Trim whitespace and optional quotes from each value iff getTrimWhitespace() is true.

Parameters:
cols - The column values.
Returns:
The column values.
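
A minimal sketch of this kind of trimming follows. TrimSketch is a hypothetical class, and it makes assumptions the documentation does not confirm: only double quotes are treated as quotes, a value is unquoted only when it both starts and ends with a quote, and a new array is returned rather than the input being modified in place.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not part of com.bigdata.util.
class TrimSketch {

    /** Trim whitespace and one pair of enclosing double quotes iff trimWhitespace is true. */
    static String[] trim(String[] cols, boolean trimWhitespace) {
        if (!trimWhitespace) {
            return cols; // leave the values untouched
        }
        String[] out = new String[cols.length];
        for (int i = 0; i < cols.length; i++) {
            String s = cols[i].trim();
            boolean quoted = s.length() >= 2
                    && s.charAt(0) == '"'
                    && s.charAt(s.length() - 1) == '"';
            out[i] = quoted ? s.substring(1, s.length() - 1) : s;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] cols = {"  a ", " \"b c\" "};
        System.out.println(Arrays.toString(trim(cols, true)));
    }
}
```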

parse

protected Map<String,Object> parse(String[] values)
Parse the line into column values. If no headers have been defined then default headers are automatically created using setDefaultHeaders(int).

Parameters:
values - The column values.
Returns:
A map containing the parsed data.
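
The fallback to default headers can be sketched as below. ParseSketch is a hypothetical class that pairs plain String names with raw values. It assumes the default names are the decimal strings "1", "2", and so on, and that values beyond the known headers are ignored; neither detail is confirmed by this page. The real method works with CSVReader.Header objects, which also interpret the values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of com.bigdata.util.
class ParseSketch {

    private String[] headerNames; // initially null, as with the headers field

    /** Create names "1", "2", ..., one per column (origin ONE). */
    static String[] defaultHeaderNames(int ncols) {
        String[] names = new String[ncols];
        for (int i = 0; i < ncols; i++) {
            names[i] = Integer.toString(i + 1);
        }
        return names;
    }

    /** Pair each value with its header name, creating default names on first use. */
    Map<String, Object> parse(String[] values) {
        if (headerNames == null) {
            headerNames = defaultHeaderNames(values.length);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < values.length && i < headerNames.length; i++) {
            row.put(headerNames[i], values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(new ParseSketch().parse(new String[] {"x", "y"}));
    }
}
```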

setDefaultHeaders

protected void setDefaultHeaders(int ncols)
Creates default headers named by the origin ONE column indices {1,2,3,4,...}.

Parameters:
ncols - The #of columns.

parseHeaders

protected CSVReader.Header[] parseHeaders(String line)
Parse a line containing headers.

Parameters:
line - The line.
Returns:
The header definitions.

readHeaders

public void readHeaders()
                 throws IOException
Interpret the next row as containing headers.

Throws:
IOException

getHeaders

public CSVReader.Header[] getHeaders()
Return the current headers (by reference).


setHeaders

public void setHeaders(CSVReader.Header[] headers)
Explicitly set the headers.

Parameters:
headers - The headers.

setHeader

public void setHeader(int index,
                      CSVReader.Header header)
Re-define the CSVReader.Header at the specified index.

Parameters:
index - The index in [0:#headers-1].
header - The new CSVReader.Header definition.

remove

public void remove()
Unsupported operation.

Specified by:
remove in interface Iterator<Map<String,Object>>


Copyright © 2006-2009 SYSTAP, LLC. All Rights Reserved.