Tuesday, August 4, 2009

"Mapped" RDF data loader

We've just introduced a new "mapped" data loader. The previous data loader assumed that the files were located on specific hosts in a known directory. It was designed with map/reduce tasks in mind, where the RDF files were dumped into a local directory by the reduce task.



With the mapped data loader you specify a scanner, which is executed by the master. The default scanner knows how to identify files to be processed in the file system, but it would be easy enough to write a scanner that consumed an HDFS block-structured file whose contents were RDF data. This is more like the "map" stage of a map/reduce job, which is why we call it the "mapped" data loader. Regardless of how the scanner identifies the resources to be processed, the master "maps" those resources across client tasks running on the cluster.
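
To make the scanner/master split concrete, here is a minimal sketch in Python. It is not the loader's actual API: the names `scan_file_system` and `map_resources`, the list of file extensions, and the round-robin assignment are all assumptions made for illustration. The point is only that the scanner decides what the resources are, and the master decides which client task gets each one.

```python
import os

# Hypothetical set of extensions treated as RDF; the real default
# scanner may use different rules.
RDF_EXTENSIONS = (".rdf", ".owl", ".nt", ".n3", ".ttl")


def scan_file_system(root):
    """Scanner: yield RDF-looking files under `root` in a stable order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if name.lower().endswith(RDF_EXTENSIONS):
                yield os.path.join(dirpath, name)


def map_resources(resources, clients):
    """Master: assign each resource to a client task.

    Round-robin is an assumption here; any assignment policy would do.
    """
    if not clients:
        raise ValueError("at least one client task is required")
    assignments = {client: [] for client in clients}
    for i, resource in enumerate(resources):
        assignments[clients[i % len(clients)]].append(resource)
    return assignments
```

Under these assumptions, `map_resources(scan_file_system("/data/rdf"), ["client-0", "client-1"])` would return a dictionary from client name to the list of files that client should load. A scanner that reads from HDFS instead would only need to yield resource identifiers in the same way; `map_resources` would not change.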



Let us know if you are interested in loading data from HDFS or from Hadoop map/reduce jobs into a massive distributed semantic web graph, and we can help you work through the integration glue.

1 Comment:

True Light said...

Hi Bryan,

This is exactly what I am trying to do now. I'd love to know what your ideas are. Do mail me - satyaprakash at pramati dot com

August 25, 2009 8:40 PM  
