com.bigdata.rdf.lexicon
Interface ITextIndexer<A extends IHit>

All Known Subinterfaces:
ISubjectCentricTextIndexer<A>, IValueCentricTextIndexer<A>
All Known Implementing Classes:
BigdataSubjectCentricFullTextIndex, BigdataValueCentricFullTextIndex

public interface ITextIndexer<A extends IHit>

Abstraction for the text indexer for RDF Values. It allows either the built-in bigdata FullTextIndex or another implementation (for example, one backed by Lucene) to be used.

Version:
$Id: ITextIndexer.java 6285 2012-04-13 15:12:13Z mrpersonick $
Author:
Bryan Thompson
See Also:
AbstractTripleStore.Options#TEXT_INDEXER_CLASS

Method Summary
 int count(String query, String languageCode, boolean prefixMatch, double minCosine, double maxCosine, int minRank, int maxRank, boolean matchAllTerms, boolean matchExact, long timeout, TimeUnit unit)
          Count free text search results.
 void create()
           
 void destroy()
           
 boolean getIndexDatatypeLiterals()
          Return true iff datatype literals are being indexed.
 Hiterator<A> search(String query, String languageCode, boolean prefixMatch, double minCosine, double maxCosine, int minRank, int maxRank, boolean matchAllTerms, boolean matchExact, long timeout, TimeUnit unit)
          Do free text search.
 

Method Detail

create

void create()

destroy

void destroy()

getIndexDatatypeLiterals

boolean getIndexDatatypeLiterals()
Return true iff datatype literals are being indexed.


search

Hiterator<A> search(String query,
                    String languageCode,
                    boolean prefixMatch,
                    double minCosine,
                    double maxCosine,
                    int minRank,
                    int maxRank,
                    boolean matchAllTerms,
                    boolean matchExact,
                    long timeout,
                    TimeUnit unit)
Do free text search.

Parameters:
query - The query (it will be parsed into tokens).
languageCode - The language code that should be used when tokenizing the query -or- null to use the default Locale.
prefixMatch - When true, the matches will be on tokens which include the query tokens as a prefix. This includes exact matches as a special case when the prefix is the entire token, but it also allows longer matches. For example, free will be an exact match on free but a partial match on freedom. When false, only exact matches will be made.
minCosine - The minimum cosine that will be returned (in [0:maxCosine]). A minimum cosine of ZERO (0.0) can admit a large number of low-relevance search results.
maxCosine - The maximum cosine that will be returned (in [minCosine:1.0]). Useful for evaluating hits within a relevance range.
minRank - The minimum rank of the search result.
maxRank - The maximum rank of the search result.
matchAllTerms - If true, return only hits that match all search terms.
matchExact - If true, return only hits that have an exact match of the search string.
timeout - The timeout -or- ZERO (0) for NO timeout (this is equivalent to using Long.MAX_VALUE).
unit - The unit in which the timeout is expressed.
Returns:
The result set.
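
The way prefixMatch and matchAllTerms interact, as described in the parameters above, can be sketched in a few lines. This is a hypothetical illustration and not the bigdata implementation: the names PrefixMatchSketch, tokenMatches and docMatches are assumptions made for this example, tokens are assumed to be already split and lower-cased, and cosine scoring, ranking and matchExact are left out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the documented matching rules; not part of bigdata. */
public class PrefixMatchSketch {

    /**
     * With prefixMatch, an indexed token matches when it starts with the
     * query token (an identical token is the special case). Without it,
     * only an identical token matches.
     */
    public static boolean tokenMatches(String queryToken, String indexedToken,
                                       boolean prefixMatch) {
        return prefixMatch ? indexedToken.startsWith(queryToken)
                           : indexedToken.equals(queryToken);
    }

    /**
     * With matchAllTerms, every query token must match some indexed token.
     * Otherwise a single matching query token is enough.
     */
    public static boolean docMatches(List<String> queryTokens, List<String> docTokens,
                                     boolean prefixMatch, boolean matchAllTerms) {
        int matched = 0;
        for (String q : queryTokens) {
            for (String d : docTokens) {
                if (tokenMatches(q, d, prefixMatch)) {
                    matched++;
                    break; // count each query token at most once
                }
            }
        }
        return matchAllTerms ? matched == queryTokens.size() : matched > 0;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("free", "text");
        List<String> doc = Arrays.asList("freedom", "of", "speech");

        System.out.println(tokenMatches("free", "freedom", true));  // true
        System.out.println(tokenMatches("free", "freedom", false)); // false
        System.out.println(docMatches(query, doc, true, false));    // true
        System.out.println(docMatches(query, doc, true, true));     // false
    }
}
```

In the last two calls only the query token free finds a match (as a prefix of freedom), so the literal is a hit when any term may match and is rejected when all terms are required.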

count

int count(String query,
          String languageCode,
          boolean prefixMatch,
          double minCosine,
          double maxCosine,
          int minRank,
          int maxRank,
          boolean matchAllTerms,
          boolean matchExact,
          long timeout,
          TimeUnit unit)
Count free text search results.

Parameters:
query - The query (it will be parsed into tokens).
languageCode - The language code that should be used when tokenizing the query -or- null to use the default Locale.
prefixMatch - When true, the matches will be on tokens which include the query tokens as a prefix. This includes exact matches as a special case when the prefix is the entire token, but it also allows longer matches. For example, free will be an exact match on free but a partial match on freedom. When false, only exact matches will be made.
minCosine - The minimum cosine that will be returned (in [0:maxCosine]). A minimum cosine of ZERO (0.0) can admit a large number of low-relevance search results.
maxCosine - The maximum cosine that will be returned (in [minCosine:1.0]). Useful for evaluating hits within a relevance range.
minRank - The minimum rank of the search result.
maxRank - The maximum rank of the search result.
matchAllTerms - If true, return only hits that match all search terms.
matchExact - If true, return only hits that have an exact match of the search string.
timeout - The timeout -or- ZERO (0) for NO timeout (this is equivalent to using Long.MAX_VALUE).
unit - The unit in which the timeout is expressed.
Returns:
The result count.
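
The documented constraints on the cosine range and the timeout convention (ZERO means no timeout, equivalent to Long.MAX_VALUE) apply to both search and count. A caller or implementation might check them along the following lines. This is a minimal sketch under stated assumptions: SearchArgsSketch, checkCosineRange and effectiveTimeoutNanos are hypothetical names, and rejecting a negative timeout is an assumption of this example rather than documented behavior.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical argument checks for search/count; not part of bigdata. */
public class SearchArgsSketch {

    /**
     * Enforces 0 <= minCosine <= maxCosine <= 1.0, which is the combination
     * of the documented ranges [0:maxCosine] and [minCosine:1.0].
     * The negated form also rejects NaN.
     */
    public static void checkCosineRange(double minCosine, double maxCosine) {
        if (!(0d <= minCosine && minCosine <= maxCosine && maxCosine <= 1d)) {
            throw new IllegalArgumentException(
                    "expected 0 <= minCosine <= maxCosine <= 1.0, got ["
                            + minCosine + ":" + maxCosine + "]");
        }
    }

    /**
     * Maps the documented convention "ZERO (0) for NO timeout" onto
     * Long.MAX_VALUE nanoseconds. Rejecting negative values is an
     * assumption made for this sketch.
     */
    public static long effectiveTimeoutNanos(long timeout, TimeUnit unit) {
        if (timeout < 0L) {
            throw new IllegalArgumentException("timeout must be >= 0");
        }
        if (timeout == 0L) {
            return Long.MAX_VALUE; // no timeout
        }
        return unit.toNanos(timeout); // saturates at Long.MAX_VALUE on overflow
    }

    public static void main(String[] args) {
        checkCosineRange(0.4d, 1.0d); // passes silently
        System.out.println(effectiveTimeoutNanos(0L, TimeUnit.SECONDS) == Long.MAX_VALUE); // true
        System.out.println(effectiveTimeoutNanos(2L, TimeUnit.MILLISECONDS)); // 2000000
    }
}
```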


Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.