Why is this Lucene query a “contains” instead of a “startsWith”

lucene.netstartswith

string q = "m";
Query query = new QueryParser("company", new StandardAnalyzer()).Parse(q+"*");

will result in query being a prefixQuery :company:a*

Still I will get results like "Fleet Africa" where it is rather obvious that the A is not at the start and thus gives me undesired results.

Query query = new TermQuery(new Term("company", q+"*"));

will result in query being a termQuery :company:a* and not returning any results. Probably because it interprets the query as an exact match and none of my values are the "a*" literal.

Query query = new WildcardQuery(new Term("company", q+"*"));

will return the same results as the prefixquery;

What am I doing wrong?

Best Answer

StandardAnalyzer will tokenize "Fleet Africa" into "fleet" and "africa". Your a* search will match the later term.

If you want to consider "Fleet Africa" as one single term, use an analyzer that does not break up your string on whitespaces. KeywordAnalyzer is an example, but you may still want to lowercase your data so queries are case insensitive.

Related Topic