Stopwords in phrases 2004-12-21 - By Ravi
Back I want to be able to use stopwords in exact phrase searches. I have looked at Nutch and used the same approach (replace common words with n-grams. Look at net.nutch.analysis.CommonGrams). So if "to","be","or" and "not" are stop words, for the string "to be or not to be", the analyzer produces the following tokens
[to-be, to-be-or, to-be-or-not, to-be-or-not-to, to-be-or-not-to-be, be-or, be-or-not, be-or-not-to, be-or-not-to-be, or-not, or-not-to, or-not-to-be, not-to, not-to-be, to-be]
This is exactly what I wanted from the analyzer during indexing. But I'm having a problem with the search. when I do a search on "not to be" the analyzer is converting my search into content:"not-to not-to-be to-be" because the analyzer produces the tokens "not-to","not-to-be","to-be"
I'm getting 0 results on this as there is no token "not-to not-to-be to-be" in the index.
I want just "not-to-be" from the analyzer during the search so when I search on "not to be" I will get the document which has "not-to-be" as a token.
How can I use the same analyzer to get different results in indexing and searching?
Thanks in advance, Ravi.
-- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ------ To unsubscribe, e-mail: lucene-user-unsubscribe@(protected) For additional commands, e-mail: lucene-user-help@(protected)
Earn $52 per hosting referral at Lunarpages.
|
|