Java Mailing List Archive

http://www.java2.5341.com/

Home » java-user.lucene »

T. H. Lin

2008-10-27

I would like to search a collection of keywords with Lucene.

Each Document has one or more keywords, and each keyword appears only once
per document (tf = 1).
For example:
Document_1 : ( "aa" "bb" "cc"      )
Document_2 : (      "bb" "cc"      )
Document_3 : (           "cc" "dd" )
Document_4 : ( "aa"      "cc" "dd" )

My query consists of several terms with different boosts. coord(int overlap,
int maxOverlap) is turned off, i.e. it always returns 1.0.
query = "aa^0.1 bb^0.9 xx^0.1 yy^0.1 zz^0.1"
The query may contain many terms that do not appear in any Document, i.e.
"xx", "yy" and "zz" here.

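With coord disabled and every keyword occurring once, a rough way to picture what this query rewards is the sum of the boosts of the query terms that a document actually contains. The sketch below is plain Java, not Lucene code: the class MatchedBoostSum and its methods are hypothetical names, and idf, queryNorm and field norms are deliberately left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MatchedBoostSum {

    // The query from the post: term -> boost.
    public static Map<String, Double> exampleQuery() {
        Map<String, Double> q = new LinkedHashMap<String, Double>();
        q.put("aa", 0.1);
        q.put("bb", 0.9);
        q.put("xx", 0.1);
        q.put("yy", 0.1);
        q.put("zz", 0.1);
        return q;
    }

    // Adds up the boosts of the query terms present in the document.
    // Query terms missing from the document contribute nothing (coord is off).
    public static double matchedBoost(Set<String> docKeywords, Map<String, Double> query) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : query.entrySet()) {
            if (docKeywords.contains(e.getKey())) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docs = new LinkedHashMap<String, Set<String>>();
        docs.put("Document_1", new HashSet<String>(Arrays.asList("aa", "bb", "cc")));
        docs.put("Document_2", new HashSet<String>(Arrays.asList("bb", "cc")));
        docs.put("Document_3", new HashSet<String>(Arrays.asList("cc", "dd")));
        docs.put("Document_4", new HashSet<String>(Arrays.asList("aa", "cc", "dd")));

        Map<String, Double> query = exampleQuery();
        for (Map.Entry<String, Set<String>> d : docs.entrySet()) {
            System.out.println(d.getKey() + " : " + matchedBoost(d.getValue(), query));
        }
    }
}
```

On this simplified measure Document_1 comes first, Document_2 second, Document_4 third, and Document_3 matches nothing at all.
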
And I got:
3 hits
Document_2 : (      "bb" "cc"      ) : score : 0.75391763
Document_1 : ( "aa" "bb" "cc"      ) : score : 0.67014897
Document_4 : ( "aa"      "cc" "dd" ) : score : 0.0670149

[Question] Why does Document_2 score higher than Document_1?
Document_1 matches two terms, "aa" and "bb".
I want to emphasize matches and care less about mismatches.
How should I modify Similarity to achieve that? (Document_1 should get the
higher score!)
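
One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is length normalization. As far as I know, the default Similarity in Lucene 2.x computes lengthNorm(String fieldName, int numTerms) as 1/sqrt(numTerms), so a two-keyword document gets a larger norm than a three-keyword one. Since "aa" and "bb" each occur in two documents, their idf is the same and drops out of the comparison. The sketch below is plain Java, not Lucene code; LengthNormEffect and its methods are hypothetical names, and the lossy one-byte storage of norms in the index is ignored.

```java
public class LengthNormEffect {

    // Assumed default behaviour: 1/sqrt(number of terms in the field).
    // With the norm disabled, every document gets the same factor of 1.0.
    public static double lengthNorm(int numTerms, boolean enabled) {
        return enabled ? 1.0 / Math.sqrt(numTerms) : 1.0;
    }

    // Relative score: summed boost of the matched query terms times the norm.
    // idf and queryNorm are left out because they are identical for
    // the documents compared here.
    public static double score(double matchedBoost, int numTerms, boolean normEnabled) {
        return matchedBoost * lengthNorm(numTerms, normEnabled);
    }

    public static void main(String[] args) {
        // Document_1 matches "aa" (0.1) and "bb" (0.9) and has 3 keywords.
        // Document_2 matches only "bb" (0.9) and has 2 keywords.
        for (boolean normEnabled : new boolean[] {true, false}) {
            System.out.println("length norm enabled: " + normEnabled);
            System.out.println("  Document_1 : " + score(0.1 + 0.9, 3, normEnabled));
            System.out.println("  Document_2 : " + score(0.9, 2, normEnabled));
        }
    }
}
```

With the norm enabled, Document_2 comes out ahead even though it matches fewer terms; with a constant norm, Document_1 wins. The reported scores have a ratio of about 1.125, which would fit 0.9 times a norm ratio of 1.25, close to sqrt(3)/sqrt(2) after coarse rounding of the stored norms. If this is the cause, a Similarity subclass whose lengthNorm always returns 1.0, or omitting norms on the field, might give the desired order; because norms are written at index time, the index would presumably need to be rebuilt.
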

Are there any suggestions or examples for implementing this kind of
"keyword collection" search?

For the query above, I actually use a BooleanQuery made of TermQuery
clauses. What else should I take care of?

/* ************************************************** */
BooleanQuery q = new BooleanQuery(true); // disable coord
TermQuery tq;
{
 tq = new TermQuery(new Term(field, "aa"));
 tq.setBoost(.1f);
 q.add(tq, BooleanClause.Occur.SHOULD);
}
{
 tq = new TermQuery(new Term(field, "bb"));
 tq.setBoost(.9f);
 q.add(tq, BooleanClause.Occur.SHOULD);
}
{
 tq = new TermQuery(new Term(field, "xx"));
 tq.setBoost(.1f);
 q.add(tq, BooleanClause.Occur.SHOULD);
}
....
Hits hits = isearcher.search(q);
/* ************************************************** */

Thanks

Lin