java-user.lucene

custom tag scoring question

Robert Stewart

2008-10-08

We have a custom "tagger" application which identifies certain entities (companies, for example) and assigns each entity a "relevance" value based on its overall relevance within a document.

Then we index these "tags" into a Lucene index by storing them in an indexed field (same name, different values), for example "company=A, company=B, company=C", etc.

I know how to set the boost on each field according to the relevance value from our tagging application. However, sorting does not seem to work properly, because according to the documentation, all boost values under fields of the same name in a document are combined by multiplying them together:

From http://lucene.apache.org/java/docs/scoring.html:

"For each field of a document, all boosts of that field (i.e. all boosts under the same field name in that doc) are multiplied."

So if I have two documents, each with some entities:

Doc1: A (100%), B (50%), C (25%)
Doc2: A (75%), D (50%)

Then a query for A should return Doc1 ahead of Doc2, since A is more relevant in Doc1 (100%) than in Doc2 (75%). But what seems to happen is this:

Doc1 boost = 1.0 * 0.5 * 0.25 = 0.125
Doc2 boost = 0.75 * 0.50 = 0.375

Therefore a query for A returns Doc2 ahead of Doc1.
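
To make the arithmetic concrete, here is a minimal plain-Java sketch of the multiplication rule quoted above. It does not call Lucene at all; the class name BoostDemo and the method combinedBoost are hypothetical names made up for illustration, and the sketch assumes the per-tag boosts are simply multiplied as the documentation describes.

```java
public class BoostDemo {
    // Hypothetical helper: multiplies all boosts stored under one field name
    // in a document, mirroring the rule quoted from the scoring documentation.
    public static double combinedBoost(double[] boosts) {
        double product = 1.0;
        for (double b : boosts) {
            product *= b;
        }
        return product;
    }

    public static void main(String[] args) {
        // Doc1 tags: A (100%), B (50%), C (25%)
        double doc1 = combinedBoost(new double[] {1.0, 0.5, 0.25});
        // Doc2 tags: A (75%), D (50%)
        double doc2 = combinedBoost(new double[] {0.75, 0.5});

        System.out.println("Doc1 combined boost = " + doc1);
        System.out.println("Doc2 combined boost = " + doc2);
        // The relevance of A alone (1.0 vs 0.75) is lost in the product,
        // so the document with fewer, stronger tags wins.
        System.out.println(doc2 > doc1 ? "Doc2 ranks first" : "Doc1 ranks first");
    }
}
```

The point is that the combined value depends on every tag in the document, not only on the tag being queried, which is why Doc2 comes out ahead even though A is more relevant in Doc1.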

Is there a way around this (besides creating a different field name for each tag)? Can I create custom similarity or scoring classes to handle this at query time somehow?

Thanks,
Bob