Java Mailing List Archive

http://www.java2.5341.com/

Mailing list: nutch-user.lucene

Are the posting lists of the index sorted?

Miguel Costa

2008-05-12

Hi all,

I have two questions related to the Nutch/Lucene ranking.

1) Does anyone know how the posting lists (term -> doc1 doc2 doc3) in the
index are sorted?
Are the documents (doc1 doc2 doc3) sorted by a TFxIDF value, by the boost
value, or not at all? Does Lucene compute the ranking for all the documents
in the posting lists, or only for some of them?
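For reference, the structure in question 1 can be pictured with a toy in-memory index. This is a minimal sketch under stated assumptions, not Lucene's API: the class `PostingListSketch` and its methods are hypothetical, postings are kept in increasing document-ID order (the order in which Lucene stores them), and every document in a posting list is scored exhaustively. The `tf` and `idf` helpers follow the shape of Lucene's `DefaultSimilarity` (`sqrt(freq)` and `1 + ln(numDocs / (docFreq + 1))`), but the sketch leaves out length norms, boosts, `coord` and `queryNorm`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy inverted index illustrating posting lists and TFxIDF scoring (not Lucene's API). */
class PostingListSketch {
    // term -> (docId -> term frequency); TreeMap keeps doc IDs in increasing order
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();
    private int numDocs = 0;

    /** Adds a document made of the given terms and returns its doc ID. */
    public int addDocument(String... terms) {
        int docId = numDocs++;
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    /** Posting list for a term, in increasing doc-ID order. */
    public List<Integer> postingList(String term) {
        return new ArrayList<>(postings.getOrDefault(term, new TreeMap<>()).keySet());
    }

    // Same shape as DefaultSimilarity.tf(): square root of the raw frequency
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Same shape as DefaultSimilarity.idf(): 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Scores every document in the term's posting list (exhaustive, no early termination). */
    public Map<Integer, Double> score(String term) {
        TreeMap<Integer, Integer> list = postings.getOrDefault(term, new TreeMap<>());
        double termIdf = idf(list.size(), numDocs);
        Map<Integer, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : list.entrySet()) {
            scores.put(e.getKey(), tf(e.getValue()) * termIdf);
        }
        return scores;
    }

    public static void main(String[] args) {
        PostingListSketch idx = new PostingListSketch();
        idx.addDocument("nutch", "lucene", "nutch");
        idx.addDocument("lucene");
        idx.addDocument("nutch", "ranking");
        System.out.println(idx.postingList("nutch")); // [0, 2]
        System.out.println(idx.score("nutch"));
    }
}
```

Because this list is in document-ID order rather than score order, scoring only a prefix of it would not yield the top documents; that would need postings ordered by some score-like value, which the sketch does not attempt.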

2) Does anyone know how to add more ranking features to the ranking function
of Nutch (e.g. PageRank, BM25)?
The NutchSimilarity class, which extends Lucene's DefaultSimilarity, is not
enough for this: it only allows changing the TFxIDF function.
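For question 2, the two features named can be written down outside of Lucene to see what a custom ranking function would have to compute. The sketch below is hypothetical and is not a Nutch or Lucene plugin: the class `Bm25WithPriorSketch`, the defaults `k1 = 1.2` and `b = 0.75`, and the linear mix `bm25 + w * pageRank` with weight `w` are all assumptions. It uses the non-negative BM25 idf `ln(1 + (N - n + 0.5) / (n + 0.5))`.

```java
/** Hypothetical sketch: BM25 term weighting plus a static per-document prior such as PageRank. */
class Bm25WithPriorSketch {
    static final double K1 = 1.2;  // assumed common default
    static final double B = 0.75;  // assumed common default

    /** Non-negative BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(int docFreq, int numDocs) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of one query term to one document. */
    static double bm25Term(int freq, int docFreq, int numDocs, int docLen, double avgDocLen) {
        // Length normalisation: longer-than-average documents get a larger denominator
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docFreq, numDocs) * freq * (K1 + 1.0) / (freq + norm);
    }

    /** Hypothetical linear mix of the query-dependent score with a query-independent prior. */
    static double combined(double bm25, double pageRank, double priorWeight) {
        return bm25 + priorWeight * pageRank;
    }

    public static void main(String[] args) {
        // One query term found in 10 of 1000 docs; same frequency, different document lengths
        double shortDoc = bm25Term(3, 10, 1000, 100, 120.0);
        double longDoc = bm25Term(3, 10, 1000, 400, 120.0);
        System.out.printf("short doc %.4f, long doc %.4f%n", shortDoc, longDoc);
        System.out.printf("long doc with prior %.4f%n", combined(longDoc, 0.8, 2.0));
    }
}
```

BM25 also needs the average document length across the collection, and the prior is a per-document value that does not depend on the query terms; the sketch simply takes both as plain arguments.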

Thanks in advance.

--

Miguel Costa

http://xldb.fc.ul.pt/~mcosta/



FCCN - Fundação para a Computação Científica Nacional
Av. do Brasil, n.º 101

1700-066 Lisboa

Tel.: +351 21 8440190

Fax: +351 218472167

www.fccn.pt




©2008 java2.5341.com - Jax Systems, LLC, U.S.A.