Hi all,
I have two questions related to the Nutch/Lucene ranking.
1) Does anyone know how the posting lists (term -> doc1 doc2 doc3) from the
index are sorted?
Are the documents (doc1 doc2 doc3) sorted by a TFxIDF value, by the boost
value, or by neither? Does Lucene compute the ranking for all the documents
in the posting lists, or only for part of them?
2) Does anyone know how to add more ranking features to the ranking function
of Nutch (e.g. PageRank, BM25)?
The NutchSimilarity class, which extends Lucene's DefaultSimilarity, does
not seem sufficient for this: it only lets me change the TFxIDF function.
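To make the second question more concrete, here is a rough sketch of the kind
of scoring I have in mind. It is plain Java, not Nutch or Lucene code: the
class name, the method names, the linear mix and the weight are all
hypothetical, and the TFxIDF formula is only my assumption of roughly what a
default similarity computes. The point is that the second method needs a
per-document, query-independent value (such as a PageRank score), which I do
not see how to supply by overriding the TFxIDF hooks alone.

```java
/** Hypothetical sketch only; none of these names come from Nutch or Lucene. */
public class CombinedScoreSketch {

    /**
     * TFxIDF-style weight for one term in one document.
     * Assumed form: sqrt(tf) * (1 + ln(numDocs / (docFreq + 1))).
     */
    static double tfIdf(int termFreq, int docFreq, int numDocs) {
        double tf = Math.sqrt(termFreq);
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        return tf * idf;
    }

    /**
     * Linear mix of the text score with a query-independent score
     * (for example a precomputed PageRank value). The weight in [0, 1]
     * is an illustrative assumption, not a recommended setting.
     */
    static double combined(double textScore, double staticScore, double weight) {
        return (1.0 - weight) * textScore + weight * staticScore;
    }

    public static void main(String[] args) {
        // One term occurring 4 times in a document, present in 9 of 1000 documents.
        double text = tfIdf(4, 9, 1000);
        // Mix with a made-up static score of 0.8, giving it 30% of the weight.
        System.out.println(combined(text, 0.8, 0.3));
    }
}
```

What I am looking for is the right place in Nutch to plug in something like
combined(), whatever the actual combination function turns out to be.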
Thanks in advance.
--
Miguel Costa
http://xldb.fc.ul.pt/~mcosta/
FCCN - Fundação para a Computação Científica Nacional
Av. do Brasil, n.º 101
1700-066 Lisboa
Tel.: +351 21 8440190
Fax: +351 218472167
www.fccn.pt