List: nutch-user.lucene

Nutch & Solr

William Ortiz

2008-10-21

Hi:

I am a newcomer to the Solr/Nutch community and I have some questions.

I was able to hook up Nutch for search and Solr for indexing, but I
would like to know whether it is possible to show something similar to
the Nutch result summary in Solr and, if so, how. Should I store the
value of the 'content' field in Solr and create the summary from it?

Also, Nutch fetches some links that return a 404 error, and these are
then indexed by Solr. Is there some way to filter out these results in
the SolrIndexer class before they are indexed? Is it possible to get
the Status, Metadata, or Signature of a page in the SolrIndexer?

The fields I just mentioned show up when I dump the database. Here are
two of the records:

http://xxxx.xxxx..com/xxx/xxx-xxx   Version: 6
Status: 1 (db_unfetched)
Fetch time: Tue Oct 21 10:45:36 EDT 2008
Modified time: Wed Dec 31 19:00:00 EST 1969
Retries since fetch: 1
Retry interval: 2592000 seconds (30 days)
Score: 7.0573883E-6
Signature: null
Metadata: _pst_:blocked(23), lastModified=0

http://xxxx.xxxx..com/xxx/xxx-xxx   Version: 6
Status: 3 (db_gone)
Fetch time: Fri Dec 05 09:04:13 EST 2008
Modified time: Wed Dec 31 19:00:00 EST 1969
Retries since fetch: 0
Retry interval: 3888000 seconds (45 days)
Score: 6.4350065E-4
Signature: null
Metadata: _pst_:notfound(14), lastModified=0: http://xxxx.xxxx..com/xxx/xxx-xxx
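
To illustrate the kind of filter being asked about, here is a minimal
sketch that works only on the text of a dump like the one above. It does
not use the Nutch or Solr APIs; the class name CrawlDumpFilter and the
method urlsNotGone are hypothetical, and it assumes each record starts
with a "<url>   Version: N" line followed by a "Status: N (name)" line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch or Solr: it reads the plain-text
// dump format shown above and drops records whose status is db_gone.
class CrawlDumpFilter {

    // Matches lines such as "Status: 3 (db_gone)".
    private static final Pattern STATUS =
            Pattern.compile("^Status:\\s*(\\d+)\\s*\\((\\w+)\\)$");

    /** Returns the URLs of all records whose status name is not db_gone. */
    static List<String> urlsNotGone(String dump) {
        List<String> urls = new ArrayList<>();
        String currentUrl = null;
        for (String line : dump.split("\\R")) {
            if (line.contains("Version:")) {
                // First line of a record: "<url>   Version: 6"
                currentUrl = line.substring(0, line.indexOf("Version:")).trim();
                continue;
            }
            Matcher m = STATUS.matcher(line.trim());
            if (m.matches() && currentUrl != null) {
                // Keep the record unless the crawl marked the page as gone.
                if (!"db_gone".equals(m.group(2))) {
                    urls.add(currentUrl);
                }
                currentUrl = null;
            }
        }
        return urls;
    }
}
```

What I am really after is the same check inside the indexing step, so
that a page with a status like db_gone never reaches Solr.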

Thank you in advance for your help.

William J Ortiz