Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

Dedup

David Jashi

2008-09-18

Hello, colleagues.

I have a theoretical question. Let's say:
on 01/01/2008 we crawled the page http://www.site.com/page.html,
on 10/01/2008 the page changed,
on 01/02/2008 we crawled it again and merged the old and new indexes.

Which version of this page will Nutch dedup leave in the index?

--
with best regards,
David Jashi
Web development EO,
Caucasus Online
+995(32)970368
David@(protected)
