Java Mailing List Archive

http://www.java2.5341.com/

Dedup Question

Patrick Markiewicz

2008-07-23

Hi,

Suppose my index contains the URL http://www.example.com/index.html
with the content "EMPTY FILE", and it also contains
http://www.domain.com/index.html with the same content, "EMPTY FILE".
The two pages are then duplicates. Which one will the de-duplication
process remove from the index? Thanks.



Patrick
