
URLs not crawled in order (referring to URL list)

Mathias Conradt

2008-06-24


I created my URL list file from my Google sitemap, so it contains all of my
URLs, and then set the crawl depth to 1 because I don't want the crawler to
follow any sublinks.
When I looked at the log, I found that the crawler doesn't work through the
URL list line by line, but in a seemingly random order. Is there a reason for
that?
Or do I actually have to set the depth to 0 instead of 1?

(Because the crawl takes a while, I wanted to use the log to check which URL
the crawler is currently working on, but I couldn't.)
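
When the log does not follow the list order, progress can still be estimated
by membership rather than by position: collect the URLs the log mentions and
compare them with the URL list. The sketch below is only an illustration. It
assumes that each fetch appears in the log as a line containing
"fetching <url>" (adjust the pattern if your log looks different), and the
function name crawl_progress and the example.com URLs are hypothetical.

```python
import re

# Assumption: a fetch is logged as a line containing "fetching <url>".
# Change this pattern if your log format differs.
FETCH_LINE = re.compile(r"fetching\s+(\S+)")


def crawl_progress(seed_urls, log_lines):
    """Split seed URLs into (fetched, pending), both kept in seed-list order."""
    seen = set()
    for line in log_lines:
        match = FETCH_LINE.search(line)
        if match:
            seen.add(match.group(1))
    fetched = [url for url in seed_urls if url in seen]
    pending = [url for url in seed_urls if url not in seen]
    return fetched, pending


if __name__ == "__main__":
    # Hypothetical data standing in for the URL list file and the crawl log.
    seeds = [
        "http://example.com/a",
        "http://example.com/b",
        "http://example.com/c",
    ]
    log = [
        "INFO fetching http://example.com/c",
        "INFO some unrelated message",
        "INFO fetching http://example.com/a",
    ]
    fetched, pending = crawl_progress(seeds, log)
    print(f"{len(fetched)} of {len(seeds)} seed URLs fetched so far")
    for url in pending:
        print("pending:", url)
```

With a real crawl, the two lists would be read from the URL list file and the
log file. The comparison needs exact string matches, so URLs that the crawler
normalizes or redirects may show up as pending even though they were fetched.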

--
Sent from the Nutch - User mailing list archive at Nabble.com.
