I grabbed the latest nightly build <nutch-2008-05-21_04-01-59.tar.gz>.
I set up the same old configuration on my cluster by copying files.
Then I ran this command on the master node:
bin/nutch crawl urls -dir crawled -depth 10
I still get this error message:
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawled
0 records? I have a list of 20 URLs in my seed file.
I checked my crawl filter:
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*.*
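One way to rule out the filter is to test the seed URLs against this
pattern outside of Nutch. The following is only a minimal sketch: it
assumes the filter accepts a URL when the pattern after the leading '+'
matches from the start of the URL, it uses Python's re module as a
stand-in for the Java regex engine, and the function name and sample
URLs are hypothetical.

```python
import re

# Pattern copied from the filter line above, without the leading '+'.
ACCEPT = re.compile(r"^http://([a-z0-9]*\.)*.*")


def accepted(url: str) -> bool:
    """Return True if the URL matches the accept pattern from its start."""
    return ACCEPT.search(url) is not None


if __name__ == "__main__":
    # Hypothetical seed URLs; replace these with the lines of the real seed file.
    for url in ["http://example.com/", "https://example.com/", "example.com"]:
        print("accept" if accepted(url) else "reject", url)
```

If every seed URL is accepted here, the filter line itself is probably
not what leaves the generator with 0 records. Seed entries without the
http:// prefix would be rejected by this pattern.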
The last time I faced this issue, I increased the depth from 3 to 10
and it worked, but the crawl died about halfway through. I re-ran the
command, and since then it stops at this same point no matter what
depth I set.
I saw someone discussing this in NUTCH-503 and patching the source.
I grabbed a nightly build instead, hoping that the fix had already
been applied there.
Please help.
Thanks
--
Abhijit Bera
Associate Software Engineer - Web Enterprise Division
Geodesic Information Systems Ltd.
I use Ubuntu Linux.