Hi folks,
I am unable to crawl all the links on my website. For some reason, only one
or two links are picked up by Nutch.
Here is the website I am trying to index: http://www.knowmydestination.com
All links on this website are internal.
My crawl-urlfilter does not block any kind of internal links. It looks like
this:
# accept hosts in MY.DOMAIN.NAME
+^http://www.knowmydestination.com/
# skip everything else
-.
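
To double-check the filter outside Nutch, here is a minimal Python sketch. It assumes the rules are tried top to bottom and the first pattern found in the URL decides, which is my understanding of how the regex URL filter behaves. The `accepts` helper and the non-`www` test URL are my own, for illustration only; they are not part of Nutch.

```python
import re

# Rules copied from the filter above as (sign, pattern), checked in order.
RULES = [
    ("+", r"^http://www.knowmydestination.com/"),
    ("-", r"."),
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False


if __name__ == "__main__":
    for url in [
        "http://www.knowmydestination.com/",
        "http://www.knowmydestination.com/articles/cheapfares.html",
        "http://knowmydestination.com/",  # hypothetical link without "www"
    ]:
        print("accept" if accepts(url) else "reject", url)
```

If this approximation is right, any link written without the `www` prefix would be rejected by the filter, so that may be worth checking in the page source.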
My urls are: http://www.knowmydestination.com/
When I run:
bin/nutch crawl urls -dir crawl.kmd -depth 3 -topN 100
Nutch crawls only one link:
http://www.knowmydestination.com/articles/cheapfares.html
Can anyone help me figure this out?
/Amitab