List: nutch-user.lucene

Crawling password protected pages in NUTCH...

Rout Biswajit-B16078

2008-09-15

Hi,

I have successfully configured NUTCH 0.9. It is crawling a number of sites, and searching the crawled content also works properly.

However, I now want to crawl password-protected pages using NUTCH. Accessing those pages requires a valid user name and password, which I have configured in my nutch-site.xml and httpclient-auth.xml.

However, the protected pages are still not being crawled.
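For reference, a minimal sketch of the kind of setup described above, assuming the protocol-httpclient plugin is the one handling authentication. This is not the content of the attached files: the host, port, realm, user name, and password are placeholders, and the rest of the plugin list is assumed from the stock defaults.

```xml
<!-- nutch-site.xml (fragment): use protocol-httpclient in place of protocol-http -->
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
  <!-- file name of the credentials file, looked up in the conf directory -->
  <name>http.auth.file</name>
  <value>httpclient-auth.xml</value>
</property>
```

```xml
<!-- httpclient-auth.xml: all values below are placeholders, not real settings -->
<auth-configuration>
  <credentials username="myuser" password="mypassword">
    <!-- scope in which these credentials are offered -->
    <authscope host="intranet.example.com" port="80" realm="Protected Area"/>
  </credentials>
</auth-configuration>
```

If protocol-http is still listed in plugin.includes instead of protocol-httpclient, the credentials file would presumably never be read, so that is one thing worth checking.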

I have attached nutch-site.xml, httpclient-auth.xml and hadoop.log in a zip file for your reference. Kindly check and let me know what I am missing.

CONFIGURATION:

nutch-2008-07-10_04-01-48.tar (downloaded from http://hudson.zones.apache.org/hudson/job/Nutch-trunk/, which contains your patch for HttpAuthentication)

Windows XP, Cygwin, jdk1.6.0

Thanks in advance…

Please help....

Best regards,

Biswajit


Attachment: Nutch.zip (zipped)