Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

How do I crawl a site with a cookie for authentication?

Yoav Shapira

2008-10-01


Hi,

I would like to use Nutch to crawl and index an intranet web site for
internal use. The site requires authentication, and stores the
credentials in a cookie. I've got a valid login and I have the cookie
saved, no problem. How do I tell Nutch to use it?

I did some research online before asking, but unfortunately I couldn't
find a step-by-step answer for a newbie like myself. I see there's an
protocol-httpclient plugin that supports some forms of authentication. Is
that what I should use for cookies? If so, how do I configure it?
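For reference, here is what "using the cookie" amounts to at the HTTP level. It is sketched with the standard `java.net.http` client (Java 11+, much newer than this post), not with anything from Nutch, so it is not Nutch configuration. The cookie name `SESSIONID`, its value, and the intranet URL are hypothetical placeholders. Whichever Nutch protocol plugin does the fetching would need to send an equivalent `Cookie` header on every request.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CookieFetch {

    // Join saved cookies into one Cookie header value: "name1=value1; name2=value2".
    static String cookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    // Build a GET request that replays the saved session cookie(s).
    static HttpRequest buildRequest(String url, Map<String, String> cookies) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Cookie", cookieHeader(cookies))
                .GET()
                .build();
    }

    // Send the request and return the page body (requires network access).
    static String fetch(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("SESSIONID", "abc123"); // hypothetical cookie name and value
        HttpRequest request = buildRequest("http://intranet.example.com/", cookies); // hypothetical URL
        // Only print the header that would be sent; no network call is made here.
        System.out.println("Cookie: " + request.headers().firstValue("Cookie").orElse(""));
    }
}
```

Calling `fetch` with the real values saved from a logged-in browser session would request a protected page the same way the browser does. `main` only prints the header so the sketch runs without network access.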

Or is there something else I should be doing? If the documentation /
answer exists, sorry for the hassle and please just point me to it ;)

--
Thanks,

Yoav