Hi,
I would like to use Nutch to crawl and index an intranet website for
internal use. The site requires authentication and stores the
credentials in a cookie. I have a valid login and the cookie is
saved, no problem there. How do I tell Nutch to use it?
I did some research online before asking, but unfortunately I couldn't
find a step-by-step answer for a newbie like myself. I see there's an
http-client plugin that can support some authentication. Is that what
I should use for cookies? If so, how do I configure it?
Or is there something else I should be doing? If the documentation or an
answer already exists, sorry for the hassle, and please just point me to it ;)
--
Thanks,
Yoav