On Wed, Oct 1, 2008 at 4:35 PM, Yoav Shapira <yoavs@(protected)> wrote:
> Hi,
>
> I would like to use Nutch to crawl and index an intranet web site for
> internal use. The site requires authentication, and stores the
> credentials in a cookie. I've got a valid login and I have the cookie
> saved, no problem. How do I tell Nutch to use it?
>
> I did some research online before asking, but unfortunately I couldn't
> find a step-by-step answer for a newbie like myself. I see there's an
> http-client plugin that can support some authentication. Is that what
> I should use for cookies? If so, how do I configure it?
>
> Or is there something else I should be doing? If the documentation /
> answer exists, sorry for the hassle and please just point me to it ;)
>
Unfortunately, Nutch doesn't have such a feature yet. (One of the problems
is that we do not have a place to store cookies in a distributed setup.)
> --
> Thanks,
>
> Yoav
>
--
Doğacan Güney