Re: Using S3 with Hadoop/Nutch

Doğacan Güney

2008-10-01

On Tue, Sep 30, 2008 at 11:52 PM, Kevin MacDonald <kevin@(protected)> wrote:
> Does anyone have experience configuring Hadoop to use S3 with Nutch? I
> tried modifying my hadoop-site.xml configuration file, and it looks like
> Hadoop is trying to use S3. But I think what's happening is that, once
> configured to use S3, Hadoop looks ONLY at S3 for all files. It's
> trying to find a /tmp folder there, for example. And when running a crawl,
> Hadoop looks to S3 for the seed urls folder. Are there steps needed
> to prepare an S3 bucket for use by Hadoop so that a Nutch
> crawl can run?

If you want to pass paths from other filesystems, I think you can give
a fully qualified URI on the command line, something like:

bin/nutch inject crawl/crawldb hdfs://machine:10000/.....
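
To illustrate why that should work, here is a rough Python sketch of how I
understand the path resolution. It is a simplified model, not Hadoop's actual
code: qualify(), the working directory /user/nutch, the bucket name
example-bucket and the /urls path are all made up for the example. The idea is
that a path without a scheme is resolved against the default filesystem (the
one set in hadoop-site.xml), while a path that already carries a scheme such
as hdfs:// is left alone.

```python
from urllib.parse import urlsplit


def qualify(path, default_fs, working_dir="/user/nutch"):
    """Return a fully qualified URI for `path` (simplified, hypothetical model)."""
    if urlsplit(path).scheme:
        # The path already names its filesystem (hdfs://..., s3://...),
        # so the default filesystem is not consulted.
        return path
    if not path.startswith("/"):
        # Relative paths are taken relative to an assumed working directory.
        path = working_dir.rstrip("/") + "/" + path
    # Paths without a scheme land on the default filesystem.
    return default_fs.rstrip("/") + path


default_fs = "s3://example-bucket"  # hypothetical default filesystem
print(qualify("/tmp", default_fs))
print(qualify("crawl/crawldb", default_fs))
print(qualify("hdfs://machine:10000/urls", default_fs))
```

If that model is right, it would also explain what you are seeing: with S3 as
the default, /tmp and a bare seed directory name both end up pointing into the
bucket unless you spell out another filesystem.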

> Kevin
>



--
Doğacan Güney