I
Sent from my iPhone
On Oct 1, 2008, at 9:55 PM, "Alexander Aristov" <alexander.aristov@(protected)> wrote:
> Nothing special needs to be done with S3, as Hadoop will take care of it
> itself. You only need to make sure that the bucket is accessible. Don't worry
> about the tmp folder.
>
> I configured and ran Hadoop with S3 recently and it was OK, though I
> considered switching back to HDFS and using S3 only as a final file store
> (backup), because I saw some performance degradation.
>
> Alexander
>
> 2008/10/1 Kevin MacDonald <kevin@(protected)>
>
>> The problem is not that I want to mix file systems. I would like to start
>> from scratch with Hadoop configured to use S3. But to do that I think that
>> various Paths which Nutch expects to be there have to be represented somehow
>> in the S3 bucket that Hadoop is using. So, when Nutch tries to write
>> something to new Path("/tmp/...") Hadoop is able to do it. I think my
>> problem can be boiled down to this: How do you prepare an S3 bucket so that
>> from Hadoop's perspective it looks like a hierarchical file system?
>> Kevin
>>
>> On Wed, Oct 1, 2008 at 12:28 AM, Doğacan Güney <dogacan@(protected)>
>> wrote:
>>
>>> On Tue, Sep 30, 2008 at 11:52 PM, Kevin MacDonald <kevin@(protected)>
>>> wrote:
>>>> Does anyone have experience configuring Hadoop to use S3 with Nutch? I
>>>> tried modifying my hadoop-site.xml configuration file and it looks like
>>>> Hadoop is trying to use S3. But I think what's happening is that, once
>>>> configured to use S3, Hadoop is ONLY looking at S3 for all files. It's
>>>> trying to find a /tmp folder there, for example. And when running a crawl
>>>> Hadoop is looking to S3 to find the seed urls folder. Are there steps that
>>>> need to happen to prepare an S3 bucket for use by Hadoop so that a Nutch
>>>> crawl can happen?
>>>
>>> If you want to pass paths from other filesystems I think you can do
>>> something like:
>>>
>>> bin/nutch inject crawl/crawldb hdfs://machine:10000/.....
>>>
>>>> Kevin
>>>>
>>>
>>>
>>>
>>> --
>>> Doğacan Güney
>>>
>>
>
>
>
> --
> Best Regards
> Alexander Aristov
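
On the question quoted above of how a bucket can look like a hierarchical file system without any preparation: the rough idea is that an object store is a flat map from key strings to data, and "directories" are only a reading of the slashes in those keys. The following is a minimal sketch of that idea in Python, not Hadoop's actual S3 code. The FlatBucket class, its method names, and the sample paths are all hypothetical and exist only for illustration.

```python
class FlatBucket:
    """Toy in-memory stand-in for a flat object store (hypothetical, not an S3 client)."""

    def __init__(self):
        # key -> bytes; keys are plain strings, there are no real directories
        self.objects = {}

    def write(self, path, data):
        # No parent "folder" has to exist first: the key simply contains slashes.
        self.objects[path.lstrip("/")] = data

    def exists(self, path):
        key = path.strip("/")
        if key == "":
            return True  # the bucket root always exists
        # A path exists if it is an object, or a prefix ("directory") of some object.
        return key in self.objects or any(
            k.startswith(key + "/") for k in self.objects
        )

    def listdir(self, path):
        prefix = path.strip("/")
        prefix = prefix + "/" if prefix else ""
        children = set()
        for key in self.objects:
            if key.startswith(prefix):
                # Keep only the first path component below the prefix.
                children.add(key[len(prefix):].split("/", 1)[0])
        return sorted(children)


bucket = FlatBucket()
# Writing under /tmp works on an empty bucket; nothing was created beforehand.
bucket.write("/tmp/hadoop/job.xml", b"<configuration/>")
bucket.write("/urls/seed.txt", b"http://example.com/\n")

print(bucket.listdir("/"))     # top-level "directories" derived from the keys
print(bucket.exists("/tmp"))   # True once any key lives under tmp/
```

In this toy model an empty bucket already behaves like an empty root directory, which matches the advice in the thread that only bucket accessibility needs to be checked. Input data such as the seed urls folder still has to be written into the bucket before a job can read it from there.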