svn nutch with hadoop 0.17

Chris Anderson

2008-05-23

Hey all,

We're experimenting with Nutch on a Hadoop cluster. Hadoop is version
0.17, launched using the Hadoop public EC2 AMI, using the instructions
here: http://wiki.apache.org/hadoop/AmazonEC2

To run Nutch, we build a nutch.jar that leaves out the Hadoop
classes, based on the advice here:
http://www.mail-archive.com/nutch-user@(protected)
We're doing this by modifying build.xml (I can post our version if
it will help).

The one part of the advice we skipped is that we are running
mismatched versions: Nutch is currently built against Hadoop 0.16, but
we are using Hadoop 0.17 for its clean EC2 support. Our version of
Nutch is the most recent svn trunk (r659263).

We're getting java.lang.AbstractMethodError on crawl - here's the
first error line:

java.lang.AbstractMethodError:
org.apache.nutch.crawl.Injector$InjectMapper.map(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/hadoop/mapred/OutputCollector;Lorg/apache/hadoop/mapred/Reporter;)V
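For anyone reading that descriptor: it says the runtime looked for a
map method taking (Object, Object, OutputCollector, Reporter) and
returning void, and found no implementation on InjectMapper. One way
this can happen is when an interface's generic bounds differ between
the version a class was compiled against and the version on the
classpath, because the erased method signature changes with the
bounds. Whether that is what changed between Hadoop 0.16 and 0.17 is
an assumption here, not something verified. The sketch below only
illustrates the erasure effect; ErasureDemo, BoundedMapper,
UnboundedMapper and the Writable stand-ins are hypothetical names, not
Hadoop or Nutch classes.

```java
import java.lang.reflect.Method;

public class ErasureDemo {
    // Hypothetical stand-ins for illustration only; not the real Hadoop types.
    public interface Writable {}
    public interface WritableComparable extends Writable {}

    // Bounded type parameters: the erased signature uses the bounds.
    public interface BoundedMapper<K extends WritableComparable, V extends Writable> {
        void map(K key, V value);
    }

    // Unbounded type parameters: the erased signature uses Object.
    public interface UnboundedMapper<K, V> {
        void map(K key, V value);
    }

    // Returns the erased parameter types of the interface's map method,
    // as seen by the JVM at link time.
    public static String erasedMapParams(Class<?> iface) {
        for (Method m : iface.getDeclaredMethods()) {
            if (m.getName().equals("map")) {
                StringBuilder sb = new StringBuilder();
                for (Class<?> p : m.getParameterTypes()) {
                    if (sb.length() > 0) {
                        sb.append(", ");
                    }
                    sb.append(p.getSimpleName());
                }
                return sb.toString();
            }
        }
        throw new IllegalArgumentException("no map method on " + iface);
    }

    public static void main(String[] args) {
        // A class compiled against the bounded interface only provides the
        // first signature; a runtime expecting the second will not find it.
        System.out.println("bounded:   map(" + erasedMapParams(BoundedMapper.class) + ")");
        System.out.println("unbounded: map(" + erasedMapParams(UnboundedMapper.class) + ")");
    }
}
```

If that is the cause, recompiling Nutch against the Hadoop jar that
is actually deployed should make the compiler generate the matching
method, though source changes may also be needed.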

And the full console output is here: http://pastie.caboo.se/202517

Question: is it worth pressing on with this version mismatch, or
should we fall back to Hadoop 0.16?

If Hadoop 0.17 support is on the Nutch roadmap, we're willing to help
close tickets / work to make this happen.

Thanks in advance for helping us find our bearings!

--
Chris Anderson
http://jchris.mfdz.com