Hey all,
We're experimenting with Nutch on a Hadoop cluster. The cluster runs
Hadoop 0.17, launched from the public Hadoop EC2 AMI by following the
instructions here: http://wiki.apache.org/hadoop/AmazonEC2
When running Nutch, our method is to build a nutch.jar that leaves out
the Hadoop classes, based on the advice here:
http://www.mail-archive.com/nutch-user@(protected)
We're doing this by modifying build.xml (I can post our version if
it will help).
The one part of that advice we did not follow is matching versions:
Nutch is currently built against Hadoop 0.16, but we are running
Hadoop 0.17 for its clean EC2 support. Our version of Nutch is the
most recent svn trunk (r659263).
We're getting a java.lang.AbstractMethodError on crawl. Here's the
first error line:
java.lang.AbstractMethodError:
org.apache.nutch.crawl.Injector$InjectMapper.map(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/hadoop/mapred/OutputCollector;Lorg/apache/hadoop/mapred/Reporter;)V
And the full console output is here: http://pastie.caboo.se/202517
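
In case it helps anyone reproduce this: the descriptor in the error says the JVM looked for a map method taking (Object, Object, OutputCollector, Reporter) and found no implementation on InjectMapper. A rough way to see which map signatures a compiled class actually declares is plain reflection. The sketch below is only an illustration. MapSignatureCheck and NarrowMapper are made-up names, and NarrowMapper is a stand-in so the example runs without Hadoop or Nutch on the classpath.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class MapSignatureCheck {

    // Hypothetical stand-in for a class compiled against narrower
    // parameter types than the caller expects (not a real Nutch class).
    static class NarrowMapper {
        public void map(String key, Integer value) { }
    }

    /** Lists the parameter types of every method named methodName declared on cls. */
    static List<String> signaturesOf(Class<?> cls, String methodName) {
        List<String> out = new ArrayList<String>();
        for (Method m : cls.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            StringBuilder sb = new StringBuilder(methodName).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getName());
            }
            out.add(sb.append(')').toString());
        }
        return out;
    }

    /** True if cls has a public method with exactly this name and these parameter types. */
    static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass a fully qualified class name to inspect a real class;
        // with no arguments the stand-in class is inspected instead.
        Class<?> cls = args.length > 0 ? Class.forName(args[0]) : NarrowMapper.class;
        System.out.println(signaturesOf(cls, "map"));
    }
}
```

To inspect the real class, the class name from the error line could be passed as the argument, assuming nutch.jar and the Hadoop jar in use are both on the classpath.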
Question: is it worth pressing on with this version mismatch, or
should we fall back to Hadoop 0.16?
If Hadoop 0.17 support is on the Nutch roadmap, we're willing to help
close tickets / work to make this happen.
Thanks in advance for helping us find our bearings!
--
Chris Anderson
http://jchris.mfdz.com