Java Mailing List Archive

nutch-user.lucene

problem running nutch from eclipse 3.2 in ubuntu hardy.

Hemant Bist

2008-06-14

Hi,
I am trying to build and run Nutch from trunk in Eclipse 3.2 on Ubuntu Hardy. After compiling it, I am unable to get it to crawl any site. As far as I can tell, something is wrong in my configuration, but I can't figure out what it is!

I am following [http://wiki.apache.org/nutch/RunNutchInEclipse0.9].
I have included conf in .classpath and modified nutch-default.xml to set plugin.folders and http.agent.name.


The final warning message I get is (the complete hadoop.log is attached):
WARN  crawl.Crawl - No URLs to fetch - check your seed list and URL filters.
Some of the earlier warning messages are:
WARN  mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2008-06-13 22:29:34,978 WARN  regex.RegexURLNormalizer - Can't load the default config file! /nutch/home/work/nutch/trunk/conf/regex-normalize.xml
2008-06-13 22:29:34,990 WARN  suffix.SuffixURLFilter - Missing urlfilter.suffix.file, all URLs will be rejected!
2008-06-13 22:29:34,994 FATAL api.RegexURLFilterBase - Can't find resource: crawl-urlfilter.txt
2008-06-13 22:29:34,995 FATAL api.RegexURLFilterBase - Can't find resource: automaton-urlfilter.txt
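
One way to narrow this down might be to check whether the files named in these messages are visible on the classpath of the Eclipse run configuration. The following is only a minimal sketch using the standard Java ClassLoader API. It assumes Nutch looks these files up as classpath resources, which the "Can't find resource" wording suggests but which is not confirmed here. The class name ClasspathResourceCheck and its helper methods are hypothetical; the resource names are taken from the log messages above, plus nutch-default.xml.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: reports which config files
// the current classpath can see.
public class ClasspathResourceCheck {

    // Names taken from the log messages above (assumed to be classpath resources).
    static final List<String> RESOURCES = Arrays.asList(
        "nutch-default.xml",
        "regex-normalize.xml",
        "crawl-urlfilter.txt",
        "automaton-urlfilter.txt");

    // Returns the URL of the resource, or null if the classpath does not contain it.
    static URL locate(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(name);
    }

    // Returns the subset of names that cannot be found on the classpath.
    static List<String> findMissing(List<String> names) {
        List<String> missing = new ArrayList<String>();
        for (String name : names) {
            if (locate(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        for (String name : RESOURCES) {
            URL url = locate(name);
            System.out.println(name + " -> "
                + (url == null ? "NOT FOUND on classpath" : url.toString()));
        }
    }
}
```

If this is run from the same Eclipse project and run configuration as the crawl, any file reported as not found would point at the conf folder not actually being on the runtime classpath, as opposed to a problem in the seed list itself.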



I would appreciate any pointers in debugging this.

Thanks,
HB