Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

Modifying a core class (Content.java) using plugins?

Onur Deniz

2008-09-16

Is it possible?

Well, in Eclipse it worked. I added some encoding code to Content.java that uses HtmlParser (a plugin), and it works in Eclipse (I have only tested it with SegmentReader, not with any unit tests).

But when compiling with ant I get compile errors.


Here is the modification to Content.java in the nutch-0.9.tar.gz release (not trunk).
I replaced the line:
    buffer.append(new String(content)); // try default encoding
with:
    Configuration conf = NutchConfiguration.create();
    HtmlParser parser = new HtmlParser();
    parser.setConf(conf);
    Parse parse = parser.getParse(this);
    String encoding = parse.getData().getParseMeta().get("OriginalCharEncoding");
    String localEncodedString = "java incompatible encoding";
    try {
        localEncodedString = new String(content, encoding);
    } catch (Exception e) {
        e.printStackTrace();
    }
    buffer.append(localEncodedString);

Here are the compile errors:
compile-core:
  [javac] Compiling 165 source files to /home/onur/nutch-0.9/build/classes
  [javac] /home/onur/nutch-0.9/src/java/org/apache/nutch/protocol/Content.java:39: package org.apache.nutch.parse.html does not exist
  [javac] import org.apache.nutch.parse.html.HtmlParser;
  [javac]                       ^
  [javac] /home/onur/nutch-0.9/src/java/org/apache/nutch/protocol/Content.java:240: cannot find symbol
  [javac] symbol : class HtmlParser
  [javac] location: class org.apache.nutch.protocol.Content
  [javac]     HtmlParser parser = new HtmlParser();
  [javac]     ^
  [javac] /home/onur/nutch-0.9/src/java/org/apache/nutch/protocol/Content.java:240: cannot find symbol
  [javac] symbol : class HtmlParser
  [javac] location: class org.apache.nutch.protocol.Content
  [javac]     HtmlParser parser = new HtmlParser();
  [javac]                     ^
  [javac] Note: Some input files use or override a deprecated API.
  [javac] Note: Recompile with -Xlint:deprecation for details.
  [javac] Note: Some input files use unchecked or unsafe operations.
  [javac] Note: Recompile with -Xlint:unchecked for details.
  [javac] 3 errors

BUILD FAILED
/home/onur/nutch-0.9/build.xml:106: Compile failed; see the compiler error output for details.


Do I need any other configuration to fix this? (parse-html is already listed in the plugin.includes property in nutch-default.xml; I also tried adding it in nutch-site.xml, but that did not help.)
Or are plugins not intended to be used from core code?

Any ideas?

(By the way, what I'm trying to do here is make the -get functionality use the page's own encoding. It normally returns the content in the platform-default encoding (UTF-8).)
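
To make the goal concrete, here is a minimal standalone sketch of the decoding step on its own, using only the JDK and no Nutch classes. The class name EncodingSketch and the method decode are hypothetical, and the sketch assumes the charset name has already been detected somewhere else (for example, from the "OriginalCharEncoding" parse metadata). It falls back to the platform default when the name is missing or unsupported, rather than appending a placeholder string. It uses multi-catch and so needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of Nutch: decodes raw page bytes with a
// detected charset name, falling back to the platform default.
final class EncodingSketch {

    static String decode(byte[] content, String encoding) {
        // Start from the platform default, which is what
        // new String(content) would use.
        Charset charset = Charset.defaultCharset();
        if (encoding != null) {
            try {
                charset = Charset.forName(encoding);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Unknown or malformed charset name: keep the default.
            }
        }
        return new String(content, charset);
    }

    public static void main(String[] args) {
        // A single ISO-8859-1 byte (0xE9) decoded with its declared charset.
        byte[] latin1 = { (byte) 0xE9 };
        System.out.println(decode(latin1, "ISO-8859-1"));

        // No detected encoding: behaves like new String(content).
        System.out.println(decode("plain ascii".getBytes(), null));
    }
}
```

This only covers the byte-to-String conversion. Where the charset name comes from is exactly the part that needs the parser, so it does not by itself resolve the compile error above.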

Thanks,

Onur Deniz



   