Java Mailing List Archive

http://www.java2.5341.com/


rtf parser status

olivier_coface

2008-10-29



Hello,
What is the current status of the RTF parser? I saw that there was a license
problem, and I was unable to find the source code for the RTFParser.

When I crawl RTF files, I almost always get the following error:
Error parsing: xxx.rtf : failed(2,0): Can't be handled as Microsoft
document. java.io.IOException: Invalid header signature; read
7015536635646467195, expected -2226271756974174256
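For what it's worth, the two numbers in that message can be decoded. This is a small sketch that assumes, as the values themselves suggest, that the parser reads the first 8 bytes of the file as a little-endian signed long. It turns each number back into the bytes it came from. The class and method names are just illustrative; only standard java.nio calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name): decodes the "read" and "expected"
// numbers from the "Invalid header signature" error back into file bytes.
// Assumption: each number is the first 8 bytes of a file read as a
// little-endian signed long.
final class HeaderSignatureDecoder {

    /** Returns the 8 bytes of the value in little-endian (on-disk) order. */
    static byte[] toFileBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
    }

    /** Formats bytes as space-separated uppercase hex, e.g. "D0 CF". */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long read = 7015536635646467195L;      // value reported as "read"
        long expected = -2226271756974174256L; // value reported as "expected"

        // First 8 bytes of the crawled file, shown as hex and as ASCII text
        byte[] readBytes = toFileBytes(read);
        System.out.println("read:     " + toHex(readBytes)
                + "  \"" + new String(readBytes, StandardCharsets.US_ASCII) + "\"");

        // Header signature the Microsoft document parser was looking for
        System.out.println("expected: " + toHex(toFileBytes(expected)));
    }
}
```

Running it prints `7B 5C 72 74 66 31 5C 61` ("{\rtf1\a") for the bytes that were read, which is the normal start of an RTF file. The expected value is `D0 CF 11 E0 A1 B1 1A E1`, the OLE2 compound-document signature used by binary Office files. The files themselves seem to be valid RTF. It looks as if they are being handed to the Microsoft document parser instead of an RTF parser.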

V. Shridar also reported this error in an earlier message, but unfortunately
there was no response.

Should I give up on indexing RTF files altogether?
--
Sent from the Nutch - User mailing list archive at Nabble.com.

©2008 java2.5341.com - Jax Systems, LLC, U.S.A.