Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

indexing url without parsed content

Edward Quick

2008-09-26



I have a PDF document which Nutch can't parse, even though I applied the patch in http://www.mail-archive.com/nutch-dev@(protected). The error is:

Error parsing: http://www.somedomain.com/general/aptrix/aptrix.nsf/AttachmentsByTitle/Leadership+-+Lost+at+sea/$FILE/Lost+at+sea.pdf: failed(2,0): Can't be handled as pdf document. java.lang.ClassCastException: org.pdfbox.pdmodel.encryption.PDEncryptionDictionary

Can I manually add a title to the index for this URL?
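
To make the question concrete: when the content can't be parsed, one possible fallback title is the last path segment of the URL itself. The sketch below only illustrates that idea. The class name FallbackTitle and the method fromUrl are hypothetical, not part of Nutch, and actually getting the value into the index (for example through an indexing filter plugin) is not shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of Nutch: derives a readable title
// from a URL when the document body could not be parsed.
public class FallbackTitle {

    public static String fromUrl(String url) {
        // Drop any query string or fragment.
        String path = url.split("[?#]", 2)[0];
        // Keep only the last path segment, e.g. "Lost+at+sea.pdf".
        String segment = path.substring(path.lastIndexOf('/') + 1);
        try {
            // Assumption: '+' and %XX escapes stand for spaces and
            // encoded characters, as they appear to in the URL above.
            segment = URLDecoder.decode(segment, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // Malformed escape sequence: fall back to the raw segment.
        }
        // Strip the file extension, if there is one.
        int dot = segment.lastIndexOf('.');
        if (dot > 0) {
            segment = segment.substring(0, dot);
        }
        return segment.trim();
    }

    public static void main(String[] args) {
        String url = "http://www.somedomain.com/general/aptrix/aptrix.nsf/"
                + "AttachmentsByTitle/Leadership+-+Lost+at+sea/$FILE/Lost+at+sea.pdf";
        System.out.println(fromUrl(url)); // prints "Lost at sea"
    }
}
```

For the URL in the error message this would give "Lost at sea", which is the kind of title I would like to see stored alongside the URL.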
