Java Mailing List Archive

http://www.java2.5341.com/

How to crawl pdf?

plat hpc

2008-06-10

Hi,

I followed the tutorial to set up Nutch, and it is up and running. Currently
it can crawl PHP pages, but not PDF files.

Can anyone please advise how to configure it to crawl PDF and Word
documents as well?

Thanks.