Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

nutch fetched but not indexed

宫照

2008-07-23

Hi everybody,

I have run into a problem using Nutch. I use Nutch to crawl our intranet, and
it worked well before. Recently, though, I added some URLs to crawl that are
different from the usual ones. The new URLs look like this:
http://compass.mydomain.com/go/247460034

There are many folders and documents under this URL, for example a folder:
http://compass.mot.com/go/247460034/2354342276
and a document:
http://compass.mot.com/go/247460034/mydoc.pdf

After the crawl, the documents under this kind of URL cannot be searched.
I checked the log and found that these URLs are fetched, but
they are not indexed.

I don't know why. Can you tell me how to fix this?

regards,

Gong Zhao