Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

How to disable a subfolder from crawling?

plat hpc

2008-08-14


Hi,

Basically, I would like the crawl to include everything on
mydomain.com except for one folder.

I set up crawl-urlfilter.txt to crawl the site with:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*mydomain.com/

But how do I filter it so that the crawl skips only the folder at
http://mydomain.com/myfolder?
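For illustration, here is a minimal Java sketch of one possible approach. It assumes the semantics described in the comments of Nutch's stock URL filter files: rules are checked top to bottom, the first matching pattern decides (`+` accept, `-` reject), and a URL that matches no rule is ignored. Under that assumption, a hypothetical `-` rule for /myfolder placed *before* the existing `+` rule would skip the folder. The class `UrlFilterSketch` only emulates that behavior and is not Nutch's own filter code.

```java
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // One line of a crawl-urlfilter.txt-style file: '+' accepts, '-' rejects.
    record Rule(boolean accept, Pattern pattern) {
        static Rule parse(String line) {
            return new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1)));
        }
    }

    // Hypothetical rule set: the exclusion for /myfolder comes BEFORE the accept
    // rule, so it is checked first. The second rule is the one from the post.
    static final List<Rule> RULES = List.of(
        Rule.parse("-^http://([a-z0-9]*\\.)*mydomain.com/myfolder"),
        Rule.parse("+^http://([a-z0-9]*\\.)*mydomain.com/"));

    // Emulated semantics: the first matching rule wins; no match means rejected.
    static boolean accepts(String url) {
        for (Rule r : RULES) {
            if (r.pattern().matcher(url).find()) {
                return r.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String url : List.of(
                "http://mydomain.com/index.html",
                "http://mydomain.com/myfolder/page.html",
                "http://www.mydomain.com/docs/")) {
            System.out.println(url + " -> " + (accepts(url) ? "crawl" : "skip"));
        }
    }
}
```

If the rules were listed the other way round, the broad `+` rule would match /myfolder URLs first and they would be crawled. Note also that this `-` pattern matches any path beginning with "myfolder" (e.g. /myfolder2), so it may need tightening.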

Thanks.