nutch-user.lucene

Crawling a fixed domain

kranthi reddy

2008-06-26

Hi,

I am trying to crawl a fixed domain, say IBNLIVE.COM.

I have changed my conf/crawl-urlfilter.txt and added the line

"+^http://([a-z0-9]*\.)*ibnlive.com/ "


But I don't know what is going on. I still get results like

"fetching http://www.google-analytics.com/urchin.js
 fetching http://www.josh18.com/showstory.php?id=236481
 fetching http://www.cricketnext.com/news/gambhir-raina-make-merry-as-bowlers-struggle/32395-13.html
"


I have written the rule in the format specified on the Nutch wiki/site,
but it doesn't seem to work.
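
One way to narrow this down is to test the pattern on its own against the URLs from the fetch log. The following is only a minimal sketch: the class name UrlFilterCheck and the accepts helper are hypothetical, the trailing space inside the quotes above is assumed to come from quoting rather than from the file, and it assumes the filter treats a rule as matching when the Java regex is found in the URL.

```java
import java.util.regex.Pattern;

public class UrlFilterCheck {
    // Pattern from the "+" rule in conf/crawl-urlfilter.txt, without the leading "+".
    // The unescaped dot before "com" is kept exactly as posted.
    static final Pattern RULE =
        Pattern.compile("^http://([a-z0-9]*\\.)*ibnlive.com/");

    // Assumption: a rule applies to a URL when the pattern is found in it.
    static boolean accepts(String url) {
        return RULE.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.ibnlive.com/",
            "http://www.google-analytics.com/urchin.js",
            "http://www.josh18.com/showstory.php?id=236481",
            "http://www.cricketnext.com/news/gambhir-raina-make-merry-as-bowlers-struggle/32395-13.html"
        };
        // Print one line per URL showing whether the rule would apply to it.
        for (String url : urls) {
            System.out.println((accepts(url) ? "accept " : "reject ") + url);
        }
    }
}
```

If the pattern rejects the three off-site URLs here, the rule itself is probably not the problem, and it may be worth checking whether this filter file is the one the crawl actually reads.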

Could someone please help me out?

Thanking you
kranthi reddy.b