
Mailing list: nutch-user.lucene

problems: crawling specific domain

Mohammad Monirul Hoque

2008-09-03


Hi,

How can I crawl a specific domain only (like www.yellowpages.co.za)? What do I have to change to make this work correctly? I tried changing crawl-urlfilter.txt, but Nutch started crawling outside my domain after some time.
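For reference, here is a minimal sketch of the kind of rule set I mean, written as plain Java rather than as Nutch code. It assumes the filter works the way the stock crawl-urlfilter.txt suggests: each line starts with `+` (accept) or `-` (reject) followed by a regular expression, the first rule whose pattern is found in the URL decides, and a URL that matches no rule is dropped. The class name `DomainFilterSketch` and the method `yellowPagesOnly` are hypothetical names made up for this illustration; they are not part of Nutch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: mimics first-match-wins '+'/'-' regex rules.
// This is NOT the Nutch implementation, just a sketch of the assumed behavior.
public class DomainFilterSketch {

    // One rule line: '+' means accept, '-' means reject.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, String regex) {
            this.accept = accept;
            this.pattern = Pattern.compile(regex);
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Each line is a sign character followed by a regular expression.
    public DomainFilterSketch(String... lines) {
        for (String line : lines) {
            rules.add(new Rule(line.charAt(0) == '+', line.substring(1)));
        }
    }

    // The first rule whose pattern is found anywhere in the URL decides.
    // A URL that matches no rule is rejected.
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    // Hypothetical rule set: accept yellowpages.co.za and its subdomains,
    // then reject everything else with a catch-all "-." line.
    public static DomainFilterSketch yellowPagesOnly() {
        return new DomainFilterSketch(
            "+^http://([a-z0-9]*\\.)*yellowpages\\.co\\.za/",
            "-."
        );
    }

    public static void main(String[] args) {
        DomainFilterSketch filter = yellowPagesOnly();
        System.out.println(filter.accepts("http://www.yellowpages.co.za/"));
        System.out.println(filter.accepts("http://www.example.com/"));
    }
}
```

With rules like these, anything outside the domain should be rejected, so if outside URLs are still fetched, the rules may not be the ones actually applied. As far as I understand, crawl-urlfilter.txt is read by the one-step `bin/nutch crawl` command, while the individual step-by-step tools read regex-urlfilter.txt instead; that is my assumption and worth checking against the configuration in use.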

I am using Nutch 0.9 in standalone mode (without Hadoop). Can anyone give me some idea how to merge the indexes from different crawls into a single index?

Regards.
--mohammad monirul hoque


   