Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

individual crawl-urlfilter.txt and nutch-site.xml for different crawls?

Felix Zimmermann

2008-06-26

Hi,

I'd like to use a different crawl-urlfilter.txt and nutch-site.xml depending on
the crawl. At the moment, I only use the "nutch crawl" command in a small
self-made .sh script. In the future, I'll also need the other commands, such as
"nutch inject", "nutch fetch", and so on.



I am thinking of something like "nutch crawl …. -urlfilter my_url_filter_file
-conffile my_nutch_site_xml_file".
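
To make the idea concrete, here is a rough sketch of the kind of wrapper meant
here. It is only an illustration under assumptions that are not confirmed
anywhere in this post: that the bin/nutch launcher reads an alternate
configuration directory from the NUTCH_CONF_DIR environment variable, and that
each crawl has its own directory holding a full copy of conf/ with only
crawl-urlfilter.txt and nutch-site.xml changed. The names run_nutch, CONF_ROOT
and DRY_RUN, and the /opt/nutch path, are made up for the example.

```shell
#!/bin/sh
# Sketch only: run any nutch command with a per-crawl configuration directory.
# ASSUMPTION: bin/nutch honours NUTCH_CONF_DIR as an alternate conf directory.
# ASSUMPTION: $CONF_ROOT/<crawl_name>/ is a full copy of conf/ for that crawl.

NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"             # hypothetical install path
CONF_ROOT="${CONF_ROOT:-$NUTCH_HOME/crawl-confs}"  # hypothetical layout

# run_nutch <crawl_name> <nutch command and arguments...>
run_nutch() {
    crawl_name="$1"
    shift
    conf_dir="$CONF_ROOT/$crawl_name"

    # Refuse to run if either of the two per-crawl files is missing.
    for f in crawl-urlfilter.txt nutch-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing $conf_dir/$f" >&2
            return 1
        fi
    done

    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it, to check the wiring.
        echo "NUTCH_CONF_DIR=$conf_dir $NUTCH_HOME/bin/nutch $*"
    else
        NUTCH_CONF_DIR="$conf_dir" "$NUTCH_HOME/bin/nutch" "$@"
    fi
}
```

With this, a call such as "run_nutch siteA crawl urls -dir crawl-siteA -depth 3"
would use the files under crawl-confs/siteA, and the same function would work
for "inject", "fetch" and the other commands without any new command-line
options. If the launcher does not support NUTCH_CONF_DIR, this approach does
not apply.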



Is org/apache/nutch/util/NutchConfiguration.java the right place to make this
change? If so, how can I pass the arguments to it?



If not, where do I have to modify the code to achieve this? I am not very
familiar with Java, but I think I can understand the code if I know where to look.



Thanks for any help!



Felix.


