Java Mailing List Archive

nutch-user.lucene

Segment size and maintenance

John Martyniak

2008-10-30


Hello everyone,

I am building a large index with many segments, and I have a
couple of questions regarding segment/index maintenance.

Is there a practical limit to the size of segments? Right now I have
several segments with around 50K links and several with 25K links, so
the total number of URLs is around 600K. Is that too big to merge? Do I
want to merge them at all?

How can I prune out old, unwanted URLs? Is the best way to update
regex-urlfilter.txt and run "mergesegs" with the -filter option? Is there
a better way?

Thank you in advance for any advice.

-John

