  | |  | addIndexes() Question | addIndexes() Question 2004-12-22 - By Ryan Aslett
Back
Hi there, Im about to embark on a Lucene project of massive scale (between 500 million and 2 billion documents). I am currently working on parallellizing the construction of the Index(es).
Rough summary of my plan: I have many, many physical machines, each with multiple processors that I wish to dedicate to the construction of a single index. I plan on having each machine gather its documents from a central sychronized source (network, JMS, whatever). Within each machine I will have multiple threads each responsible for construcing an index slice.
When all machines and all threads are finished, I should have a slew of index slices that I want to combine together to create one index.
My question is this: Will it be more efficient to call addIndexes(Directory[] dirs) on all the slices all at once?
Or might it be better to continually merge small indexes into a larger index, i.e. once an index slice reaches a particular size, merge it into the main index and start building a new slice...
Any help would be appreciated..
Ryan Aslett
-- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ------ To unsubscribe, e-mail: lucene-user-unsubscribe@(protected) For additional commands, e-mail: lucene-user-help@(protected)
Earn $52 per hosting referral at Lunarpages.
|
|
 |