Hi,
I'm pretty new to Lucene, so please bear with me if this has been
covered before.
The wiki suggests sharing a single IndexSearcher between threads for
best performance
(http://wiki.apache.org/lucene-java/ImproveSearchingSpeed). I ran the
same set of queries in five configurations: multiple threads sharing
one searcher, a separate searcher for each thread, both of those again
with a RAMDirectory in-memory index, and (just for fun) multiple JVMs
running concurrently. The results are the time in milliseconds to
complete the whole job:
threads  multi-jvm  shared  per-thread  ram-shared  ram-thread
      1      72997   70883       72573       60308       60012
      2      33147   48762       35973       25498       25734
      4      16229   46828       21267       13127       27164
      6      13088   47240       14028        9858       29917
      8       9775   47020       10983        8948       10440
     10       8721   50132       11334        9587       11355
     12       7290   49002       11798        9832
     16       9365   47099       12338       11296
The shared searcher indeed behaves better with a ram-based index, but
what's going on with the disk-based one? It's basically not scaling
beyond two threads. Am I just doing something completely wrong here?
The test consists of about 1,500 Boolean OR queries with 1-10
PhraseQueries each, with 1-20 Terms per PhraseQuery. I'm using a
HitCollector to count the hits, so I'm not retrieving any results.
The index is about 5GB and 20 million documents.
This is running on an 8 x quad-core Opteron machine with plenty of RAM to spare.
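For anyone who wants to reproduce the comparison, here is a minimal sketch of how the shared vs. per-thread runs can be structured. It is not the actual test code: `SearcherScalingBench`, `CountingSearcher`, `runJob` and `timeJobMillis` are hypothetical names, and the searcher is a stand-in that only counts characters, so the timings it prints say nothing about Lucene itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

final class SearcherScalingBench {

    /** Hypothetical stand-in for "run a query and count its hits". */
    interface CountingSearcher {
        long countHits(String query);
    }

    /**
     * Splits the queries across `threads` workers and returns the total hit count.
     * The supplier is called once per worker: return the same instance every time
     * for the "shared" case, or a fresh instance for the "per-thread" case.
     */
    static long runJob(List<String> queries, int threads,
                       Supplier<CountingSearcher> searcherForThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                parts.add(pool.submit(() -> {
                    CountingSearcher searcher = searcherForThread.get();
                    long hits = 0;
                    // Worker t handles queries t, t + threads, t + 2 * threads, ...
                    for (int i = offset; i < queries.size(); i += threads) {
                        hits += searcher.countHits(queries.get(i));
                    }
                    return hits;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for the worker to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    /** Wall-clock time in milliseconds to complete the whole job. */
    static long timeJobMillis(List<String> queries, int threads,
                              Supplier<CountingSearcher> searcherForThread) throws Exception {
        long start = System.nanoTime();
        runJob(queries, threads, searcherForThread);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Dummy workload; a real run would load the actual query set here.
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 1500; i++) {
            queries.add("query number " + i);
        }
        CountingSearcher shared = q -> q.length(); // assumption: trivial stand-in
        for (int threads : new int[] {1, 2, 4, 8}) {
            long sharedMs = timeJobMillis(queries, threads, () -> shared);
            long perThreadMs = timeJobMillis(queries, threads, () -> q -> q.length());
            System.out.println(threads + " threads: shared=" + sharedMs
                    + " ms, per-thread=" + perThreadMs + " ms");
        }
    }
}
```

To get numbers comparable to the table, the stand-in would have to be replaced by a wrapper around a real IndexSearcher with a counting HitCollector; that wiring is left out here on purpose.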
Any idea why I would see this behaviour?
Thanks,
Dmitri