IndexSearcher and multi-threaded performance

Dmitri Bichko

2008-11-11

Hi,

I'm pretty new to Lucene, so please bear with me if this has been
covered before.

The wiki suggests sharing a single IndexSearcher between threads for
best performance
(http://wiki.apache.org/lucene-java/ImproveSearchingSpeed). I've
tested running the same set of queries in five configurations: multiple
threads sharing one searcher, a separate searcher for each thread, the
same two setups (shared and per-thread) against a RAMDirectory
in-memory index, and (just for fun) multiple JVMs running concurrently.
The results are the time in milliseconds to complete the whole job:

threads  multi-jvm  shared  per-thread  ram-shared  ram-thread
      1      72997   70883       72573       60308       60012
      2      33147   48762       35973       25498       25734
      4      16229   46828       21267       13127       27164
      6      13088   47240       14028        9858       29917
      8       9775   47020       10983        8948       10440
     10       8721   50132       11334        9587       11355
     12       7290   49002       11798        9832
     16       9365   47099       12338       11296

The shared searcher indeed behaves better with a ram-based index, but
what's going on with the disk-based one? It's basically not scaling
beyond two threads. Am I just doing something completely wrong here?
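
As a rough illustration of what "not scaling" means here, the timings
above can be turned into speedups relative to the single-threaded run
of the same configuration. This is only a sketch: the class and method
names (SpeedupTable, speedup) are made up for the example, and the
numbers are copied from the disk-based "shared" and "per-thread"
columns for 1 to 8 threads.

```java
public class SpeedupTable {
    // Thread counts and elapsed milliseconds copied from the table above.
    static final int[] THREADS = {1, 2, 4, 6, 8};
    static final long[] SHARED = {70883, 48762, 46828, 47240, 47020};
    static final long[] PER_THREAD = {72573, 35973, 21267, 14028, 10983};

    // Speedup relative to the single-threaded run of the same configuration.
    static double speedup(long baselineMs, long elapsedMs) {
        return (double) baselineMs / elapsedMs;
    }

    public static void main(String[] args) {
        System.out.println("threads  shared  per-thread");
        for (int i = 0; i < THREADS.length; i++) {
            System.out.printf("%7d  %5.2fx  %9.2fx%n",
                    THREADS[i],
                    speedup(SHARED[0], SHARED[i]),
                    speedup(PER_THREAD[0], PER_THREAD[i]));
        }
    }
}
```

At 8 threads this works out to roughly 1.5x for the shared disk-based
searcher versus about 6.6x with a searcher per thread.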

The test consists of about 1,500 Boolean OR queries with 1-10
PhraseQueries each, with 1-20 Terms per PhraseQuery. I'm using a
HitCollector to count the hits, so I'm not retrieving any results.
The index is about 5GB and 20 million documents.
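
For reference, the shape of such a test can be sketched roughly as
below. This is not the actual test code: CountingSearcher is a
hypothetical stand-in for whatever wraps IndexSearcher and a
hit-counting HitCollector, the placeholder searcher just returns the
query length, and the sketch assumes Java 8 or later. Passing a
supplier that always returns the same instance gives the "shared"
setup; passing one that builds a new instance per call gives the
"per-thread" setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class SearchBenchmark {
    /** Hypothetical stand-in: a real test would wrap an IndexSearcher here. */
    public interface CountingSearcher {
        int countHits(String query);
    }

    /**
     * Splits the queries across nThreads workers and returns the total hit
     * count. The supplier is called once per worker thread.
     */
    static long run(List<String> queries, int nThreads,
                    Supplier<CountingSearcher> searcherForThread) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int offset = t;
                Callable<Long> task = () -> {
                    CountingSearcher searcher = searcherForThread.get();
                    long hits = 0;
                    // Worker t handles queries t, t + nThreads, t + 2 * nThreads, ...
                    for (int i = offset; i < queries.size(); i += nThreads) {
                        hits += searcher.countHits(queries.get(i));
                    }
                    return hits;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("benchmark worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder searcher: "counts" the characters in the query string.
    static CountingSearcher newPlaceholderSearcher() {
        return query -> query.length();
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 1500; i++) {
            queries.add("query " + i);
        }
        CountingSearcher shared = newPlaceholderSearcher();
        for (int n : new int[] {1, 2, 4, 8}) {
            long start = System.nanoTime();
            run(queries, n, () -> shared);                            // shared
            long sharedMs = (System.nanoTime() - start) / 1_000_000;

            start = System.nanoTime();
            run(queries, n, SearchBenchmark::newPlaceholderSearcher); // per-thread
            long perThreadMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println(n + " threads: shared=" + sharedMs
                    + "ms per-thread=" + perThreadMs + "ms");
        }
    }
}
```

With the placeholder searcher the timings are meaningless; the point is
only the difference in how the searcher instance reaches each thread.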

This is running on an 8 x quad-core Opteron machine with plenty of RAM to spare.

Any idea why I would see this behaviour?

Thanks,
Dmitri
