Java Mailing List Archive

http://www.java2.5341.com/

Home » java-user.lucene »

Re: Single searcher vs Multi Searcher

Ganesh - yahoo

2008-10-07

Replies: Find Java Web Hosting

Author LoginPost Reply
Hello Anusham,

My intention is to shard the index after every 7 days (week). After 30 days,
(4th week) the first DB may get deleted. At any point of time i will be
maitaining 3 to 4 DB.

I want to know the pros and cons of the MultiSearcher or Index sharding
approach. Any web links would be helpful.

Regards
Ganesh

----- Original Message -----
From: "Anshum" <anshumg@(protected)>
To: <java-user@(protected)>
Sent: Tuesday, October 07, 2008 12:18 AM
Subject: Re: Single searcher vs Multi Searcher


> Hi Ganesh,
> About the memory consumption while sorting, it would end up using similar
> amounts, perhaps even more.. like in the case of regular parallel
> programming algorithms (hoping that you intend to search using a parallel
> multi searcher). Would you have to query particular indexes only for a
> particular search or would you be searching over all the indexes and then
> follow it up by merger (which the parallel multi searcher would do
> efficiently).?
> Also, I guess 30 indexes would be a little too many, haven't really tried
> out those many indexes for a multisearcher.
> As far as maintenance of DB is concerned, it might be easy as long as you
> don't have any document updates, in which case you'd have to shift the
> documents from one DB/index to another (which includes creating an entry
> in
> the latest index/DB and deleting the record from the older DB).
> I guess you'd have to pilot it, in case memory is an issue in your case
> and
> not speed, you could try a regular multisearcher instead of a parallel
> multisearcher.
> I guess when you say maintenance of the DB gets easier, you mean that the
> data in each individual table is controlled (but remember there could be
> other bigger hassles like the one mentioned above about moving data
> between
> indexes/DB).
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
> On Mon, Oct 6, 2008 at 10:06 AM, Ganesh <emailgane@(protected):
>
>> Hello Anshum,
>>
>> My index is growing 1 million documents per day. Initially i planned to
>> have a single database but the sorting of one or more fields consumes
>> more
>> RAM. Whether sharding the index would also consume the same.
>>
>> My application should co-exist with other application of my product and
>> my
>> app could get 1 GB of RAM. Search speed is fine but i need to display the
>> result in the sorted order.
>>
>> I thought to keep 7 days of documents in one index and create one more
>> after the 7 days. After 30 days the first index may get deleted. I need
>> to
>> keep the documents in the index DB for 30 days. My Index DB is in HDD.
>>
>> I want to the pros and cons of sharding. I think maintance of the DB
>> becomes easier.
>>
>> It would be very much helpful, if you share some of your thoughts.
>>
>> Regards
>> Ganesh
>>
>>
>> ----- Original Message ----- From: "Anshum" <anshumg@(protected)>
>> To: <java-user@(protected)>
>> Sent: Friday, October 03, 2008 9:48 PM
>> Subject: Re: Single searcher vs Multi Searcher
>>
>>
>>
>> Hi Ganesh,
>>>
>>> I have experimented with sharded indexes and they seem to benefit
>>> me(atleast
>>> in my case). I would like to know a few things before I answer your
>>> question:
>>> 1. Do you have a reasonable criteria ( a calculated one) to shard the
>>> indexes?
>>> 2. How do you plan to split the index? Is it going to be document based
>>> (which I guess it should be as otherwise you would have to build a
>>> complete
>>> distributed system)
>>> 3. Do you plan to put your indexes on the RAM or on (physically)
>>> seperate
>>> HDDs?
>>>
>>> Though all said and done, sharded indexes are a good approach, if done
>>> the
>>> right way.
>>> --
>>> Anshum Gupta
>>> Naukri Labs!
>>> http://ai-cafe.blogspot.com
>>>
>>> The facts expressed here belong to everybody, the opinions to me. The
>>> distinction is yours to draw............
>>>
>>>
>>> On Fri, Oct 3, 2008 at 3:01 PM, Ganesh <emailgane@(protected):
>>>
>>> Hello all,
>>>>
>>>> My indexing is growing by 1 million records per day and the memory
>>>> consumption of the searcher object is quite high.
>>>>
>>>> There are different opinion in the groups. Few suggest to use single
>>>> database and few to use sharding. My Database has 10 million records
>>>> now
>>>> and
>>>> it might go till 30 million or more. I plan to shard the index. but
>>>> Multisearcher will give me benifit.
>>>>
>>>> Regards
>>>> Ganesh
>>>>
>>>>
>>>> Send instant messages to your online friends
>>>> http://in.messenger.yahoo.com
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@(protected)
>>>> For additional commands, e-mail: java-user-help@(protected)
>>>>
>>>>
>>>>
>>>
>> Send instant messages to your online friends
>> http://in.messenger.yahoo.com
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@(protected)
>> For additional commands, e-mail: java-user-help@(protected)
>>
>>
>

Send instant messages to your online friends http://in.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@(protected)
For additional commands, e-mail: java-user-help@(protected)

©2008 java2.5341.com - Jax Systems, LLC, U.S.A.