Java Mailing List Archive

http://www.java2.5341.com/


Extremely Large Strings Comparison (slightly off-topic)

Aaron Schon

2008-11-14

Hi, I need to compare Base64 string representations of some MIME content that I am storing in a Lucene index. After running a Lucene query, I need to efficiently find the stored string that is the closest match to a query Base64 string.

I am not sure of the best way to approach this. Could I compare hashes of the strings and compute a similarity from them? Levenshtein distance seems impractical because of the size of the strings. Is there any other method you could suggest?
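To make the cost concern concrete, here is a minimal sketch of the textbook two-row Levenshtein computation. The class and method names (EditDistance, levenshtein) are hypothetical and not from any library. Memory stays proportional to one string, but the nested loops visit every pair of characters, so time grows with the product of the two lengths.

```java
final class EditDistance {
    private EditDistance() {}

    // Classic dynamic-programming edit distance, keeping only two rows.
    // Time: O(a.length() * b.length()); extra space: O(b.length()).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so curr becomes prev for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For two strings of a few megabytes each, the inner loop would run on the order of 10^12 times or more, which is why this feels too slow for my case.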

N.B.: The idea is not to determine whether there is an exact match, but to compute a similarity metric. For example:

John & Johnson (closer)
vs.
John & Jimmy (farther)
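One possible way to get a metric of this kind, sketched under assumptions: Jaccard similarity over sets of character n-grams. This is only an illustration of the sort of score I mean, not something Lucene provides here; the class and method names (NgramSimilarity, ngrams, jaccard) and the choice of n are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class NgramSimilarity {
    private NgramSimilarity() {}

    // Collects the distinct substrings of length n. A non-empty string
    // shorter than n contributes itself as a single gram.
    static Set<String> ngrams(String s, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        Set<String> grams = new HashSet<>();
        if (s.length() < n) {
            if (!s.isEmpty()) {
                grams.add(s);
            }
            return grams;
        }
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in [0.0, 1.0].
    // Two empty strings are treated as identical (1.0).
    static double jaccard(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(ga);
        intersection.retainAll(gb);
        int unionSize = ga.size() + gb.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

With n = 2, jaccard("John", "Johnson", 2) scores higher than jaccard("John", "Jimmy", 2), which matches the ordering above. Building the sets is linear in the string length, but each set can hold nearly as many grams as the string has characters, so for very large Base64 strings the memory use would need checking.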

tia,
Aaron


   
