Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

Repost: RegEx problem

ajaxtrend

2008-10-22

This has been bugging me for the last two days: the regex in crawl-urlfilter.txt does not recognize the URLs as expected.

URLs
http://yyy.www.com/
http://yyy.www.com/used-cars/902/1/
http://yyy.www.com/used-cars/902/2/
http://yyy.www.com/ID_101360033/Honda-city-GXI-2005-for-sale.html

Pattern
+^http://yyy\\.www\\.com/(used-cars|ID\\w*)/((\\w*/(\\w*/)?)|(.*\\.html))

Inside the URL filter, "return pattern.matcher(url).find();" should return true for the URLs above, but it returns false. I am not sure why.
I captured all the URLs in a test file, and my test case recognizes them correctly, i.e. pattern.matcher(url).find() returns true for each of the URLs above.
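
One way to narrow this down is to compare the two forms the pattern can take. The sketch below is not Nutch code; the class name UrlFilterCheck and its members are made up for illustration. It assumes the line in crawl-urlfilter.txt contains the doubled backslashes exactly as pasted above, and that the filter takes the leading "+" as the accept sign and compiles the rest of the line as-is. In a Java string literal the compiler reduces each "\\" to a single backslash, but text read from a file gets no such treatment, so the two regexes are not the same.

```java
import java.util.regex.Pattern;

public class UrlFilterCheck {
    // The pattern written as a Java string literal, as it might appear in a test case.
    // The compiler reduces each "\\" to one backslash, so the regex engine sees \. and \w.
    static final String AS_JAVA_LITERAL =
        "^http://yyy\\.www\\.com/(used-cars|ID\\w*)/((\\w*/(\\w*/)?)|(.*\\.html))";

    // Assumption: the same text read verbatim from a file (leading '+' already removed).
    // Nothing un-escapes it, so the engine sees \\. which means
    // "a literal backslash followed by any character".
    static final String AS_FILE_TEXT = AS_JAVA_LITERAL.replace("\\", "\\\\");

    // Same call the post describes: compile the regex and use find().
    static boolean accepts(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://yyy.www.com/",
            "http://yyy.www.com/used-cars/902/1/",
            "http://yyy.www.com/used-cars/902/2/",
            "http://yyy.www.com/ID_101360033/Honda-city-GXI-2005-for-sale.html"
        };
        for (String url : urls) {
            System.out.println(url
                + "  literal=" + accepts(AS_JAVA_LITERAL, url)
                + "  file=" + accepts(AS_FILE_TEXT, url));
        }
    }
}
```

If that assumption holds, the file form rejects every URL because none of them contains a backslash, which would explain why the test case passes while the crawl does not. Note also that even the literal form does not match the bare http://yyy.www.com/ URL, since the pattern requires a used-cars or ID segment after the host.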

I understand that this may be an easy question, but it has already taken me three days and I am still banging my head against it.

I would appreciate your help on this.

- RB





   