Java Mailing List Archive

http://www.java2.5341.com/


java.lang.StackOverflowError in HTMLMetaProcessor.getMetaTagsHelper

Siddhartha Reddy

2008-06-11

Hi,

While parsing some pages, I am getting a java.lang.StackOverflowError
caused by the recursion in HTMLMetaProcessor.getMetaTagsHelper. I'm
pasting part of the stack trace below. Unfortunately, I have logic that
deletes the segment if the fetch or parse fails, so I do not know which
web page caused the problem. I'll recrawl the same pages with modified
logic (that does not delete the segment on a failed parse) and try to find
the offending URL.
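
A rough sketch of how the offending URL could be isolated, assuming each
page can be parsed in its own call. SafeParse, PageParser and findOffenders
are hypothetical names made up for illustration; they are not part of the
Nutch API. The sketch only shows that a StackOverflowError can be caught
per page so the URL is recorded instead of being lost with the segment.

```java
import java.util.ArrayList;
import java.util.List;

class SafeParse {
    /** Hypothetical stand-in for whatever actually parses one page. */
    interface PageParser {
        void parse(String url);
    }

    /** Parses every URL and returns those whose parse overflowed the stack. */
    static List<String> findOffenders(List<String> urls, PageParser parser) {
        List<String> offenders = new ArrayList<String>();
        for (String url : urls) {
            try {
                parser.parse(url);
            } catch (StackOverflowError e) {
                // The stack has unwound by the time we get here,
                // so it is safe to record the URL and carry on.
                offenders.add(url);
            }
        }
        return offenders;
    }
}
```

The returned list could then be logged before any cleanup of the segment
takes place.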

Has anyone encountered this problem before? Apart from increasing the Java
stack size, is there any other possible solution?

java.lang.StackOverflowError
    at java.lang.Character.toUpperCase(Character.java:4278)
    at java.lang.String.regionMatches(String.java:1384)
    at java.lang.String.equalsIgnoreCase(String.java:1120)
    at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:55)
    at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
    at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
    at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
    at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
    at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
    ....
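
The only alternative I can think of is to walk the DOM with an explicit
stack instead of recursion, so that the depth of the page no longer
consumes call-stack frames. Below is a minimal sketch of that idea using
the standard org.w3c.dom API. IterativeMetaWalk and countMetaTags are
hypothetical names, and the method only counts meta elements; it is a
simplified stand-in and not the actual logic of getMetaTagsHelper.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.w3c.dom.Node;

class IterativeMetaWalk {
    /** Counts meta elements under root without recursing. */
    static int countMetaTags(Node root) {
        int count = 0;
        Deque<Node> pending = new ArrayDeque<Node>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "meta".equalsIgnoreCase(node.getNodeName())) {
                count++;
            }
            // Push children last-to-first so they are popped in document order.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return count;
    }
}
```

With this shape the pending nodes live on the heap, so a deeply nested
page costs memory rather than stack depth.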

Thanks,
Siddhartha