List: nutch-user.lucene

HTML meta tags in index

Michael Piccuirro

2008-07-09

I'm using nutch to crawl my site. I've successfully gone through the
tutorial and can search the index it creates. Now I want to be able to
include the meta tags from those pages in the indexed documents. I would
like the standard "description" and "keywords" tags, as well as a couple of
custom ones like "thumbnail", to be available on my search results page.

So I've been doing a lot of RTFM'ing and the closest thing I can find is the
plugin example which demonstrates how to get a "recommended" meta tag and
increase the boost. So currently I'm prepared to write a plugin that reads
all the meta tags I need and adds them to the index.
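
To make the idea concrete, here is a rough sketch of just the extraction step such a plugin would need. It is plain JDK code and does not use the Nutch API: the class MetaTagSketch and the method extractMetaTags are hypothetical names, and the regex matching is an assumption made to keep the example self-contained. A real plugin would presumably walk the parsed DOM that Nutch hands to a parse filter and then copy the values into the index from an indexing filter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Nutch.
class MetaTagSketch {
    // Matches a whole <meta ...> tag (assumes no '>' inside attribute values).
    private static final Pattern META =
        Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches one double-quoted attribute, e.g. name="description".
    private static final Pattern ATTR =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"");

    // Returns name -> content for every meta tag whose name is in 'wanted'.
    // Names are compared in lower case; the first occurrence of a name wins.
    static Map<String, String> extractMetaTags(String html, Set<String> wanted) {
        Map<String, String> found = new LinkedHashMap<>();
        Matcher tag = META.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                if (key.equals("name")) {
                    name = attr.group(2).toLowerCase(Locale.ROOT);
                } else if (key.equals("content")) {
                    content = attr.group(2);
                }
            }
            if (name != null && content != null && wanted.contains(name)) {
                found.putIfAbsent(name, content);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample page; the attribute order varies on purpose.
        String html = "<html><head>"
            + "<meta name=\"description\" content=\"A sample page\">"
            + "<meta content=\"nutch, search\" name=\"keywords\">"
            + "<meta name=\"thumbnail\" content=\"/img/thumb.png\">"
            + "<meta name=\"robots\" content=\"index\">"
            + "</head><body></body></html>";
        Set<String> wanted =
            new HashSet<>(Arrays.asList("description", "keywords", "thumbnail"));
        System.out.println(extractMetaTags(html, wanted));
    }
}
```

The returned map holds only the tags I asked for, so each entry could become one extra field on the indexed document. Single-quoted or unquoted attributes are not handled here.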

My question is: am I on the right track by building the plugin? Or is there
an easier, out-of-the-box way to include the meta tag information?

Thanks a lot in advance for any help.