Java Mailing List Archive

http://www.java2.5341.com/

Home » nutch-user.lucene »

Edward Quick

2008-09-16


Hi,

Are the config files in the Nutch webapp (i.e. those under WEB-INF/classes, such as nutch-site.xml and crawl-urlfilter.txt) used by the webapp when searching?
The reason I ask is that when I search for something, I get back the following URLs:


http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=3
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=4
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=5
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=2
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=1

These are all effectively the same page. So although I want the crawl to parse these, I want the webapp search to return only the URL up to the query string, e.g.:

http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5
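
To make the mapping concrete, here is a minimal sketch in plain Java of the result described above. It is not Nutch code and does not use any Nutch API: the class name `QueryStripper` and the methods `stripQuery` and `dedupe` are hypothetical, and the sketch assumes that everything from the first `?` onwards can safely be dropped.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Nutch.
public class QueryStripper {

    /** Returns the URL cut off at the first '?', or unchanged if it has none. */
    public static String stripQuery(String url) {
        int q = url.indexOf('?');
        return q < 0 ? url : url.substring(0, q);
    }

    /** Collapses URLs that differ only in their query string, keeping first-seen order. */
    public static Set<String> dedupe(List<String> urls) {
        Set<String> unique = new LinkedHashSet<>();
        for (String url : urls) {
            unique.add(stripQuery(url));
        }
        return unique;
    }

    public static void main(String[] args) {
        String base = "http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/"
                + "Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5";
        List<String> hits = Arrays.asList(
                base + "?OpenDocument&ExpandSection=3",
                base + "?OpenDocument&ExpandSection=4",
                base + "?OpenDocument&ExpandSection=5",
                base + "?OpenDocument&ExpandSection=2",
                base + "?OpenDocument&ExpandSection=1");
        // All five hits collapse to the same URL, so one line is printed.
        for (String url : dedupe(hits)) {
            System.out.println(url);
        }
    }
}
```

The sketch uses a plain string cut rather than a URL parser so that the `+` and `&` characters in the path are left untouched. Where such a step would belong (at crawl time, at index time, or in the webapp) is exactly the open question.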

Hope that makes sense.

Thanks for any help,

Ed.
