Java Mailing List Archive

http://www.java2.5341.com/


Re: Indexing static html files

Winton Davies

2008-07-07


I meant that you could just crawl http://external_url.com/y/z/
directly. But yes, if you have pages from someone else's server stored
locally, you will need to rewrite the BASE component of the URL in the
search results.

For that you could probably just hack search.jsp (but don't tell
anyone I told you to) to rewrite the URLs.
Go to ~tomcat/webapps/ROOT and edit search.jsp. You'll need to know
some Java to do that, but look for Hits and url; it should be easy
enough to work out where to put the string replace.
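
The string replace itself could look something like the sketch below. This
is only an illustration, not the actual search.jsp code: the class name
UrlRewriter, the rewrite method and both prefix constants are made up for
the example, with the prefixes taken from the file:///x/y/z/ and
http://mysite.com/y/z/ paths mentioned in this thread. Where exactly to
call it in search.jsp depends on your Nutch version.

```java
// Hypothetical helper: swaps the local crawl prefix for the public one.
class UrlRewriter {
    // Assumed prefixes, based on the example paths quoted in this thread.
    static final String LOCAL_PREFIX = "file:///x/";
    static final String PUBLIC_PREFIX = "http://mysite.com/";

    // Returns the URL with the local prefix replaced; any other URL
    // (or null) is returned unchanged.
    static String rewrite(String url) {
        if (url != null && url.startsWith(LOCAL_PREFIX)) {
            return PUBLIC_PREFIX + url.substring(LOCAL_PREFIX.length());
        }
        return url;
    }

    public static void main(String[] args) {
        // prints http://mysite.com/y/z/foo.html
        System.out.println(rewrite("file:///x/y/z/foo.html"));
    }
}
```

Matching only at the start of the string, rather than replacing every
occurrence, avoids touching URLs that happen to contain the same text
further along the path.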

Winton



>OK, so you merge your other crawls into the same search dir; that's
>understood, thanks.
>
>My other question concerns what happens when you do a search in Nutch.
>Right now, it returns links to "file:///x/y/z/......./foo.html", and I was
>wondering if there was a simple way to change that link to
>"http://mysite.com/y/z/...../foo.html" when Nutch returns the data. It seems
>like you can't change it, since it's using the same link it used to crawl
>the data.
>
>>Not without modifying the code. I don't think it respects <BASE>, for
>>example, if you crawl it as file:///
>>Frankly, if you can, just serve it through DOCROOT; it will be less
>>painful in the end!
>>
>>- Serving URL - You can change it if you know how to set up Tomcat.
>
>How do I serve it through DOCROOT? Is that in Tomcat? And also, won't Nutch
>still return links when I do a search in the form of:
>file:///x/y/z......foo.html ? That's the part in Nutch I'm trying to
>change. Thanks.
>
>-Ryan
>
>On Sat, Jul 5, 2008 at 10:23 PM, Winton Davies <wdavies@(protected)>
>wrote:
>
>> Oh sorry, I misunderstood the question. I think you can only serve from 1
>> directory (aka Crawl by default). Of course you can create multiple
>> instances that serve from different crawls, but then you'd have to deal with
>> joining them together.
>>
>> You can definitely MERGE multiple crawl directories.
>>
>> W
>>
