OK, so you merge your other crawls into the same search dir. That's
understood, thanks.
My other question concerns search results in Nutch. Right now, a search
returns links of the form "file:///x/y/z/......./foo.html", and I was
wondering if there is a simple way to change such a link to
"http://mysite.com/y/z/...../foo.html" when Nutch returns the data. It seems
like you can't change it, since Nutch returns the same link it used to crawl
the data.
>Not without modifying the code. I don't think it respects <BASE>, for
>example, if you crawl it as file:///.
>Frankly, if you can, just serve it through DOCROOT - it will be less painful
>in the end!
>
>- Serving URL - You can change it if you know how to set up Tomcat.
How do I serve it through DOCROOT? Is that in Tomcat? Also, won't Nutch
still return links of the form file:///x/y/z......foo.html when I do a
search? That's the part of Nutch I'm trying to change. Thanks.
-Ryan
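
For anyone reading this thread later: the rewrite asked about above amounts to swapping a fixed prefix on each result link. Here is a minimal sketch in Python of that idea, applied to the links after a search returns them. It is not part of Nutch and does not use any Nutch API. The function name and the sample path are hypothetical, and the two prefixes are only assumed from the example in the question (file:///x/ on disk corresponding to http://mysite.com/ on the web).

```python
# Hypothetical post-processing helper, not a Nutch API.
# Assumption: everything under file:///x/ is published under http://mysite.com/.
FILE_PREFIX = "file:///x/"
HTTP_PREFIX = "http://mysite.com/"


def rewrite_result_url(url, file_prefix=FILE_PREFIX, http_prefix=HTTP_PREFIX):
    """Return the public http:// form of a crawled file:// URL.

    URLs that do not start with file_prefix are returned unchanged.
    """
    if url.startswith(file_prefix):
        # Keep the path below the crawl root and attach it to the web root.
        return http_prefix + url[len(file_prefix):]
    return url


if __name__ == "__main__":
    # Sample path made up for illustration only.
    print(rewrite_result_url("file:///x/y/z/foo.html"))
```

Doing this on the result links leaves the index untouched, so the crawl itself can keep using file:/// URLs. Whether the same mapping can be done inside Nutch without code changes is the open question in this thread.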
On Sat, Jul 5, 2008 at 10:23 PM, Winton Davies <wdavies@(protected)>
wrote:
> Oh sorry, I misunderstood the question - I think you can only serve from 1
> directory (aka Crawl by default). Of course you can create multiple
> instances that serve from different crawls, but then you'd have to deal with
> joining them together.
>
> You can definitely MERGE multiple crawl directories.
>
> W
>