Java Mailing List Archive: nutch-user.lucene

searching into specific location

cristina

2008-08-27


Good morning!

I'm new to Nutch.

I indexed my web site with the command:

/APPS2/nutch-0.9/bin/nutch crawl /APPS2/nutch-0.9/urls/urls.txt -dir /APPS2/nutch-0.9/crawl.db -depth 5 -threads 10

and everything seemed to be OK.

In my crawl-urlfilter.txt file, I have:

-^(file|ftp|mailto|https):

-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|wmv|WMV|ra|RA|ram|RAM|css)$


# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*(unirioja.es|otrodominio.org)/


# skip everything else
-.
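To make the intent of these rules concrete, here is a minimal Python sketch of how they would classify a few URLs. It assumes that rules are tried top to bottom, that the first rule whose pattern is found anywhere in the URL decides the outcome (`+` accepts, `-` rejects), and that a URL matching no rule is rejected. The names `RULES` and `accepts` are hypothetical and not part of Nutch, and Python's `re` module stands in for the Java regular expressions Nutch actually uses.

```python
import re

# The rules from the filter file above, in order: (sign, pattern).
RULES = [
    ("-", r"^(file|ftp|mailto|https):"),
    ("-", r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls"
          r"|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|wmv|WMV|ra|RA|ram|RAM|css)$"),
    ("+", r"^http://([a-z0-9]*\.)*(unirioja.es|otrodominio.org)/"),
    ("-", r"."),
]


def accepts(url):
    """Return True if the first matching rule is a '+' rule (assumed semantics)."""
    for sign, pattern in RULES:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False


if __name__ == "__main__":
    for candidate in (
        "http://www.unirioja.es/proyecto/index.html",
        "http://www.unirioja.es/logo.gif",
        "https://www.unirioja.es/",
        "http://example.com/",
    ):
        print(candidate, accepts(candidate))
```

Under these assumptions, pages below http://www.unirioja.es/proyecto pass the filter, so the filter rules themselves should not be what keeps them out of the index.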

If I search through the Nutch web application, everything is OK.
I use site:http://www.unirioja.es or site:http://www.otrodominio.org to
change the site to be searched (both domains are virtual hosts in my
Apache server).

Now I want to search only within http://www.unirioja.es/proyecto (that is,
only pages included in this directory), but if I use
url:http://www.unirioja.es/proyecto in the query string (in the search.jsp
page), it doesn't work. No results are returned.
What's wrong? Could anyone help me?
Thanks in advance!

Cristina
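
For readers of the archive, the behaviour being asked for can be stated precisely as a URL-prefix test. The Python sketch below only illustrates that intent. It is not Nutch code and does not show how Nutch parses a `url:` query, and the function name `under_prefix` is hypothetical.

```python
from urllib.parse import urlsplit


def under_prefix(url, prefix):
    """Return True if url is on the same host as prefix and inside its directory."""
    u, p = urlsplit(url), urlsplit(prefix)
    # Scheme and host must match; host names are compared case-insensitively.
    if (u.scheme, u.netloc.lower()) != (p.scheme, p.netloc.lower()):
        return False
    base = p.path.rstrip("/")
    # Match the directory itself or anything below it, but not a sibling
    # whose name merely starts with the same characters (e.g. /proyectos).
    return u.path == base or u.path.startswith(base + "/")


if __name__ == "__main__":
    prefix = "http://www.unirioja.es/proyecto"
    for hit in (
        "http://www.unirioja.es/proyecto/a.html",
        "http://www.unirioja.es/proyectos/a.html",
        "http://www.otrodominio.org/proyecto/",
    ):
        print(hit, under_prefix(hit, prefix))
```

Comparing on a path-segment boundary, rather than with a plain string prefix, keeps a sibling directory such as /proyectos from being counted as part of /proyecto.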
