Hello!
I don't get search results after crawling and need help, please.
I installed the Java Development Kit 1.5, tomcat-5.5, apache-ant-1.7 and
nutch-0.9 in a FreeBSD jail, in the directories
/usr/local/diablo-jdk1.5.0, /usr/local/tomcat5.5 and /usr/local/nutch.
I ran:
setenv JAVA_HOME /usr/local/diablo-jdk1.5.0
sh bin/nutch inject crawl/crawldb urls
sh bin/nutch generate crawl/crawldb crawl/segments
sh bin/nutch fetch crawl/segments/20080622124704
sh bin/nutch updatedb crawl/crawldb crawl/segments/20080622124704
sh bin/nutch invertlinks crawl/linkdb -dir crawl/segments/20080622124704
sh bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/20080622124704
Then I ran:
sh bin/nutch org.apache.nutch.searcher.NutchBean wiki
Total hits: 0
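
For reference, the crawl steps above can be sketched as one script, which may make the run easier to reproduce. This is only a sketch: the function name run_crawl_steps and the variables NUTCH and CRAWL_DIR are made up for illustration and are not Nutch settings. It also assumes that invertlinks accepts a segment path directly, and that -dir is meant for the parent segments directory, so it differs from the command above in that one step.

```shell
#!/bin/sh
# Sketch only: run the crawl steps from this post in order.
# NUTCH and CRAWL_DIR are hypothetical variable names, not Nutch options.

run_crawl_steps() {
    nutch=${NUTCH:-"sh bin/nutch"}   # how bin/nutch is invoked
    crawl=${CRAWL_DIR:-crawl}        # crawl directory, as in the post
    urls=${1:-urls}                  # directory holding the seed URL list

    $nutch inject "$crawl/crawldb" "$urls" || return 1
    $nutch generate "$crawl/crawldb" "$crawl/segments" || return 1

    # generate writes a new timestamped segment; pick the newest by name.
    segment=$(ls -d "$crawl"/segments/* 2>/dev/null | sort | tail -n 1)
    [ -n "$segment" ] || return 1

    $nutch fetch "$segment" || return 1
    $nutch updatedb "$crawl/crawldb" "$segment" || return 1

    # Assumption: a single segment is passed without -dir here,
    # since -dir appears to expect the parent segments directory.
    $nutch invertlinks "$crawl/linkdb" "$segment" || return 1

    $nutch index "$crawl/indexes" "$crawl/crawldb" "$crawl/linkdb" "$segment"
}
```

Called from /usr/local/nutch as "run_crawl_steps urls" after setting JAVA_HOME, it stops at the first step that fails, which at least shows which step breaks.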
I start the tomcat server from the rc.conf file of the jail.
The nutch.war file is in /usr/local/tomcat5.5/webapps/ROOT.war
What should be the value for searcher.dir?
What else did I do wrong?
The config file
/jail/jvm/usr/local/tomcat5.5/webapps/search/WEB-INF/classes/nutch-site.xml
looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>http.agent.name</name>
<value>example.com</value>
<description>HTTP 'User-Agent' request header. MUST NOT be empty -
please set this to a single word uniquely related to your organization.
NOTE: You should also check other related properties:
http.robots.agents
http.agent.description
http.agent.url
http.agent.email
http.agent.version
and set their values appropriately.
</description>
</property>
<property>
<name>http.agent.description</name>
<value>search agent</value>
<description>Further description of our bot- this text is used in
the User-Agent header. It appears in parenthesis after the agent name.
</description>
</property>
<property>
<name>http://search.example.com</name>
<value></value>
<description>A URL to advertise in the User-Agent header. This will
appear in parenthesis after the agent name. Custom dictates that this
should be a URL of a page explaining the purpose and behavior of this
crawler.
</description>
</property>
<property>
<name>info@(protected)</name>
<value></value>
<description>An email address to advertise in the HTTP 'From' request
header and User-Agent header. A good practice is to mangle this
address (e.g. 'info at example dot com') to avoid spamming.
</description>
</property>
<property>
<name>searcher.dir</name>
<value>/usr/local/nutch/crawl</value>
</property>
</configuration>
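
To rule out a path problem with searcher.dir, a small check like the following could confirm that the directory it points at really contains what the commands above write (crawldb, segments, linkdb and indexes). It is a sketch; check_crawl_dir is a made-up helper name, and whether the searcher needs all four directories is an assumption on my side.

```shell
#!/bin/sh
# Sketch only: report which of the directories written by the crawl
# commands in this post are missing under a given crawl directory.
# check_crawl_dir is a hypothetical helper, not part of Nutch.

check_crawl_dir() {
    dir=$1
    status=0
    for sub in crawldb segments linkdb indexes; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub"
            status=1
        fi
    done
    return $status
}
```

Inside the jail this would be run as "check_crawl_dir /usr/local/nutch/crawl", using the same path that searcher.dir has in the file above.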
thx!
--
Sent from the Nutch - User mailing list archive at Nabble.com.