Hi,
I'm trying to get subcollections working in Nutch 1.0-dev, and have crawled our intranet with subcollection.xml configured as below. However, when I submit a query to search.jsp, e.g.
subcollection:im database
I don't get any results (whereas the same query without subcollection:im does return results).
Is this configured wrongly? I realise that subcollection.xml doesn't support regular expressions, but I wasn't sure whether I could just put in part of the URL, or had to put in the full stem pattern, e.g. http://planet.somedomain.com/level1/
Thanks,
Ed.
<subcollections>
<subcollection>
<name>default</name>
<id>default</id>
<whitelist>
</whitelist>
<blacklist>
planet.somedomain.com/general/aptrix/bani.nsf/Content/Weekly+news
/aptprop.nsf/Content/Americas+
/aptprop.nsf/Content/AB+CityFlyer+
/aptprop.nsf/Content/CityFlyer+
/im/barch/
/im/dms/
/im/tech/
</blacklist>
</subcollection>
<subcollection>
<name>im</name>
<id>im</id>
<whitelist>
planet.somedomain.com/general/aptrix/aptim.nsf/
planet.somedomain.com/im/barch/
planet.somedomain.com/im/dms/
planet.somedomain.com/im/tech/
</whitelist>
<blacklist />
</subcollection>
<subcollection>
<name>news</name>
<id>news</id>
<whitelist>
planet.somedomain.com/general/aptrix/bani.nsf/Content/Weekly+news
</whitelist>
<blacklist />
</subcollection>
</subcollections>
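To make the question about partial URLs versus full stem patterns concrete, here is a small Python sketch (not Nutch code) of the two interpretations I can think of. The function names and the example page URL are made up for illustration, and I don't know which of the two behaviours, if either, the subcollection plugin actually implements.

```python
def matches_substring(url, patterns):
    """Interpretation 1: a pattern matches if it occurs anywhere in the URL."""
    return any(p in url for p in patterns)


def matches_prefix(url, patterns):
    """Interpretation 2: a pattern matches only if the URL (scheme removed) starts with it."""
    stripped = url.split("://", 1)[-1]
    return any(stripped.startswith(p) for p in patterns)


# Patterns copied from the config above.
im_whitelist = [
    "planet.somedomain.com/general/aptrix/aptim.nsf/",
    "planet.somedomain.com/im/barch/",
    "planet.somedomain.com/im/dms/",
    "planet.somedomain.com/im/tech/",
]
default_blacklist_partial = ["/im/barch/", "/im/dms/", "/im/tech/"]

# Hypothetical crawled page, used only for this illustration.
url = "http://planet.somedomain.com/im/dms/index.html"

print(matches_substring(url, im_whitelist))               # True
print(matches_prefix(url, im_whitelist))                  # True
print(matches_substring(url, default_blacklist_partial))  # True
print(matches_prefix(url, default_blacklist_partial))     # False
```

Under the substring interpretation the host-less entries such as /im/dms/ would still match, while under the prefix interpretation they would never match, which is the difference I'm unsure about.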