Java Mailing List Archive

http://www.java2.5341.com/

Home » nutch-user.lucene »

subcollection

Edward Quick

2008-09-30


Hi,

I'm trying to get subcollections working in Nutch 1.0-dev, and have crawled our intranet with subcollection.xml configured as below. However, when I submit a query to search.jsp, e.g.

subcollection:im database

I don't get any results, whereas the same query without subcollection:im does return results.

Is this configured incorrectly? I realise that subcollection.xml doesn't support regular expressions, but I wasn't sure whether I could just put in part of the URL, or had to put in the full stem pattern, e.g. http://planet.somedomain.com/level1/

Thanks,
Ed.

<subcollections>
    <subcollection>
          <name>default</name>
          <id>default</id>
          <whitelist>
          </whitelist>
          <blacklist>
            planet.somedomain.com/general/aptrix/bani.nsf/Content/Weekly+news
            /aptprop.nsf/Content/Americas+
            /aptprop.nsf/Content/AB+CityFlyer+
            /aptprop.nsf/Content/CityFlyer+
            /im/barch/
            /im/dms/
            /im/tech/
          </blacklist>
    </subcollection>

    <subcollection>
          <name>im</name>
          <id>im</id>
          <whitelist>
            planet.somedomain.com/general/aptrix/aptim.nsf/
            planet.somedomain.com/im/barch/
            planet.somedomain.com/im/dms/
            planet.somedomain.com/im/tech/
          </whitelist>
          <blacklist />
    </subcollection>

    <subcollection>
          <name>news</name>
          <id>news</id>
          <whitelist>
            planet.somedomain.com/general/aptrix/bani.nsf/Content/Weekly+news
          </whitelist>
          <blacklist />
    </subcollection>

</subcollections>
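
To make the question about partial URLs versus full stem patterns concrete, here is a minimal sketch of one plausible matching rule. It assumes, without confirmation from this thread, that each non-empty line of a whitelist or blacklist is treated as a plain substring of the URL, that a blacklist hit excludes the URL, and that a whitelist hit is required to include it. The class SubcollectionSketch and its methods parsePatterns and belongsTo are hypothetical names for illustration, not part of the Nutch API.

```java
import java.util.ArrayList;
import java.util.List;

public class SubcollectionSketch {

    // Hypothetical helper: split the text of a <whitelist> or <blacklist>
    // element into one trimmed pattern per non-empty line.
    static List<String> parsePatterns(String block) {
        List<String> patterns = new ArrayList<>();
        for (String line : block.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    // Assumed rule (not confirmed for Nutch 1.0-dev): a URL belongs to a
    // subcollection if it contains no blacklist pattern and contains at
    // least one whitelist pattern, using plain substring matching.
    static boolean belongsTo(String url, List<String> whitelist, List<String> blacklist) {
        for (String pattern : blacklist) {
            if (url.contains(pattern)) {
                return false;
            }
        }
        for (String pattern : whitelist) {
            if (url.contains(pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Whitelist of the "im" subcollection from the configuration above.
        List<String> imWhitelist = parsePatterns(
              "planet.somedomain.com/general/aptrix/aptim.nsf/\n"
            + "planet.somedomain.com/im/barch/\n"
            + "planet.somedomain.com/im/dms/\n"
            + "planet.somedomain.com/im/tech/\n");
        List<String> emptyBlacklist = parsePatterns("");

        // Example URLs are made up for illustration.
        String imUrl = "http://planet.somedomain.com/im/dms/index.html";
        String otherUrl = "http://planet.somedomain.com/hr/index.html";

        System.out.println(belongsTo(imUrl, imWhitelist, emptyBlacklist));    // true
        System.out.println(belongsTo(otherUrl, imWhitelist, emptyBlacklist)); // false

        // Under the substring assumption, a partial pattern matches as well.
        System.out.println(belongsTo(imUrl, parsePatterns("/im/dms/"), emptyBlacklist)); // true
    }
}
```

If the plugin really does match this way, a partial path such as /im/dms/ would be enough, and the lack of results would more likely come from somewhere other than the patterns themselves.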
