Chetan,
Try adding parse-rss to the plugin.includes property in nutch-site.xml. Here's mine:
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
<description></description>
</property>
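For reference, a property like the one above only takes effect when it sits inside the file's single <configuration> root element. The following is a minimal sketch of a complete conf/nutch-site.xml, assuming the stock Nutch 0.9 layout in which site-specific settings in nutch-site.xml override the defaults in nutch-default.xml. The plugin list is copied from my value above; the rss entry inside parse-(...) is the part that matters for crawling feeds.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- "rss" inside parse-(...) enables the parse-rss plugin, so feed
         documents such as rss.xml are parsed and their item links
         become outlinks that the next crawl round can fetch -->
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    <description></description>
  </property>
</configuration>
```

After saving the file, re-run the crawl with the same depth; with the feed parsed, the links in its items should be picked up as outlinks and fetched in the second round.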
Ed.
> Date: Sat, 27 Sep 2008 01:30:43 -0700
> From: chetan@(protected)
> To: nutch-user@(protected)
> Subject: crawl xml url using nutch-0.9
>
>
> Hi All,
>
> I have tried to crawl an XML URL (http://sports.yahoo.com/nfl/rss.xml) with
> depth 2.
>
> But it only crawls the root URL.
>
> Please tell me how to crawl the root URL as well as all of its sub-URLs.
>
> Thanks in advance.
>
> Regards,
> Chetan Patel
> --
> View this message in context: http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19700770.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>