Hi,
Thanks for the help.
I have already added parse-rss to plugin.includes, and I am still getting only the root URL.
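To rule out a typo in the pattern itself, a small Java sketch can check which plugin IDs a plugin.includes value selects. It assumes, as I understand Nutch's plugin loading, that each plugin ID has to match the whole regular expression rather than just a prefix of it. The class name PluginIncludesCheck and the method isIncluded are hypothetical, for illustration only.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // The plugin.includes value from Ed's nutch-site.xml, copied verbatim.
    static final String ED_INCLUDES =
        "protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)"
        + "|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)";

    // Hypothetical helper: true if the whole plugin ID matches the pattern.
    // matches() anchors at both ends, so "protocol-http" does not match "protocol-httpclient".
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Pass your own plugin.includes value as the first argument to check it.
        String includes = args.length > 0 ? args[0] : ED_INCLUDES;
        List<String> ids = List.of("parse-rss", "parse-html", "protocol-httpclient", "protocol-http", "parse-js");
        for (String id : ids) {
            System.out.println(id + " -> " + (isIncluded(includes, id) ? "included" : "excluded"));
        }
    }
}
```

If parse-rss shows up as excluded for the value in your nutch-site.xml, the pattern needs fixing; if it shows up as included, the problem is probably somewhere other than plugin.includes.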
Regards,
Chetan Patel
Edward Quick wrote:
>
>
> Chetan,
>
> Try adding parse-rss to plugin.includes in nutch-site.xml. Here's mine:
>
> <property>
> <name>plugin.includes</name>
> <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> <description></description>
> </property>
>
>
> Ed.
>
>
>> Date: Sat, 27 Sep 2008 01:30:43 -0700
>> From: chetan@(protected)
>> To: nutch-user@(protected)
>> Subject: crawl xml url using nutch-0.9
>>
>>
>> Hi All,
>>
>> I have tried to crawl an XML URL (http://sports.yahoo.com/nfl/rss.xml) with
>> depth 2.
>>
>> But it crawls only the root URL.
>>
>> Please tell me how to crawl the root URL as well as all the URLs it links to.
>>
>> Thanks in advance.
>>
>> Regards,
>> Chetan Patel
>> --
>> View this message in context:
>> http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19700770.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>
>