Hi,
I ran a crawl on <!-- URL SNIPPED --> and got the following error:
Error parsing: <!-- URL Snipped --> documentname.doc failed(2,0): Can't be handled as Microsoft document.
org.apache.nutch.parse.msword.FastSavedException: Fast-saved files are unsupported at this time.
I wanted to ask whether there are any workarounds, since around 40% of the documents fail with this error.
Rgds,
Sridhar