I fetched some Chinese-language web pages with "nutch fetch", but some
Chinese characters come out wrong after I dump the segment back to
HTML pages. For instance, the page
http://www.dianping.com/shop/501079/
has this title:
<head><title>
韶山冲(徐汇店)(图)_上海_大众点评网
</title>
However, I got this after dumping:
<head><title>
韶山�1¤7(徐汇庄1¤7)(�1¤7)_上海_大众点评罄1¤7
</title>
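For comparison, here is a minimal Python sketch of what happens when UTF-8 bytes are decoded with the wrong charset. It assumes windows-1252 as the "wrong" charset purely as an example. It does not reproduce the exact garbage above, and it is not a claim about which charset Nutch actually applied to this page. The ASCII parts "(", ")" and "_" survive, and every Chinese character turns into several unrelated symbols. That is the same pattern as in the dumped title.

```python
# Sketch: decoding UTF-8 bytes with the wrong charset garbles CJK text.
# Assumption: windows-1252 is only an example of a wrong charset; it is not
# claimed to be what Nutch used for this page.

title = "韶山冲(徐汇店)(图)_上海_大众点评网"

# The bytes a server sends for a page declared as UTF-8.
raw = title.encode("utf-8")


def decode_as(data: bytes, charset: str) -> str:
    """Decode page bytes with the given charset, replacing undecodable bytes."""
    return data.decode(charset, errors="replace")


right = decode_as(raw, "utf-8")         # round-trips correctly
wrong = decode_as(raw, "windows-1252")  # each CJK char becomes 3 junk chars

if __name__ == "__main__":
    print(right == title)  # True
    print(wrong == title)  # False
    print(wrong)           # mojibake; only the ASCII punctuation is intact
```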
The page declares its charset as "UTF-8". I also added the following
to "nutch-site.xml":
<property>
  <name>parser.character.encoding.default</name>
  <value>UTF-8</value>
</property>
It makes no difference.
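One way to narrow this down is to check whether the dumped file itself is valid UTF-8 and contains the expected title. The following Python sketch does that. The command-line path is a hypothetical placeholder for whatever file the segment dump produced, and the helper name check_utf8 is made up for illustration. If the bytes are valid UTF-8 and the title is found, the corruption probably happens when the file is viewed. If they are not, it probably happened earlier, during fetch, parse or dump.

```python
import sys
from pathlib import Path
from typing import Tuple


def check_utf8(raw: bytes, expected: str) -> Tuple[bool, bool]:
    """Hypothetical helper: return (raw is strict UTF-8, expected text present)."""
    try:
        text = raw.decode("utf-8")  # strict: raises on any invalid byte sequence
    except UnicodeDecodeError:
        return False, False
    return True, expected in text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_dump.py <path-to-dumped-file>
    data = Path(sys.argv[1]).read_bytes()
    valid, found = check_utf8(data, "大众点评网")
    print("valid UTF-8:", valid, "| title found:", found)
```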
What could be the problem?