character encoding and charsets


2007-05-03       - By Justin Warren


Hi guys..



I have an interesting problem. I am using POI to extract text from a
Word document (usually Word 2000/2003), but the document is written in
Chinese. When I write the extracted text to a plain-text file, I get
seemingly random ASCII characters. I want to convert the text to UTF-8.
Is there any way to determine the document's charset so I can decode it?



In Eclipse, I call WordExtractor.getParagraphs(), and if I set a
breakpoint, I can see the Chinese characters. I also noticed that there
is a property in HWPFDocument called field_27_cChFtnEdn. Is that
possibly what I should be looking at?



Thanks
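
Note: because the debugger shows the Chinese characters correctly, the extracted strings are probably already decoded, and the garbling may come from writing the file with the platform default encoding. This is only a guess based on the symptoms described above. Below is a minimal sketch using only the JDK, assuming the paragraphs are already available as a String array. The class name Utf8TextWriter and the method writeUtf8 are hypothetical names chosen for illustration and are not part of POI.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of POI: writes already-decoded text as UTF-8.
public class Utf8TextWriter {

    // Writes each paragraph followed by a newline, encoding explicitly as UTF-8
    // instead of relying on the platform default charset.
    static void writeUtf8(Path target, String[] paragraphs) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            for (String paragraph : paragraphs) {
                out.write(paragraph);
                out.write("\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the extractor output (assumption): two Chinese characters.
        String[] paragraphs = { "\u4e2d\u6587" };
        Path target = Files.createTempFile("extracted", ".txt");
        writeUtf8(target, paragraphs);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

If the resulting file still looks wrong, it is worth checking that the viewer or editor opens it as UTF-8 before suspecting the extraction step.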