Re: Character corruption for Chinese (simple and traditional) and Korean texts

From: Jeroen Geilman (jeroenadaptr.nl)
Date: Wed Oct 06 2010 - 14:46:40 CDT


On 10/05/2010 08:48 AM, Sharma, Ashish wrote:
> Hi,
>
> I have a setup where emails received by the mail server (Postfix) are picked up, and each email's body (HTML or plain text) and attachments are parsed into separate files and saved. For this I use the JavaMail (javax.mail) API.
>

Suspect.

> The problem occurs when the email body is in Chinese (Simplified or Traditional; charset GB2312, as per the email header) or Korean (charset ks_c_5601-1987, as per the email header):
>
> the resulting parsed email bodies show character corruption (the characters are displayed as '?').
>

Either the message is being saved with the wrong character encoding, or
you're reading it back with the wrong character decoding.
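
To illustrate the first case, here is a minimal Java sketch of how literal '?' characters can end up in a saved file. It assumes the body arrives as raw GB2312 bytes and that your JRE ships the GB2312 charset; the class and method names are made up for the example and are not part of javax.mail.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Decode raw body bytes using the charset named in the Content-Type header.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Encode text for saving. Characters the target charset cannot
    // represent are silently replaced, typically with '?' (0x3F).
    static byte[] encode(String text, Charset target) {
        return text.getBytes(target);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        // Stand-in for the raw body bytes of a GB2312 message (assumption).
        byte[] gb = text.getBytes(Charset.forName("GB2312"));

        String decoded = decode(gb, "GB2312");

        // Saving in a charset that has no Chinese characters loses them.
        byte[] lossy = encode(decoded, StandardCharsets.ISO_8859_1);
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1)); // prints "??"

        // Saving as UTF-8 keeps them intact.
        byte[] safe = encode(decoded, StandardCharsets.UTF_8);
        System.out.println(new String(safe, StandardCharsets.UTF_8).equals(text)); // prints "true"
    }
}
```

If the '?' is already in the file as byte 0x3F, the damage happened when saving, and no viewer setting will bring the text back.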

> Also, even when I explicitly set the charset to the one given in the email header, the problem remains the same.
>

How do you know this?
Do as suggested and bcc a copy to a real mailbox, then compare the
contents _in the same text editor_.
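
If the editor itself is in doubt, looking at the raw bytes of the saved body takes the display out of the equation. A rough sketch, assuming the saved body is a plain file passed as the first argument; the class and helper names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SavedBodyCheck {
    // Count literal '?' bytes (0x3F): many of these where text should be
    // suggests the characters were already lost when the file was written.
    static int countQuestionMarks(byte[] data) {
        int n = 0;
        for (byte b : data) {
            if (b == 0x3F) n++;
        }
        return n;
    }

    // Count bytes with the high bit set: multibyte encodings such as
    // GB2312 or UTF-8 use these for non-ASCII characters.
    static int countHighBytes(byte[] data) {
        int n = 0;
        for (byte b : data) {
            if ((b & 0x80) != 0) n++;
        }
        return n;
    }

    // Hex dump of at most 'limit' leading bytes, space separated.
    static String hex(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length && i < limit; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SavedBodyCheck <saved-body-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("'?' bytes:      " + countQuestionMarks(data));
        System.out.println("high-bit bytes: " + countHighBytes(data));
        System.out.println("first bytes:    " + hex(data, 32));
    }
}
```

Plenty of high-bit bytes and few 0x3F bytes would point at the viewer's decoding rather than at the save step.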

--
J.