Tuesday, July 28, 2009

How do I convert Shift-JIS characters to UTF-8 in C/C++ on Linux?

I have an input sequence of bytes in a binary file. If I rename it to .htm and set the encoding to "sjis", I see the correct Japanese characters.


I want an output sequence of bytes that, when pasted into an .htm file with the encoding set to "utf8", shows the same Japanese characters.


How do I manage this in a C++ program written on Linux?


Please please and thanks a million for providing code/ pseudocode.

The easiest way to do your conversion is to use iconv. You can either pre-process the file on the command line or convert it inside your program. To pre-process, first use "iconv -l" to list your available encodings; something like this should work:





iconv -f SHIFT_JIS -t UTF-8 jsfile > utf8file





If you need to do the conversion within the program, I've included some examples below.
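
Here is a minimal sketch of an in-program conversion using the iconv(3) functions from glibc (iconv_open, iconv, iconv_close). The helper name sjis_to_utf8 is made up for this example, and the code assumes the whole input is already in memory and that your system accepts "SHIFT_JIS" as an encoding name (check with "iconv -l"; pages produced on Windows may really be CP932).

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert in_len bytes of Shift_JIS text to UTF-8.
 * Returns a malloc'd, NUL-terminated buffer that the caller must free,
 * and stores the number of UTF-8 bytes in *out_len.
 * Returns NULL if the converter is unavailable, memory runs out,
 * or the input contains an invalid or truncated sequence. */
char *sjis_to_utf8(const char *in, size_t in_len, size_t *out_len)
{
    assert(in != NULL && out_len != NULL);

    /* Note the argument order: destination encoding first, then source. */
    iconv_t cd = iconv_open("UTF-8", "SHIFT_JIS");
    if (cd == (iconv_t)-1)
        return NULL;

    /* Generous bound: a UTF-8 character is at most 4 bytes and every
     * character consumes at least one input byte. +1 for the NUL. */
    size_t cap = in_len * 4 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    /* iconv() advances these pointers and decrements the counters. */
    char *src = (char *)in;      /* the API takes char **, input is not modified */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = cap - 1;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1) {      /* EILSEQ, EINVAL or E2BIG; see errno */
        free(out);
        return NULL;
    }

    *dst = '\0';
    *out_len = (size_t)(dst - out);
    return out;
}
```

To use it, read the input file into a buffer (fread), pass the buffer and its length to the function, and write the result out with fwrite. On Linux with glibc no extra library is needed at link time; on systems that use the separate libiconv you may have to add -liconv.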

