How does GB18030 differ from Unicode


Best Solution

As per the Wikipedia article on GB18030, "GB18030 can be considered a Unicode Transformation Format (i.e. an encoding of all Unicode code points) that maintains compatibility with a legacy character set." That is, every Unicode character can be encoded in GB18030, but the resulting byte sequences differ from those produced by UTF-8 or UTF-16. Handling GB18030 requires no special techniques beyond those needed for any other non-UTF encoding: decode the bytes to Unicode on input and encode back to GB18030 on output.
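To make the difference in byte sequences concrete, here is a minimal sketch using Python's built-in `gb18030` codec. The helper name `encoded_forms` and the sample characters are only illustrative choices, not anything from the question.

```python
def encoded_forms(text):
    """Return the bytes of `text` under GB18030, UTF-8, and UTF-16 (big-endian, no BOM)."""
    return {
        "gb18030": text.encode("gb18030"),
        "utf-8": text.encode("utf-8"),
        "utf-16-be": text.encode("utf-16-be"),
    }


if __name__ == "__main__":
    # An ASCII letter, a common CJK ideograph, and a character outside the BMP.
    for ch in ["A", "\u4e2d", "\U0001F600"]:
        forms = encoded_forms(ch)
        print(f"U+{ord(ch):04X}", {name: data.hex() for name, data in forms.items()})
```

The same code point yields different bytes in each encoding, yet decoding any of them gives back the identical Unicode string, which is why GB18030 can be treated like any other encoding at the I/O boundary.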

The ICU project is an open source library (for C/C++ and Java) that fully supports many different encodings, including GB18030. Information on converting between encodings with ICU can be found in the conversion section of the ICU User Guide.
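If ICU is not available, the same kind of conversion can be sketched with Python's standard `codecs` module. This is a stand-in for ICU's converters rather than ICU itself, and the function name `transcode` is hypothetical. It decodes GB18030 input chunk by chunk, so a multi-byte character split across two chunks is still handled.

```python
import codecs


def transcode(chunks, source="gb18030", target="utf-8"):
    """Yield `target`-encoded bytes for an iterable of `source`-encoded byte chunks."""
    decoder = codecs.getincrementaldecoder(source)()
    encoder = codecs.getincrementalencoder(target)()
    for chunk in chunks:
        # The decoder buffers any incomplete trailing byte sequence until the next chunk.
        text = decoder.decode(chunk)
        if text:
            yield encoder.encode(text)
    # Flush both sides; a truncated sequence at end of input raises UnicodeDecodeError.
    tail = encoder.encode(decoder.decode(b"", final=True), final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    data = "\u4e2d\u6587".encode("gb18030")
    # Deliberately split the input in the middle of a two-byte character.
    print(b"".join(transcode([data[:1], data[1:]])).decode("utf-8"))
```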