How does GB18030 differ from Unicode


How does the Chinese GB18030 code set differ from Unicode?

What special techniques are required for handling GB18030?

Are there any (open source) libraries for handling GB18030?

Best Solution

As per the Wikipedia article on GB18030, "GB18030 can be be considered a Unicode Transformation Format (i.e. an encoding of all Unicode code points) that maintains compatibility with a legacy character set." That is, all Unicode characters can be encoded in GB18030, but they will be encoded with different byte sequences than would be generated with UTF-8 or UTF-16. Handling the GB18030 encoding doesn't require any more special techniques than are required for any other non-Unicode encoding.

The ICU project is an open source library (for C or Java) that has full support for many different encodings, including GB18030. Information on converting between different encodings with ICU can be found here.