What is the difference between utf8mb4 and utf8 charsets in MySQL?
I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious how the utf8mb4 group of encodings differs from the other encoding types defined in MySQL Server.
Are there any special benefits or purposes of using utf8mb4 rather than utf8?
Best Solution
UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" (alias of "utf8mb3") only stores a maximum of three bytes per code point.
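To make the one-to-four-byte range concrete, here is a small Python sketch that prints how many bytes UTF-8 needs for a few sample code points. It only illustrates the encoding itself and does not talk to MySQL; the helper name `utf8_length` and the choice of samples are mine, not part of any MySQL API.

```python
# Sample characters, written as escapes so the code points are unambiguous.
SAMPLES = {
    "\u0041": "U+0041, ASCII letter A",
    "\u00e9": "U+00E9, Latin small e with acute",
    "\u20ac": "U+20AC, euro sign (still inside the BMP)",
    "\U0001F600": "U+1F600, grinning face emoji (outside the BMP)",
}


def utf8_length(ch: str) -> int:
    """Return how many bytes the single character `ch` occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch, note in SAMPLES.items():
    print(f"{note}: {utf8_length(ch)} byte(s)")
```

The first three samples need one, two and three bytes, so they fit within the three-byte limit; the emoji needs four bytes, which is exactly the case "utf8"/"utf8mb3" cannot store.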
So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane" (BMP). See also Comparison of Unicode encodings.
This is what (a previous version of the same page at) the MySQL documentation has to say about it:
So if you want your column to support storing characters lying outside the BMP (and you usually want to), such as emoji, use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?.
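If you want to check whether existing data would actually be affected, a rough test is to look for any code point above 0xFFFF. The following is a minimal sketch under that assumption; `needs_utf8mb4` and `BMP_MAX` are hypothetical names for illustration, and the check reflects only the Unicode range described above, not any server-side validation MySQL performs.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def needs_utf8mb4(text: str) -> bool:
    """Return True if `text` contains at least one code point outside the BMP."""
    return any(ord(ch) > BMP_MAX for ch in text)


# Plain BMP text fits in three bytes per code point.
print(needs_utf8mb4("caf\u00e9 \u20ac5"))
# An emoji lies outside the BMP and needs a four-byte encoding.
print(needs_utf8mb4("hello \U0001F600"))
```

Running a check like this over a sample of your application's input can show how often non-BMP characters really occur before you decide on a column charset.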