I am a bit confused about encodings. As far as I know old ASCII characters took one byte per character. How many bytes does a Unicode character require?
I assume that one Unicode character can contain every possible character from any language – am I correct? So how many bytes does it need per character?
And what do UTF-7, UTF-8, UTF-16 etc. mean? Are they different versions of Unicode?
I read the Wikipedia article about Unicode but it is quite difficult for me. I am looking forward to seeing a simple answer.
Best Solution
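Strangely enough, nobody pointed out how to calculate how many bytes one Unicode character takes. Here is the rule for UTF-8 encoded strings:
Binary     Hex          Comments
0xxxxxxx   0x00..0x7F   Only byte of a 1-byte character encoding
10xxxxxx   0x80..0xBF   Continuation byte: one of 1 to 3 bytes following the first
110xxxxx   0xC0..0xDF   First byte of a 2-byte character encoding
1110xxxx   0xE0..0xEF   First byte of a 3-byte character encoding
11110xxx   0xF0..0xF7   First byte of a 4-byte character encoding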
So the quick answer is: in UTF-8 a character takes 1 to 4 bytes, and the first byte indicates how many bytes the character takes up.
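To make the first-byte rule concrete, here is a minimal Python sketch. The function name utf8_sequence_length is made up for illustration and is not part of any library. It only inspects the bit pattern of the first byte, so it does not check that the rest of the sequence is well-formed UTF-8.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes a UTF-8 encoded character occupies,
    judging only by the bit pattern of its first byte.

    Hypothetical helper for illustration; it does not validate the
    continuation bytes or reject overlong encodings.
    """
    if 0x00 <= first_byte <= 0x7F:   # 0xxxxxxx: 1-byte character (ASCII)
        return 1
    if 0xC0 <= first_byte <= 0xDF:   # 110xxxxx: first byte of 2
        return 2
    if 0xE0 <= first_byte <= 0xEF:   # 1110xxxx: first byte of 3
        return 3
    if 0xF0 <= first_byte <= 0xF7:   # 11110xxx: first byte of 4
        return 4
    # 10xxxxxx is a continuation byte; anything else is not a valid first byte.
    raise ValueError(f"0x{first_byte:02X} cannot start a UTF-8 character")


if __name__ == "__main__":
    # Compare the prediction from the first byte with the real encoded length.
    for ch in ["A", "é", "€", "😀"]:
        encoded = ch.encode("utf-8")
        print(repr(ch), len(encoded), utf8_sequence_length(encoded[0]))
```

For each sample character, the length predicted from the first byte should match the length of the encoded bytes that Python's built-in str.encode produces.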