Why does the .NET Framework StreamReader / StreamWriter default to UTF-8 encoding?


I'm just looking at the constructors for StreamReader / StreamWriter and I notice that they use UTF-8 as the default encoding. Does anyone know why this is? I would have presumed it would be a safer bet to default to Unicode.

Best Solution

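Any pure ASCII document is already valid UTF-8, and for text that is mostly ASCII, UTF-8 is more compact than UTF-16 (one byte per ASCII character instead of two). It can be larger for some scripts, such as CJK text, where most characters take three bytes in UTF-8 but two in UTF-16. Either way, it still covers the whole of Unicode. I'd say that UTF-8 is far more common than UTF-16. It's also the default for XML (when there's no BOM and no explicit encoding specified).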
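To make the size comparison concrete, here is a small sketch. It's in Python rather than C# purely for brevity, since the byte counts are a property of the encodings and not of the framework. The sample strings and the `encoded_sizes` helper are my own illustration, and I'm using `"utf-16-le"` as the counterpart of .NET's `Encoding.Unicode` (little-endian UTF-16, with no BOM counted).

```python
# Sample strings chosen for illustration only.
samples = {
    "ascii": "Hello, world",
    "accented": "caf\u00e9",         # "cafe" with an acute accent on the e
    "cjk": "\u65e5\u672c\u8a9e",     # three CJK characters
}

def encoded_sizes(text):
    """Return (utf8_byte_count, utf16_byte_count) for the given string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# Pure ASCII text produces exactly the same bytes under ASCII and UTF-8.
ascii_compatible = "Hello, world".encode("ascii") == "Hello, world".encode("utf-8")
print("ASCII bytes identical under UTF-8:", ascii_compatible)
```

The ASCII sample is half the size in UTF-8, while the CJK sample comes out larger, which is why "typically more compact" depends on the kind of text you have.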

Why do you think it would be better to default to UTF-16? (That's what Encoding.Unicode is.)

EDIT: I suspect you're confused about exactly what UTF-8 can handle. This page describes it pretty clearly, including how any particular Unicode character is encoded. It's a variable-width encoding, but it covers the whole of Unicode.
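As a rough illustration of the variable-width point (again in Python for brevity, and with a hypothetical `utf8_width` helper of my own), the sketch below encodes one character from each UTF-8 length class and checks that each one survives a round trip:

```python
def utf8_width(ch):
    """Return how many bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))

# One example character from each UTF-8 length class.
examples = [
    "A",           # U+0041, ASCII range
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
]

for ch in examples:
    encoded = ch.encode("utf-8")
    # Decoding the bytes gives back the original character.
    assert encoded.decode("utf-8") == ch
    print(f"U+{ord(ch):04X}: {utf8_width(ch)} byte(s), hex {encoded.hex()}")
```

The last character also needs two 16-bit code units (a surrogate pair) in UTF-16, so UTF-16 is variable-width too, not a fixed two bytes per character.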