Endianness theory and concept


This isn't a question specific to any programming language. Say you have some file written on a big-endian machine, and you know this. If two single-byte values were written back-to-back, how would you know? Big-endian reverses the order of 16-, 32-, and 64-bit values, so how would you know you need to read it as individual bytes?

For instance, you write the byte 0x11, then the byte 0x22. The file then contains the bytes 0x11 0x22. If you read that on a little-endian machine, you'd have to convert it. So would you read it as 0x2211, or 0x1122? Would you know how?

Does this make any sense? I feel like I'm missing something super basic here.

Best Solution

There is no way to know from the bytes alone. This is why formally specified file formats typically mandate an endianness, or provide a way to declare it (as with Unicode's byte order mark, as MSN mentioned). This way, if you are reading a file in a particular format, you already know its byte order, because the fact that it's in that format implies a particular endianness.
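To make that concrete, here is a minimal sketch using Python's standard `struct` module on the two bytes from the question. The variable names are just for illustration. The same two bytes decode to different values depending on what the format says they are, and nothing in the bytes themselves tells you which reading is intended:

```python
import struct

# The two bytes from the question, in the order they were written.
data = bytes([0x11, 0x22])

# Read as two independent single-byte values: file order is preserved,
# and byte order does not come into it at all.
first, second = struct.unpack("BB", data)

# Read as ONE 16-bit value: the result depends on the byte order
# the file format mandates.
(as_big,) = struct.unpack(">H", data)     # big-endian interpretation
(as_little,) = struct.unpack("<H", data)  # little-endian interpretation

print(hex(first), hex(second))  # 0x11 0x22
print(hex(as_big))              # 0x1122
print(hex(as_little))           # 0x2211
```

Note that single bytes are never reordered; endianness only matters once you treat several bytes as one multi-byte value.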

Another good example of this is network byte order -- network protocols are typically big-endian, so if you're a little-endian processor talking to the Internet, you have to swap the bytes of multi-byte values before you write them. If you're big-endian, you don't need to worry about it. People use functions like htonl and ntohl to preprocess things they write to the network so that their source code is the same on all machines. These functions are defined to do nothing on big-endian machines, but they flip bytes on little-endian machines.
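As a rough illustration, Python exposes the same functions as `socket.htonl` and `socket.ntohl`. The sketch below (the value and variable names are made up for the example) shows that after `htonl`, storing the result in the host's native order yields the same big-endian byte sequence on either kind of machine:

```python
import socket
import struct
import sys

value = 0x11223344

# htonl converts a 32-bit integer from host order to network (big-endian)
# order. On a big-endian host it returns the value unchanged; on a
# little-endian host it returns the value with its bytes swapped.
net = socket.htonl(value)

# Laying `net` out in the host's native byte order ("=") gives the
# network byte sequence regardless of which kind of host this is.
wire = struct.pack("=I", net)
print(sys.byteorder, wire.hex())  # the hex part is 11223344 on either host

# ntohl undoes the conversion.
assert socket.ntohl(net) == value
```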

The key realization is that endianness is a property of how particular architectures represent words. It's not a mandate that they have to write files a certain way; it just tells you that the instructions on the architecture expect multi-byte words to have their bytes ordered a certain way. A big-endian machine can write the same byte sequence as a little-endian machine; it just might use a few more instructions to do it, because it has to reorder the bytes. The same is true for little-endian machines writing big-endian formats.
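Here is a small sketch of that idea in Python. `encode_u32` is a hypothetical helper, not part of any library; it just wraps the built-in `int.to_bytes`. The point is that code which names the byte order explicitly produces the same bytes on any host, whatever `sys.byteorder` reports:

```python
import sys

def encode_u32(value: int, byteorder: str) -> bytes:
    """Encode an unsigned 32-bit integer in the requested byte order.

    Hypothetical helper for illustration; byteorder is "big" or "little".
    """
    return value.to_bytes(4, byteorder)

value = 0x11223344
big = encode_u32(value, "big")        # what a big-endian format expects
little = encode_u32(value, "little")  # what a little-endian format expects

print(sys.byteorder)  # the host's native order; it does not affect the output below
print(big.hex())      # 11223344
print(little.hex())   # 44332211

# Decoding with the matching byte order recovers the original value.
assert int.from_bytes(big, "big") == value
assert int.from_bytes(little, "little") == value
```

The reader then only has to know which order the format uses, not which machine wrote the file.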
