C++ fread gibberish

arrays, c++, file-io, memory-management

For some reason my buffer is getting filled with gibberish, and I'm not sure why. I checked the file with a hex editor and verified that the characters are stored in a 2-byte Unicode format, so I don't see what's wrong.

[on file open]

fseek(_file_pointer, 0, SEEK_END);
this->_length = ftell(this->_file_pointer) / sizeof(chr);

[Main]

//there is a reason for this, I just 
//didn't include the code that tells why
typedef wchar_t chr;
chr *buffer = (chr*)malloc(f->_length*sizeof(chr));
if(buffer == NULL)return;
memset(buffer,0,f->_length*sizeof(chr));
f->Read_Whole_File(buffer);
f->Close();
free(buffer);

[Read_Whole_File]

void Read_Whole_File(chr *buffer)
{
    if(buffer == NULL)
    {
        this->_IsError = true;
        return;
    }
    fseek(this->_file_pointer, 0, SEEK_SET);
    int a = sizeof(buffer[0]);//for debugging purposes  
    fread(buffer, a, _length, this->_file_pointer); 
}

Best Solution

Assuming your error handling (that you said you've omitted here) is sound, I see two reasons that may be the cause of the problem:

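  1. wchar_t is not necessarily 2 bytes; its size is implementation-defined. On Linux, for example, it is usually 4 bytes, so both the length computed from sizeof(chr) and the contents of each element would be wrong for a 2-byte file format.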

  2. It may be that the file is UTF-16BE (big-endian), and you are running on a little-endian platform, so the wchar_t values in your buffer have their byte order swapped.

Or, it may be both. Please update your question with some details about your platform and a few bytes from the sample file in hex (if possible).

In any case, you should not make any assumptions about sizes of standard C or C++ types when dealing with Unicode files.
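To make that concrete, here is a minimal sketch of computing the element count from the file size with a fixed-width type instead of wchar_t. The helper name utf16_unit_count is hypothetical (it is not from the question's code), and the static_assert assumes a platform with 8-bit bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// wchar_t is 2 bytes with MSVC and typically 4 bytes with GCC/Clang on Linux,
// so it cannot describe a fixed on-disk layout. uint16_t is exactly 16 bits
// wherever it is provided. (Assumes 8-bit bytes, as on mainstream platforms.)
static_assert(sizeof(std::uint16_t) == 2, "expected uint16_t to occupy two bytes");

// Hypothetical helper: number of complete UTF-16 code units in a file whose
// size in bytes is byte_len (the value ftell() returned at end of file).
// A trailing odd byte is ignored.
std::size_t utf16_unit_count(long byte_len)
{
    assert(byte_len >= 0); // ftell() returns -1 on failure; check before calling
    return static_cast<std::size_t>(byte_len) / sizeof(std::uint16_t);
}
```

With this, a 10-byte file always yields 5 code units, whereas dividing by sizeof(wchar_t) would yield 2 on a platform where wchar_t is 4 bytes.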

For example, if you want to read UTF-16BE, use the C99 uint16_t type (or an equivalent type that is guaranteed to be 16 bits wide), and swap the byte order of your input whenever the file's endianness differs from your platform's. You can detect the file's endianness from the byte order mark (BOM), if one is present at the start of the file.
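A rough sketch of that approach follows. It reads the file as raw bytes and assembles each 16-bit code unit explicitly, so the result does not depend on the host's endianness at all. The names Utf16Order, detect_bom, decode_utf16 and read_all_bytes are made up for illustration, and the fallback byte order used when no BOM is found is an assumption you would have to choose for your own files.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

enum class Utf16Order { Little, Big };

// Read everything from an already opened binary stream ("rb") into memory.
std::vector<unsigned char> read_all_bytes(std::FILE *fp)
{
    std::vector<unsigned char> bytes;
    unsigned char chunk[4096];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, fp)) > 0)
        bytes.insert(bytes.end(), chunk, chunk + n);
    return bytes;
}

// Look for a byte order mark (U+FEFF) in the first two bytes.
// Returns true and sets `order` if one is found; otherwise leaves it untouched.
bool detect_bom(const unsigned char *bytes, std::size_t size, Utf16Order &order)
{
    if (size < 2) return false;
    if (bytes[0] == 0xFE && bytes[1] == 0xFF) { order = Utf16Order::Big; return true; }
    if (bytes[0] == 0xFF && bytes[1] == 0xFE) { order = Utf16Order::Little; return true; }
    return false;
}

// Convert raw bytes to UTF-16 code units, skipping the BOM if present.
// `fallback` is the byte order assumed when the file has no BOM.
std::vector<std::uint16_t> decode_utf16(const unsigned char *bytes, std::size_t size,
                                        Utf16Order fallback)
{
    Utf16Order order = fallback;
    std::size_t i = detect_bom(bytes, size, order) ? 2 : 0;

    std::vector<std::uint16_t> units;
    units.reserve((size - i) / 2);
    for (; i + 1 < size; i += 2) {
        // Pick the high and low byte according to the file's byte order.
        unsigned hi = (order == Utf16Order::Big) ? bytes[i] : bytes[i + 1];
        unsigned lo = (order == Utf16Order::Big) ? bytes[i + 1] : bytes[i];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units; // a trailing odd byte, if any, is dropped
}
```

Usage would be along the lines of: open the file with fopen(path, "rb"), pass the stream to read_all_bytes, then call decode_utf16(bytes.data(), bytes.size(), Utf16Order::Little). Note that this only produces code units; surrogate pairs are left for the caller to handle.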

Alternatively, use a third-party Unicode library such as ICU. It takes care of the platform-specific details and will save you a lot of debugging time in a sizable project.
