Matlab – Is it possible to intercept a matlab save() bytestream

matlab, save, serialization

In matlab it is possible to write matlab objects, or even the entire workspace, to a file using the save() call. I would like to intercept the bytestream and post-process it before it goes to a file. Is this possible? Alternatively, is it possible to specify the file descriptor that the bytestream is written to, instead of the filename that usually goes into the save() call as an argument?

Note that I'm not looking for an alternative way to write a file in matlab. I know I can fopen() a file and write whatever I want, but the point is that I want to (re)use the object serialization that is internal to the save() call, not invent my own again.

An analogous question would of course arise for the load() call, in that case intercepting the bytestream before it goes into the deserialization process, but I guess that if it is possible for save(), the solution to the load() problem will follow naturally.

A few clarifications:

  1. I'm not looking for a new way to serialize matlab data. It already exists, and the whole point of the exercise is to use the existing serialization in the save() call so that 1) I don't need to start updating the serialization code for new types of objects in newer versions of matlab, or, heaven forbid, when people start using custom OOP objects, and 2) I can still easily use existing code to read in mat files, for example scipy's support for mat files.

  2. The stream must not get out to a file or anything else before post-processing. The idea is encryption for security, and writing the stream out in plain form to a file completely undermines that purpose.

Complications:

  • It seems that the functionality used in the save function in matlab isn't just a regular sequential write. Examining the object code of the libraries, it seems that the save function is implemented using matPutVariable (previously called matPutArray), which writes a given variable of type mxArray* out to a file of type MATFile* opened with matOpen. The problem here is the following text in the description of matPutVariable:

    If the mxArray does not exist in the MAT-file, the function appends it to the end. If an mxArray with the same name exists in the file, the function replaces the existing mxArray with the new mxArray by rewriting the file.

    This means that the matPutVariable function will have to seek through the file. Seeking is obviously not possible when pipes are used, so using pipes to implement our processing of the bytestream is not possible with this existing serialization functionality.
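
The seeking problem described above can be checked independently of matlab. The following is a minimal sketch, assuming a POSIX system: it creates an anonymous pipe and tries to lseek() on its write end. The helper names seek_error and probe_pipe_write_end are hypothetical and exist only for this illustration. Nothing here calls into the MAT-file library; it only shows what any writer that needs to reposition its output would run into on a pipe.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask for the current offset of fd without moving it.
 * Returns 0 if the descriptor is seekable, otherwise the errno set by lseek(). */
int seek_error(int fd)
{
    if (lseek(fd, 0, SEEK_CUR) == (off_t)-1)
        return errno;
    return 0;
}

/* Hypothetical helper: create an anonymous pipe and probe its write end.
 * Returns 0 if it is seekable, the lseek() errno if it is not,
 * or -1 if the pipe itself could not be created. */
int probe_pipe_write_end(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int err = seek_error(fds[1]);   /* fds[1] is the write end */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

POSIX specifies that lseek() fails with ESPIPE on a pipe, FIFO or socket, so probe_pipe_write_end() should return ESPIPE. A writer that has to go back and rewrite earlier parts of its output therefore cannot be pointed at a pipe.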

Best Solution

How about using a virtual filesystem? On Windows there is a commercial library called BoxedApp SDK that allows you to create a virtual file that is only visible to the creating process (and possibly its child processes). You would probably have to write a MEX file to interface with the library. First you would create the virtual file, and then you could use the save command in matlab with the same filename. Then you can read the serialized .mat bytestream using the normal fopen/fread functions in matlab and do whatever you wish with it. This would at least prevent the file from being created on the hard disk. I'm not sure, though, whether the file or parts of it could end up in the swap file in some situations, since the file is actually created in memory.

There also seem to be undocumented functions mxSerialize and mxDeserialize in libmx that you could use, e.g. via the loadlibrary/calllib functions directly from matlab or via a wrapper MEX file. A bit of Googling revealed that the signatures for these functions should be

mxArray* mxSerialize(const mxArray*);
mxArray* mxDeserialize(const void*, size_t);

and some tests revealed that mxSerialize() takes the matlab variable as its argument and returns the serialized bytes as a uint8 array. mxDeserialize() transforms this uint8 array (1st argument) back into a matlab object, which it returns. The 2nd argument of mxDeserialize seems to be the number of elements in the 1st argument. These undocumented functions are not guaranteed to work in future releases, though, because TMW (The MathWorks) might change the API.
