Python – How to create a large matrix of matrices in python

hdf5, matrix, numpy, pytables, python

I am working with a large matrix of size m * n, with m, n > 100000. Since my data is huge, I want to store the matrix in memory and work with HDF5 and PyTables.

However, the elements of my matrix are small matrices of real values of dimension 5*5.

I have already looked at the following post, but I would like to know if there is any other way of storing this type of data in tables?

(Create a larger matrix from smaller matrices in numpy)

Thank you in advance

Best Answer

In numpy there are two relevant structures.

One is a 4-dimensional array, e.g. np.zeros((100,100,5,5),int). The other is a 2-dimensional array of objects, e.g. np.zeros((100,100),dtype=object). With an object array, the elements can be anything: strings, numbers, lists, your 5x5 arrays, other 7x3 arrays, None, etc.
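As a minimal sketch of the two structures, the snippet below builds both on a small 4 x 3 grid. The grid size and the names arr4d and arr_obj are illustrative assumptions, not part of the question.

```python
import numpy as np

# Small illustrative grid (4 x 3); the sizes in the question would be far larger.
m, n = 4, 3

# Option 1: one 4-D numeric array; block (i, j) is arr4d[i, j].
arr4d = np.zeros((m, n, 5, 5), dtype=float)

# Option 2: a 2-D object array; each cell holds a reference to any Python object.
# np.zeros with dtype=object fills every cell with the integer 0.
arr_obj = np.zeros((m, n), dtype=object)
arr_obj[0, 0] = np.eye(5)      # a 5x5 array
arr_obj[0, 1] = "a string"     # or anything else
arr_obj[0, 2] = None

print(arr4d.shape, arr4d.dtype)      # (4, 3, 5, 5) float64
print(arr_obj.shape, arr_obj.dtype)  # (4, 3) object
print(arr4d[1, 2].shape)             # (5, 5)
```

The 4-D array stores all values in one contiguous numeric buffer, while the object array only stores references, so numpy cannot vectorize arithmetic across its cells in the same way.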

It is easiest to do math on the 4d array, for example taking the mean across all the 5x5 subarrays, or extracting the [:,:,0,0] corner of every subarray.
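The operations mentioned above might look like this on a 4d array of random data. Since "mean across all the 5x5 subarrays" can be read two ways, the sketch shows both readings; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
arr4d = rng.random((100, 100, 5, 5))

# Mean of each 5x5 block: one number per block, shape (100, 100).
block_means = arr4d.mean(axis=(2, 3))

# Element-wise mean over all blocks: a single 5x5 array.
mean_block = arr4d.mean(axis=(0, 1))

# The [0, 0] corner of every block, shape (100, 100).
corners = arr4d[:, :, 0, 0]

print(block_means.shape, mean_block.shape, corners.shape)
```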

If your subarrays are all 5x5, it can be tricky to create and fill that object array, because np.array(...) tries to create the 4-dimensional array whenever possible.
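A small sketch of that pitfall, and one common way around it (preallocate with np.empty and assign cell by cell). The 2 x 3 grid and the blocks list are made up for illustration.

```python
import numpy as np

# A 2 x 3 nested list of 5x5 blocks, each filled with a distinct value.
blocks = [[np.full((5, 5), 10 * i + j, dtype=float) for j in range(3)]
          for i in range(2)]

# np.array sees a regular nesting and builds a 4-D float array, not an object array.
auto = np.array(blocks)
print(auto.shape, auto.dtype)    # (2, 3, 5, 5) float64

# To get a 2-D object array, allocate it first and assign each cell.
obj = np.empty((2, 3), dtype=object)
for i in range(2):
    for j in range(3):
        obj[i, j] = blocks[i][j]
print(obj.shape, obj.dtype)      # (2, 3) object

# When all blocks share a shape, the object array converts back to 4-D.
back = np.array(obj.tolist())
print(back.shape)                # (2, 3, 5, 5)
```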

With h5py you can store the dataset in chunks and access portions of the larger array without loading all of it. But you still need a workable numpy representation of each portion to do anything with it.
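For scale, a full 100000 x 100000 grid of 5x5 float64 blocks is 100000 * 100000 * 25 * 8 bytes, about 2 TB, so only slices can be held in memory at once. The following is a rough sketch of the chunked approach with h5py, using a small 200 x 200 stand-in; the file name, dataset name "blocks", and chunk shape are assumptions chosen for illustration, not recommendations.

```python
import os
import tempfile

import h5py
import numpy as np

m, n = 200, 200   # small stand-in for the m, n > 100000 in the question
path = os.path.join(tempfile.mkdtemp(), "blocks.h5")   # hypothetical file name

with h5py.File(path, "w") as f:
    # Each chunk holds a 10 x 10 tile of 5x5 blocks; the chunk shape is an assumption to tune.
    dset = f.create_dataset("blocks", shape=(m, n, 5, 5), dtype="f8",
                            chunks=(10, 10, 5, 5))
    # Write one tile at a time; untouched regions keep the default fill value 0.
    dset[0:10, 0:10] = np.ones((10, 10, 5, 5))

with h5py.File(path, "r") as f:
    dset = f["blocks"]
    tile = dset[0:10, 0:20]                 # only this slice is read into a numpy array
    tile_means = tile.mean(axis=(2, 3))     # ordinary numpy math on the slice

print(tile.shape, tile_means.shape)
```

Slicing the dataset returns a regular 4d numpy array, so the same axis-based operations apply to each tile as to an in-memory array.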