More specific dupe of 875228: Simple data storing in Python.
I have a rather large dict (6 GB) and I need to do some processing on it. I'm trying out several document clustering methods, so I need to have the whole thing in memory at once. I have other functions to run on this data, but the contents will not change.
Currently, every time I think of new functions I have to write them and then regenerate the dict. I'm looking for a way to write this dict to a file so that I can load it into memory instead of recalculating all its values.
To oversimplify things, it looks something like:
{((('word','list'),(1,2),(1,3)),(…)):0.0, ….}
I feel that Python must have a better way than looping through some string, looking for : and ( and trying to parse it back into a dictionary.
Best Solution
Why not use pickle? Python has a great serialization module called pickle, and it is very easy to use.
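A minimal sketch of saving the dict once and reloading it later, assuming Python 3. The function names save_dict/load_dict and the file name data.pkl are hypothetical. The file must be opened in binary mode, and pickle.HIGHEST_PROTOCOL selects the most efficient protocol the running Python supports (protocol 4 and later, available since Python 3.4, also handle very large objects).

```python
import pickle


def save_dict(d, path):
    # Serialize the whole dict to disk in binary mode using the newest protocol.
    with open(path, "wb") as f:
        pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_dict(path):
    # Read the dict back into memory instead of recomputing its values.
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Toy dict shaped like the one in the question (nested tuple keys, float values).
    data = {((("word", "list"), (1, 2), (1, 3)), ("other",)): 0.0}
    save_dict(data, "data.pkl")  # hypothetical file name
    print(load_dict("data.pkl") == data)
```

Tuple keys need no special handling here, because pickle round-trips arbitrary hashable Python objects.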
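There are two disadvantages with pickle: it is not secure against erroneous or maliciously constructed data, so never unpickle data from an untrusted source; and its format is not human readable.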
If you are using Python 2.6 or later, there is a built-in module called json. It is as easy to use as pickle.
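A sketch of the json equivalent, with one caveat for data shaped like the question's: JSON object keys must be strings, so calling json.dump directly on a dict with tuple keys raises TypeError. This sketch (Python 3 assumed; save_dict_json, load_dict_json and to_tuple are hypothetical helper names) stores the dict as a list of [key, value] pairs instead. On load it converts the nested lists that JSON produces back into tuples, which makes the keys hashable again.

```python
import json


def to_tuple(obj):
    # JSON has no tuple type: tuples are written as arrays and read back as lists.
    # Recursively turn lists back into tuples so they can be used as dict keys.
    if isinstance(obj, list):
        return tuple(to_tuple(x) for x in obj)
    return obj


def save_dict_json(d, path):
    # Tuple keys are not valid JSON object keys, so write [key, value] pairs.
    with open(path, "w") as f:
        json.dump([[k, v] for k, v in d.items()], f)


def load_dict_json(path):
    with open(path) as f:
        pairs = json.load(f)
    return {to_tuple(k): v for k, v in pairs}


if __name__ == "__main__":
    data = {((("word", "list"), (1, 2), (1, 3)), ("other",)): 0.0}
    save_dict_json(data, "data.json")  # hypothetical file name
    print(load_dict_json("data.json") == data)
```

This assumes every key is built only from tuples, strings and numbers. Any list inside a key would come back as a tuple.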
The JSON format is human readable and very similar to Python's dictionary string representation, and it doesn't have pickle's security issues. It might be slower than cPickle, though (in Python 3, the standard pickle module uses the C implementation automatically).
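Whether JSON is noticeably slower depends on the data, so it may be worth measuring on a sample before committing to a 6 GB file. The following is a rough sketch, assuming Python 3 and using synthetic tuple-keyed data as a stand-in for the real dict (compare_speed is a hypothetical name). It times an in-memory dump-and-load round trip for each format. The JSON side uses a [key, value] pair layout because tuple keys are not valid JSON object keys. It skips converting keys back to tuples, so it measures only raw encoding and decoding.

```python
import json
import pickle
import timeit


def compare_speed(d, number=10):
    # Total seconds for `number` in-memory dump+load round trips per format.
    pairs = [[k, v] for k, v in d.items()]  # JSON-compatible layout
    pickle_time = timeit.timeit(
        lambda: pickle.loads(pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)),
        number=number,
    )
    json_time = timeit.timeit(lambda: json.loads(json.dumps(pairs)), number=number)
    return pickle_time, json_time


if __name__ == "__main__":
    # Hypothetical synthetic data; substitute a sample of your own dict.
    sample = {(("w%d" % i, "list"), (i, i + 1)): float(i) for i in range(100000)}
    p, j = compare_speed(sample, number=3)
    print("pickle: %.3fs  json: %.3fs" % (p, j))
```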