Python – setting the gzip timestamp from Python


I'm interested in compressing data using Python's gzip module. It happens that I want the compressed output to be deterministic, because that's often a really convenient property for things to have in general — if some non-gzip-aware process is going to be looking for changes in the output, say, or if the output is going to be cryptographically signed.

Unfortunately, the output is different every time. As far as I can tell, the only reason for this is the timestamp field in the gzip header, which the Python module always populates with the current time. I don't think you're actually allowed to have a gzip stream without a timestamp in it, which is too bad.

In any case, there doesn't seem to be a way for the caller of Python's gzip module to supply the correct modification time of the underlying data. (The actual gzip program seems to use the timestamp of the input file when possible.) I imagine this is because basically the only thing that ever cares about the timestamp is the gunzip command when writing to a file — and, now, me, because I want deterministic output. Is that so much to ask?

Has anyone else encountered this problem?

What's the least terrible way to gzip some data with an arbitrary timestamp from Python?

Best Solution

From Python 2.7 onwards you can specify the time to be used in the gzip header. N.B. filename is also included in the header and can also be specified manually.

import gzip

content = b"Some content"
f = open("/tmp/f.gz", "wb")
gz = gzip.GzipFile(fileobj=f,mode="wb",filename="",mtime=0)