R – How to you concatenate two huge files with very little spare disk space?


Suppose that you have two huge files (several GB) that you want to concatenate together, but that you have very little spare disk space (let's say a couple hundred MB). That is, given file1 and file2, you want to end up with a single file which is the result of concatenating file1 and file2 together byte-for-byte, and delete the original files.

You can't do the obvious cat file2 >> file1; rm file2, since in between the two operations, you'd run out of disk space.

Solutions on any and all platforms with free or non-free tools are welcome; this is a hypothetical problem I thought up while I was downloading a Linux ISO the other day, and the download got interrupted partway through due to a wireless hiccup.

Best Solution

I think the difficulty is determining how the space can be recovered from the original files.

I think the following might work:

  1. Allocate a sparse file of the combined size.
  2. Copy 100Mb from the end of the second file to the end of the new file.
  3. Truncate 100Mb of the end of the second file
  4. Loop 2&3 till you finish the second file (With 2. modified to the correct place in the destination file).
  5. Do 2&3&4 but with the first file.

This all relies on sparse file support, and file truncation freeing space immediately.

If you actually wanted to do this then you should investigate the dd command. which can do the copying step

Someone in another answer gave a neat solution that doesn't require sparse files, but does copy file2 twice:

  1. Copy 100Mb chunks from the end of file 2 to a new file 3, ending up in reverse order. Truncating file 2 as you go.
  2. Copy 100Mb chunks from the end of file 3 into file 1, ending up with the chunks in their original order, at the end of file 1. Truncating file 3 as you go.