How does rsync behave for concurrent file access


I'm using rsync to run backups of my machine twice a day and the ten to fifteen minutes when it searches my files for modifications, slowing down everything considerably, start getting on my nerves.

Now I'd like to use the inotify interface of my kernel (I'm running Linux) to write a small background app that collects notifications about modified files and adds their pathnames to a list which is then processed regularly by a call to rsync.

Now, because this process by definition always works on files I've just been – and might still be – working on, I'm wondering whether I'll get loads of corrupted / partially updated files in my backup as rsync copies the files while I'm writing to them.

I couldn't find anyhing in the manpage and was yet unsuccessful in googling for the answer. I could go read the source, but that might take quite a while. Anybody know how concurrent file access is handled inside rsync?

Best Solution

It's not handled at all: rsync opens the file, reads as much as it can and copies that over.

So it depends how your applications handle this: Do they rewrite the file (not creating a new one) or do they create a temp file and rename that when all data has been written (as they should).

In the first case, there is little you can do: If two processes access the same data without any kind of synchronization, the result will be a mess. What you could do is defer the rsync for N minutes, assuming that the writing process will eventually finish before that. Reschedule the file if it is changes again within this time limit.

In the second case, you must tell rsync to ignore temp files (*.tmp, *~, etc).