Rsync Backup System [7]: Current System

It has proved desirable to return to making backups from a backup server connected over the network. This is currently a separate computer in our setup, but we plan to move the task onto one of the existing computers (the one used mainly to play media), since we have determined that the backup task does not use many resources and because this computer can easily be restarted to detect the backup disk when it is inserted.

In the last part of this series we decided to prefer the rsync daemon on the backup client over SSH, which had proved unreliable. Since then the rsync daemon has been set up and tested on one computer, and it worked well, so it is being put onto all the systems that get backed up regularly. Once it is installed as a service it stays running in the background all the time. Each client has to have the backupuser account in place, with ACLs set to give this user read-only access to all files and folders. The server also has this account and is logged in with it during the backup, although strictly speaking that is unnecessary: there is no need for a specific backup user on the backup server, since that user isn’t authenticated against the client. When the rsync daemon is the remote end, it handles access itself through the specified module, and we set the read only parameter to limit the access it has to the source path, which is essentially what we previously set up the backupuser account for on clients.

Here is our rsyncd.conf file again:

uid = 1001
gid = 1001
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
log file = /var/log/rsync.log

[backupuser]
path = /home/folder
comment = some string
read only = true
timeout = 300

This is a fairly straightforward file; there are many other parameters that can be set, but this one handles the basics. At the top, uid and gid specify the user account on the computer running the daemon that will perform the backup, in this case 1001, which is the backupuser account. The next three lines specify core files that the daemon uses: the pid file holds the daemon’s process ID, the lock file supports the daemon’s limit on simultaneous connections, and the log file parameter ensures a log is kept on the client that is running the daemon. By default this log contains mainly error messages, which has proved very useful in our case: in our first trial rsync daemon backup of a system, several thousand files were not backed up, and we could quickly check what was happening by looking in this log rather than churning through the much fuller log generated on the backup server. Nearly all of them turned out to be web browser cache files, so we assume Mozilla has designed Firefox to deny the usual read permissions on those files and folders. About 8 files should have been backed up, and we need to check our permissions settings, because the ACLs should have allowed backupuser access to these.
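As a minimal sketch of that kind of check, the shell functions below pull permission failures out of the daemon log on the client. The function names are made up for illustration, and the search text "Permission denied" is an assumption about how these errors are worded; adjust it to whatever actually appears in your log.

```shell
# Hypothetical helpers for inspecting the daemon log named in rsyncd.conf.
# Assumption: failed reads are logged with the text "Permission denied".

# Print every line in log file $1 that reports a permission failure.
denied_entries() {
    grep 'Permission denied' "$1"
}

# Print only the number of such lines in log file $1.
denied_count() {
    grep -c 'Permission denied' "$1"
}
```

On a client this would be called as denied_count /var/log/rsync.log, using the log file path from the conf file above.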

The last section in our rsyncd.conf file is the module. The rsync daemon can have one or several modules specified. When a module name is passed in the rsync command (in the source parameter, which as shown below is the part that reads rsync://a.b.c.d/module), rsyncd looks up the matching section in its conf file and reads the parameters specified there. There are a lot of possible parameters, many of which experienced rsync users will recognise from the command line options of rsync itself. Ours are fairly simple: the key ones are the source path and the read only parameter, set to true to ensure there is no possibility of any source file being changed or deleted. Once the conf file has been put in place and rsync is installed, the next steps (on Debian) are to start the rsync service with systemctl start rsync and then enable it for automatic start at system init with systemctl enable rsync.
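The setup steps above can be sketched as a few shell functions. The function names are hypothetical, the service name rsync is the Debian one mentioned above, and a.b.c.d stands for the client’s address. Requesting rsync://a.b.c.d/ with no module name asks the daemon to list the modules it offers, which is a quick way to confirm that it is reachable from the backup server.

```shell
# Hypothetical helpers for bringing up and checking the rsync daemon.

# Succeed if conf file $1 declares a module section named $2.
# Assumes the module name contains no regular-expression special characters.
has_module() {
    grep -q "^[[:space:]]*\[$2\][[:space:]]*$" "$1"
}

# Start the daemon now and enable it at boot (run on the backup client).
enable_daemon() {
    sudo systemctl start rsync && sudo systemctl enable rsync
}

# Ask the daemon at address $1 which modules it offers (run on the backup server).
list_modules() {
    rsync "rsync://$1/"
}
```

For example, has_module /etc/rsyncd.conf backupuser && enable_daemon on the client, followed by list_modules a.b.c.d on the server. The conf file location /etc/rsyncd.conf is an assumption here, since the text above does not say where the file is kept.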

The rsync command is a little different in its form, reflecting the changed connection type, and will typically look something like this:

rsync -arXvz --progress --delete rsync://a.b.c.d/module /mnt/backup/dir --log-file=/home/backupuser/rsyncYYYYMMDD.log --log-file-format "|compname|%f|%M|%l|%b|%o|%U"

The components of that command include:

  • The rsync:// connection specification, which includes a.b.c.d (the IP address) and the module name specified in the rsyncd.conf file set up with the rsync daemon on the backup client.
  • The backup path, shown here as /mnt/backup/dir
  • The log-file value, which is the full path of the log file to be written.
  • log-file-format. This can contain anything you want; the pipe characters used as separators are a personal choice. The first part in our preferred format is the name of the backup client. The % characters precede specific variables whose values are filled in at the time a log entry is generated, so the actual values appear in the entry. Normally a log entry in the desired format is created for each file that is successfully copied, preceded by a built in timestamp. If the entry concerns a file that could not be copied, the user specified format is ignored and the error message appears in its place.
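Because the format is pipe separated, the server-side log can be summarised with awk. The sketch below assumes each formatted entry is the timestamp prefix followed by the seven fields in the format above, and that file names contain no pipe characters; the function names are made up for illustration. Error entries, which do not follow the format, are skipped because they have fewer fields.

```shell
# Hypothetical helpers for a log written with
# --log-file-format "|compname|%f|%M|%l|%b|%o|%U".
# Splitting on "|" gives: $1 timestamp prefix, $2 client name, $3 file name (%f),
# $4 modification time (%M), $5 length (%l), $6 bytes transferred (%b),
# $7 operation (%o), $8 uid (%U).

# Print the total of the %b field over all formatted entries in log file $1.
total_bytes() {
    awk -F'|' 'NF >= 8 { sum += $6 } END { printf "%.0f\n", sum }' "$1"
}

# Print the file name (%f) of every formatted entry in log file $1.
logged_files() {
    awk -F'|' 'NF >= 8 { print $3 }' "$1"
}
```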

Command line options to the rsync command are:

  • --progress: this ensures a display in the terminal where the command is running that shows progress. Typically it shows the name of each file as it is copied, the realtime percentage copied, and other information such as the number of the file being transferred, the total number of files and the number remaining.
  • --delete: this option is used when an incremental backup is taking place. If files already exist on the backup target, rsync decides whether to update them by comparing them with the source (by default it checks file size and modification time). Files that have not changed are not transferred again. This makes it possible to have fast incremental backups onto a backup target that contains a previous backup of the same source. The --delete switch ensures the backup target is an exact clone (mirror) of the source by removing files in the target that are not present in the source (because they have been deleted from the source since the last backup).
  • --log-file is the path to the log file. In our command it is written with an = between the option and its value.
  • --log-file-format specifies the information to be included in each log file entry. As explained above, this format is used only where a file is successfully transferred. A range of variable specs can be passed in this option; they are documented in the man page for rsyncd.conf. Note that in the example shown, the third variable is a lowercase L, not a capital i.
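Putting the pieces together, a wrapper might look like the minimal sketch below. The function names and the choice to pass the address, module, target directory and client name as arguments are assumptions for illustration; the options themselves are the ones from the command shown earlier, and the YYYYMMDD part of the log name is filled in with today’s date.

```shell
# Hypothetical wrapper around the rsync daemon backup command.

# Build a dated log path: $1 = log directory, $2 = date as YYYYMMDD.
log_path() {
    printf '%s/rsync%s.log\n' "$1" "$2"
}

# Run one backup.
# $1 = client address, $2 = module name, $3 = target directory,
# $4 = client name to record at the start of each log entry.
run_backup() {
    log=$(log_path /home/backupuser "$(date +%Y%m%d)")
    rsync -arXvz --progress --delete \
        "rsync://$1/$2" "$3" \
        --log-file="$log" \
        --log-file-format="|$4|%f|%M|%l|%b|%o|%U"
}
```

A call would then be run_backup a.b.c.d module /mnt/backup/dir compname, with the placeholders replaced by real values.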

So far this has been a lot more reliable than using SSH. Local backups of computers are undesirable because the disks don’t hotplug (they tend to die if you insert them while the power is on, even with supposedly hotpluggable SATA sockets), and hibernating and resuming is unreliable in getting them detected. Generally a restart is needed to ensure detection, which means we lose the benefits of hibernating the system that is doing the backups. The least used computer can therefore double as a backup server, since we don’t actually do any work on that one; it is mainly used for video playback (having a computer just for that task is very useful when some of the CPU intensive tasks we do on our other computers can make video playback stall and stutter). Won’t the backup potentially cause the same issue? No: running rsync in a shell window uses only a small amount of resources, as would be expected for a command line application that doesn’t need GUI overheads.