Reduce The Size Of Your Tarsnap Backups With This One Weird Trick (the trick is “pay attention to the size of your backups”)


Tarsnap is an encrypted backup product. The client software is gratis and source-available (but not FLOSS); you pay for the storage you use on Tarsnap’s service (US$ 0.25 / GB-month) and for the network traffic to and from it (US$ 0.25 / GB). You get an email notification when your current account balance dips below 7 days’ worth of storage costs at your current size, at which point you should increase your balance and/or delete some old backups. (There’s no built-in auto-deletion of backups; some third-party software exists, but I just do it manually.) My backups include /etc and /home, with an exclude list of various files and directories that I don’t need backed up, which I put together when I first set up these backups.
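To make the pricing concrete, here is a rough back-of-the-envelope helper (the function name is made up for illustration). It assumes the US$ 0.25 / GB-month rate quoted above and a 30-day month for the daily figure; Tarsnap’s actual accounting is more fine-grained, so treat the output as an estimate rather than a bill.

```shell
# Hypothetical helper: estimate Tarsnap storage costs for a backup of $1 GB.
# Assumptions: US$ 0.25 per GB-month and a 30-day month (for the daily figure).
tarsnap_cost_estimate() {
    awk -v gb="$1" 'BEGIN {
        monthly = gb * 0.25          # storage cost per month
        daily = monthly / 30         # approximate storage cost per day
        printf "monthly: %.2f USD\n", monthly
        printf "daily: %.4f USD\n", daily
        # the low-balance email fires below roughly 7 days of storage costs
        printf "7-day warning threshold: %.2f USD\n", daily * 7
    }'
}
```

For a 50 GB backup, `tarsnap_cost_estimate 50` works out to US$ 12.50 per month, or roughly US$ 2.92 of storage costs in the 7-day window that triggers the notification.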

When I most recently got this notification, I was disturbed to see that the daily storage costs were higher than usual – not exorbitant, but bad enough to warrant inspection. I could see in the account history on which day the storage cost had increased, but I didn’t remember anything from that day that might have triggered it. I set out to extract the most recent backup so I could inspect it with ncdu – this seemed like the most practical way to find out what was taking up the most space inside. The extraction process took much longer than I expected (as of this writing, it’s been running for 10 days and still hasn’t finished), but partway through it became clear that I had inadvertently included a large SQL dump in the backup, and that I had failed to update the exclude list when renaming some previously excluded ISO files.

Because waiting for days to extract the latest backup doesn’t sound like a great experience to go through regularly (and remember, you’re paying for that network traffic to download it again!), I started thinking about other ways to find out what’s getting backed up. I found out that ncdu (“ncurses disk usage”) has an option to exclude files from the disk usage report, and its pattern syntax is (as far as I can tell) compatible with Tarsnap’s. So I put together a little shell script to run ncdu with the exclude list from my Tarsnap config; you can find the latest version on GitHub (assuming I don’t rename the file), or the current version as of this writing below:

ncdu-tarsnap
# Show disk usage of tarsnap backups.
#
# You can delete files in ncdu,
# but keep in mind that ncdu is operating on the local file system.
# Don’t delete any files that you want to keep there,
# and don’t assume that they will be removed from any existing backups.
#
# Assumptions that apply to my setup but may or may not apply to others’:
# * the most relevant folder being backed up is ~
#   (I actually back up /home and /etc but everything outside of ~ is negligible –
#   ncdu doesn’t support inspecting multiple directories at once)
# * some files below ~ are not $USER-readable, so running ncdu with sudo is useful
# * the backup is being made as root
#   (otherwise the non-$USER-readable files should not be counted after all)
# * only /etc/tarsnap/tarsnap.conf is used
#   (no ~/.tarsnaprc and also no --exclude on the command line)
function ncdu-tarsnap {
    # bash -c is needed because sudo … <() doesn’t work properly (see -C in sudo(8))

    # --apparent-size probably makes more sense for a backup than --disk-usage

    # note that this will also show (with empty size)
    # files that are excluded;
    # unfortunately, ncdu’s --hide-hidden hides both hidden (.*) and excluded files,
    # and there seems to be no option for hiding excluded but showing hidden files :(
    sudo bash -c "ncdu \"$HOME\" --apparent-size --exclude-from <(sed -n '/^exclude\s\+/ s///p' /etc/tarsnap/tarsnap.conf)"
}
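The sed expression in ncdu-tarsnap is what turns the config into an exclude list: it prints only the lines that start with `exclude` followed by whitespace, with that prefix stripped (the empty regex in `s///` reuses the address regex). Pulling it out into a small helper makes it easy to check what ncdu will actually receive. This is a sketch; the name `tarsnap_excludes` is hypothetical, and `\s` and `\+` are GNU sed extensions.

```shell
# Hypothetical helper: print the patterns of all "exclude" lines in a
# tarsnap.conf-style file ($1), or stdin if no file is given.
# Same sed expression as ncdu-tarsnap; requires GNU sed for \s and \+.
tarsnap_excludes() {
    sed -n '/^exclude\s\+/ s///p' "${1:--}"
}
```

Running `tarsnap_excludes /etc/tarsnap/tarsnap.conf` should print one pattern per line; commented-out lines and other options such as `cachedir` or `keyfile` are skipped.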

Looking through the data-that-would-be-backed-up in ncdu-tarsnap, I was able to identify several patterns that I should add to my exclude list, as well as some data that I could simply delete from my live file system. So that’s the first part of my One Weird Trick (so sorry about that title): adjust your exclude patterns based on your current file system contents.
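If you’d rather have a quick non-interactive overview than browse in ncdu, GNU du can produce a similar ranking from the same exclude list. The following is a sketch with a made-up function name; it lists the largest files and directories directly below a directory, measured in bytes of apparent size, with everything matching the exclude patterns left out.

```shell
# Hypothetical helper: show the $3 (default 10) largest entries directly below
# directory $1, ignoring anything matched by the patterns in file $2.
# Uses GNU du: --all includes plain files, --bytes reports apparent size.
largest_backed_up() {
    dir=$1
    exclude_file=$2
    count=${3:-10}
    du --all --bytes --max-depth=1 --exclude-from="$exclude_file" -- "$dir" \
        | sort -n \
        | tail -n "$count"
}
```

For example, `sed -n '/^exclude\s\+/ s///p' /etc/tarsnap/tarsnap.conf > /tmp/excludes` followed by `largest_backed_up ~ /tmp/excludes 20`. The last line is the directory itself, i.e. the total. As with ncdu-tarsnap, files you can’t read are skipped (with an error on stderr) unless this runs with elevated privileges.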

That’s all well and good, but what happens the next time I leave a large SQL dump in my home directory without thinking about the backups? I realized I needed a process that periodically checks the total size of the data-that-would-be-backed-up and alerts me if it gets too large. ncdu isn’t useful for that, but fortunately du (“disk usage”) also supports exclude patterns, again with compatible syntax, and is easy to use in a shell script. Sending a desktop notification from a systemd system service (I need it to run with privileges because not all the files in my home directory are readable by me) isn’t pretty, but the following works:

tarsnap-size-check.service
# /etc/systemd/system/tarsnap-size-check.service
[Unit]
Description=Warn Lucas if it appears that the next tarsnap backup would be bigger than 50 GB

[Service]
Type=oneshot
ExecStart=bash -s
StandardInputText=\
  bytes=$(du --summarize --bytes /home /etc --exclude-from <(sed -n '/^exclude\\s\\+/ s///p' /etc/tarsnap/tarsnap.conf) | awk '{ total += $1 } END { print total }'); \
  if (( bytes <= 50000000000 )); then exit 0; fi; \
  gigabytes=$(bc <<< "scale=1; $bytes/1000000000"); \
  DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus notify-send -a 'Tarsnap size warning' -i dialog-warning 'Tarsnap backup too large' "The projected size of a new backup is $gigabytes GB ($bytes bytes), should be below 50 GB! Use ncdu-tarsnap to inspect the situation."; \
  exit 1
User=lucas
AmbientCapabilities=CAP_DAC_READ_SEARCH

CapabilityBoundingSet=CAP_DAC_READ_SEARCH
IPAddressDeny=any
LockPersonality=yes
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
PrivateDevices=yes
PrivateMounts=yes
PrivateNetwork=yes
PrivateTmp=yes
ProtectClock=yes
ProtectControlGroups=yes
ProtectHome=read-only
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectProc=invisible
ProtectSystem=strict
RestrictAddressFamilies=AF_UNIX
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources

tarsnap-size-check.timer
# /etc/systemd/system/tarsnap-size-check.timer
[Unit]
Description=Regularly monitor the size of tarsnap backups according to the current configuration

[Timer]
OnCalendar=hourly
AccuracySec=1h

[Install]
WantedBy=timers.target

(Of course, you may want to adjust the hard-coded threshold to something other than 50 GB. And maybe you prefer another notification setup as well.) The timer is set up to run hourly, whereas my backups run weekly, so that should give me enough time to act whenever it notifies me. (I included a pointer to ncdu-tarsnap in the notification text in case I forget what I called the script.) And that’s the second part of my One Weird Trick: Set up alerts if your projected backup size exceeds a preconfigured limit.
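If you want to change the threshold, or try out the check without going through systemd, the inline script can be pulled out into a function. This is a sketch with a hypothetical name that takes the byte count and the threshold (both in bytes) as arguments. It uses awk instead of bc for the GB figure, which means the value is rounded to one decimal place rather than truncated as with `scale=1` in bc.

```shell
# Hypothetical standalone version of the check from tarsnap-size-check.service.
# $1: projected backup size in bytes, $2: threshold in bytes.
# Returns 0 if the size is within the limit; otherwise prints a warning and returns 1.
check_backup_size() {
    bytes=$1
    threshold=$2
    if [ "$bytes" -le "$threshold" ]; then
        return 0
    fi
    # decimal gigabytes (10^9 bytes), rounded to one decimal place
    gigabytes=$(awk -v b="$bytes" 'BEGIN { printf "%.1f", b / 1000000000 }')
    echo "The projected size of a new backup is $gigabytes GB ($bytes bytes)"
    return 1
}
```

In the unit, `$bytes` would come from the same du/awk pipeline as before, e.g. `check_backup_size "$bytes" 20000000000` for a 20 GB limit.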

Either of these could be built into Tarsnap, but as far as I can tell, neither is. At least it’s not too difficult to build them around Tarsnap instead. (The above scripts and units might even be applicable to other backup solutions, as long as those solutions also use glob/fnmatch-like patterns.)