How to Securely Back Up Data to Cloud Storage

Jameson researches user-friendly solutions for backing up sensitive data to cloud providers while keeping it safe from third-party snooping.

Today we learned that Apple dropped plans to let iPhone users fully encrypt backups of their devices in the company’s iCloud service after the FBI complained that the move would harm investigations. In the information age, our data has become more and more valuable, yet it seems harder and harder to protect. How is the average user supposed to protect their data from attackers and from any form of loss due to natural disaster?

Sure, most cloud providers encrypt your files while they're transferred over the internet and after they're stored, but this only protects them from external attackers. The provider itself can, of course, decrypt and view all of your files. This could happen due to a malicious employee or perhaps the company could be coerced into sharing your data by agents of a nation state.

For many years I've diligently and painstakingly made physical backups of my data, never sending that data across the internet due to fears of what might happen should it end up in the wrong hands. But today we have the technology to send our data to third parties who specialize in highly redundant storage without having to trust them not to snoop. The cryptographic tools to do so exist; we just have to use them correctly for our own benefit.

The Contenders

BorgBackup - Free open source command line software. All data can be protected using 256-bit AES encryption. Only supports remote backups to servers for which you have SSH access.

Boxcryptor - Supports 30+ cloud storage providers. Appears to be closed source.

Cipherdocs - Free open source software that supports syncing GPG-encrypted files to Dropbox and Google Drive. Appears to be Windows-only, hasn't been updated in years.

Cryptomator - Free open source software with multi-OS GUI support. Supports Dropbox & Google Drive, encrypts each file individually using AES.

Duplicati - A free, open source, backup client that securely stores encrypted, incremental, compressed backups on cloud storage services and remote file servers. Supports over a dozen cloud storage providers. It encrypts files in blocks so that the storage provider doesn't even know how many files you have backed up.

Duplicity - Free open source software that supports 20+ cloud providers. A very mature project, but command line only.

EncFSMP - Free open source software that enables OS X and Windows computers to use EncFS encrypted volumes.

Rclone - Free open source command line program to sync files and directories to and from over 40 cloud storage providers.

Restic - Free open source command line software that supports Amazon S3, Backblaze, Microsoft Azure, Google Cloud, any service rclone supports. Encrypts files via AES-256.

SpiderOak One Backup - Syncs files across all your devices. Desktop application is available for Linux, macOS, and Windows operating systems; mobile app available for Android and iOS. Looks like the desktop client is closed source.

Tarsnap - Free open source command line utility; syncs your AES-encrypted files to a server that deducts funds from your prepaid account based upon your bandwidth and storage use.

VeraCrypt

A completely different method that I came across but haven't experimented with: mount a Dropbox or other cloud storage volume as a virtual hard drive on your computer and then use VeraCrypt (not TrueCrypt) to create an encrypted volume inside of the mount. These instructions should still be generally applicable.

Pros: probably the most user-friendly once you've got it set up; no command line usage required to keep things in sync.

Cons: Uses symmetric encryption, so it requires entering the password even just to write data. Also, you can't resize the encrypted partition so you'll want to create it as large as possible, which will result in a long initial sync time upon creation.

Which Tool is Best?

To be fair, any of these tools ought to get the job done. Boxcryptor, Cryptomator, and SpiderOak are probably the most user-friendly, though Boxcryptor and SpiderOak are closed source.

If you want the most user-friendly and trustworthy solution for non-technical folks, I highly recommend Cryptomator and pairing it with a Dropbox account. It took me ~10 minutes to set up an encrypted Dropbox backup using Cryptomator's guide.

But what if you desire an even higher threshold of security? After all, the whole point of this exercise is to implement a solution that doesn't require trusting a third party.

The optimal tool should support multiple cloud backup services for maximum robustness.

Also, I'd prefer a tool that uses GPG public key encryption rather than a symmetric encryption scheme. Symmetric encryption is vulnerable to being cracked if your password is weak or compromised. By using GPG I can take advantage of the high security setup I already have for my GPG private keys, which only exist on YubiKeys.

From looking through the above list, the multi-platform open source tools that support syncing GPG encrypted files are Duplicati and Duplicity.

Duplicati

I decided to give Duplicati a shot because it looks fairly well developed and has a nice browser-based user interface. It supports syncing backups to many platforms, even including Sia's decentralized cloud!

Configuring Duplicati to use Dropbox was quite easy due to the OAuth integration; it just took a few clicks.

I was confused by the backup wizard because I chose "GPG Encryption (external)" rather than "built-in AES encryption" but it didn't ask me to select a public key. It turns out that Duplicati defaults to symmetric encryption when using GPG, which is not what we want! To encrypt all of the files to your public GPG key instead, add the following advanced option to your backup; it tells GPG to use the key it has associated with your email address. If you have multiple keys, I'm not sure what will happen.

--gpg-encryption-switches="--recipient youremail@domain.com"
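If several keys do share one address, one way to sidestep the ambiguity is to name the recipient by key fingerprint, which gpg's --recipient accepts in place of an email address. The sketch below assumes Duplicati hands the switch to gpg unchanged, which I haven't verified. The helper names list_keys_for and gpg_switch_for are made up for illustration.

```shell
# Show every public key (with its fingerprint) that gpg associates with an
# address, so you can see whether more than one key matches.
# Usage: list_keys_for youremail@domain.com
list_keys_for() {
    gpg --fingerprint "$1"
}

# Print the Duplicati advanced option for an explicit recipient.
# Pass the fingerprint without spaces.
# Assumption: Duplicati forwards the switch to gpg as written.
gpg_switch_for() {
    printf -- '--gpg-encryption-switches="--recipient %s"\n' "$1"
}
```

Paste the printed line into the backup's advanced options in place of the email-based version.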

Also note that there's no need to set a path for the backup since it will create a "Duplicati Backup" folder automatically. You'll get a scary warning when finishing the wizard but you can ignore it. I told Duplicati to begin the initial backup sync and was pleased to see my encrypted blocks of files appearing.

After leaving the sync running overnight I saw that it had successfully uploaded my tens of thousands of files. But of course I can't just trust that it worked - I need to verify! So I whipped out a fresh external hard drive, plugged it in, and told Duplicati to restore my files to the new drive. After giving it a few more hours to download the data, I then needed to compare the data between the two drives. I did get an error at one point, though it didn't stop the restoration process.

Oops, looks like file names with asterisks don't play nice.

In order to perform a byte-for-byte comparison on Linux I decided to install a command line tool called "md5deep."

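# Hash every file under the original directory, recursing into subfolders (-r).
md5deep -r /path/to/original/backup/ > file_hashes.txt
# Negative matching (-x): print restored files whose hash is NOT in the list.
md5deep -x file_hashes.txt -r /path/to/restored/backup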

It can take a while to run each of those commands depending upon how much data you backed up. In my case the second command output 3 file names which I found to be concerning, but upon closer inspection the problem was that I had several files with the same names but different letter capitalization. So be aware that this md5deep tool appears to be case insensitive.

However, it appears that this tool is only good for comparing the contents of mirrored files - it doesn't actually tell you if there are files that completely failed to be copied. I knew there were differences because my original backup drive was reporting storing ~30 more files than my restored backup drive. I was able to figure out why by running:

diff --brief --recursive /path/to/original /path/to/restored
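Both checks can also be folded into a single pass by hashing each tree into a manifest of relative paths and comparing the two manifests. This is a minimal sketch rather than what I actually ran. It assumes a Linux system with sha256sum available, and the function names manifest and compare_trees are made up for illustration.

```shell
# manifest DIR: print "hash  ./relative/path" for every regular file under
# DIR, sorted by path so two manifests line up for comparison.
manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# compare_trees ORIGINAL RESTORED: print the manifest differences.
# Returns 0 when both trees hold the same files with the same contents.
compare_trees() {
    original_manifest=$(mktemp) || return 2
    restored_manifest=$(mktemp) || return 2
    manifest "$1" > "$original_manifest"
    manifest "$2" > "$restored_manifest"
    status=0
    diff "$original_manifest" "$restored_manifest" || status=$?
    rm -f "$original_manifest" "$restored_manifest"
    return "$status"
}
```

Running compare_trees /path/to/original /path/to/restored prints one line for every file that is missing or changed on either side. Because the paths are compared exactly, names that differ only in capitalization show up as separate entries.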

Based upon the files I found missing from the restored copy, the restore seems to have skipped files and folders containing the characters: ~, *, and :. After a little research it appears that this was a file system problem rather than a Duplicati problem; my source drive was an ext4 (Linux) file system while I had formatted the restore drive as FAT for maximum read compatibility by various operating systems. However, FAT file systems have reserved characters that are considered illegal for use in file and folder names. So be aware of what file systems you are using; if you aren't using the same file systems then you can encounter incompatibilities! My only concern with Duplicati is that it only showed one error about the file with the asterisk in the name and didn't mention any of the folders that were skipped. This is why verification is important!
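A quick pre-flight scan of the source tree can flag names that are likely to cause this kind of trouble before you restore to a different file system. The sketch below is a rough check under a couple of assumptions. The character set is the commonly documented FAT/Windows reserved set, so extend it (with ~, for example) if your restore skips other names. The function names fat_reserved and case_collisions are made up for illustration. Both read a list of paths on standard input.

```shell
# Print the paths that contain a character reserved on FAT: \ " * : < > ? |
fat_reserved() {
    grep -E '[\"*:<>?|]'
}

# Print (lowercased) every path that would collide with another path on a
# case-insensitive file system.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | LC_ALL=C sort | uniq -d
}

# Example usage against the original drive:
#   (cd /path/to/original && find . | fat_reserved)
#   (cd /path/to/original && find . | case_collisions)
```

Anything either command prints is worth renaming, or at least checking by hand after the restore.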

All in all I'm pretty happy with Duplicati and look forward to configuring it to back up my data to even more target locations.

Not Your Encryption Keys, Not Your Encrypted Data

It's amazing to see the progress that hardware and software have made over the past 20 years; the thought of encrypting hundreds of gigabytes of data on a consumer CPU and sending them across the internet in a matter of hours would have been unthinkable when I first connected to the 'net. We're even at the point where many of these secure backup solutions have a point-and-click level of usability that makes them accessible to less technical users.

It seems that, as usual, the greatest challenge will be education - we must teach our friends and family to be mindful of their data. The first step to taking ownership of your digital life is taking the time to think through what information you are exposing and to whom. We cannot expect governments, corporations, or other large, faceless organizations to grant us privacy. It is to their advantage to pry into our lives; we should expect that they will do so. We must defend our own privacy if we expect to have any!