Sunday, July 21, 2013

Backup with Duplicity and Rackspace Cloud Files

Intro


Duplicity is a Linux tool for backing up files and folders. It:
  • supports full and incremental backups
  • supports encryption using GPG (you can have unencrypted backups as well)
  • supports many kinds of storage: scp, rsync, Amazon S3 or Rackspace Cloud Files

My current setup makes daily incremental backups and stores them in Rackspace Cloud Files.
Once a fortnight it does a full backup and removes backups older than 30 days.
To use a different storage, you only need to change the last part of this guide.

Throughout this article:

- Machine L is your local, personal machine. L == local
- Machine B is the machine with the data that we want to back up. B == backed up. Duplicity runs on this machine. I use Ubuntu 12.04 here.
- Machine S is the remote machine where we'll store the backup. S == storage

Step 1: Generate the encryption keys

We'll generate the keys on our local machine and export them to the backup machine.

( from http://www.debian-administration.org/articles/209 )

We'll need two GPG keys for our backups:

- encryption key : the encryption key is used to protect the data in the backup files from snooping on the backup server
- signature key : the signature key is used to ensure the integrity of the backup files.

The private key for the signature key must be available to duplicity when it runs, which means it must be stored on Machine B. Duplicity also requires the passphrase for the signing key to be either entered manually or stored in an environment variable. If our encryption key and signature key were the same, a compromise of the server would mean a compromise of the backed-up data as well. We'll therefore use separate encryption and signature keys.

On your local machine, Machine L:

sudo apt-get install gnupg

Generate the encryption key

(on your local Machine L)

gpg --gen-key

  (choose RSA and RSA, a 4096-bit key size, and a key that never expires)
  passphrase: this is the encryption passphrase


The result looks like this:

  gpg: key 5A87AAB8 marked as ultimately trusted
  public and secret key created and signed.

...


Generate the signature key

Do the same to generate your signature key, but use a different passphrase (on your local Machine L).

gpg --gen-key

  (choose RSA and RSA, a 4096-bit key size, and a key that never expires)
  passphrase: this is the sign passphrase

...

  gpg: key 927AE728 marked as ultimately trusted
  public and secret key created and signed.
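If you prefer not to answer the prompts, GnuPG can also generate keys unattended from a parameter file passed to `gpg --batch --gen-key`. The sketch below writes such a file with the same choices as above (RSA and RSA, 4096 bits, no expiry). The helper name, file paths and passphrases are placeholders, and newer GnuPG versions (2.1+) may need extra options for the Passphrase line, so treat this as a starting point rather than a drop-in replacement.

```shell
#!/bin/bash
# Sketch: write a GnuPG parameter file for unattended ("--batch") key
# generation, mirroring the interactive choices above (RSA and RSA,
# 4096 bits, no expiry). The helper name and values are placeholders.
write_key_params() {
  # $1 = output file, $2 = Name-Real, $3 = Name-Comment, $4 = passphrase
  (
    umask 077   # the file contains a passphrase: owner-only permissions
    cat > "$1" <<EOF
Key-Type: RSA
Key-Length: 4096
Subkey-Type: RSA
Subkey-Length: 4096
Name-Real: $2
Name-Comment: $3
Expire-Date: 0
Passphrase: $4
%commit
EOF
  )
}
```

Usage would be something like `write_key_params /root/enc.params backuper-encrypt "Backup with duplicity" "encryption passphrase"` followed by `gpg --batch --gen-key /root/enc.params && shred -u /root/enc.params`, then the same again for the signature key with its own passphrase.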


To check that everything went well:

gpg --list-keys && gpg --list-secret-keys
/home/jesus/.gnupg/pubring.gpg
------------------------------
pub   4096R/5A87AAB8 2013-07-07
uid                  backuper-encrypt (Backup with duplicity)
sub   4096R/11122AE7 2013-07-07

pub   4096R/927AE728 2013-07-07
uid                  backuper-signature (Signature with duplicity)
sub   4096R/10E7002A 2013-07-07

/home/jesus/.gnupg/secring.gpg
------------------------------
sec   4096R/5A87AAB8 2013-07-07
uid                  backuper-encrypt (Backup with duplicity)
ssb   4096R/11122AE7 2013-07-07

sec   4096R/927AE728 2013-07-07
uid                  backuper-signature (Signature with duplicity)
ssb   4096R/10E7002A 2013-07-07
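The key IDs in this listing (5A87AAB8, 927AE728) are what every later command refers to. If you want to pick them out in a script, a small awk filter over the GnuPG 1.x format shown above is enough. This is a sketch that assumes exactly that layout (`pub   4096R/5A87AAB8 2013-07-07`); newer GnuPG versions print listings differently, and gpg's machine-readable `--with-colons` output is the more robust choice for real scripts.

```shell
#!/bin/bash
# Sketch: print the short key IDs from "gpg --list-keys" output in the
# GnuPG 1.x layout shown above, e.g. "pub   4096R/5A87AAB8 2013-07-07".
list_key_ids() {
  # $1 = record type to match: pub, sub, sec or ssb
  # Field 2 looks like "4096R/5A87AAB8"; the part after "/" is the key ID.
  awk -v type="$1" '$1 == type { split($2, a, "/"); print a[2] }'
}
```

For example, `gpg --list-keys | list_key_ids pub` would print one ID per line.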

Trust your keys before exporting

Now we are going to trust the keys before exporting them.

gpg --edit-key 927AE728
  > trust
  > 5
  > save

gpg --edit-key 5A87AAB8
  > trust
  > 5
  > save


Sign the keys

gpg --sign-key 927AE728
gpg --sign-key 5A87AAB8   # not sure if this one is needed

Once both keys have been created, export the public encryption key and the private signature key and copy them to Machine B. The safest way to do this is SCP/SSH (you'll need SSH access). You MUST keep the private encryption key and its passphrase safe and private.

(on your local Machine L; change the IP address to that of Machine B)
cd /tmp
gpg --export -a 5A87AAB8 > backup.enc.pub.gpg
gpg --export-secret-keys -a 927AE728 > backup.sig.sec.gpg
gpg --export-ownertrust > backup.trust

scp backup.enc.pub.gpg backup.sig.sec.gpg backup.trust bob@192.168.33.10:/tmp
rm backup.*
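The same export-and-copy step can be wrapped in one function so the exported secret key never outlives the copy, even if `scp` fails halfway. This is a sketch under a few assumptions: it uses the key IDs from this article, `gpg --output` instead of shell redirection for the key exports, and a private temporary directory instead of a fixed /tmp file name. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: the export-and-copy step above as one function, run on Machine L.
# Key IDs are the ones from this article; the destination is an example.
ENC_KEY=5A87AAB8   # only the PUBLIC part of this key leaves Machine L
SIG_KEY=927AE728   # the PRIVATE signature key is needed on Machine B

export_and_copy() (
  # The body runs in a subshell, so "set -e", umask and the EXIT trap
  # only affect this function.
  set -e
  umask 077
  dest="$1"                      # e.g. bob@192.168.33.10:/tmp
  dir=$(mktemp -d)
  trap 'rm -rf "$dir"' EXIT      # delete the exported secret key even on failure
  gpg --armor --output "$dir/backup.enc.pub.gpg" --export "$ENC_KEY"
  gpg --armor --output "$dir/backup.sig.sec.gpg" --export-secret-keys "$SIG_KEY"
  gpg --export-ownertrust > "$dir/backup.trust"
  scp "$dir/backup.enc.pub.gpg" "$dir/backup.sig.sec.gpg" "$dir/backup.trust" "$dest"
)
```

Usage: `export_and_copy bob@192.168.33.10:/tmp`. The files still land in /tmp on Machine B, so the import commands below work unchanged.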

Import the keys on the backup server

Our backups are handled by root (root has full access to every file, and this keeps the signature passphrase private), so we configure duplicity while logged in as root on Machine B.
(on Machine B)

sudo su
sudo apt-get install gnupg

cd /tmp
gpg --import /tmp/backup.sig.sec.gpg /tmp/backup.enc.pub.gpg
gpg --import-ownertrust /tmp/backup.trust

rm backup.*

Verify that the keys were imported correctly and that the IDs are correct. The private encryption key was not transferred, so we expect only one entry among the secret keys.

gpg --list-keys && gpg --list-secret-keys

/root/.gnupg/pubring.gpg
------------------------
pub   4096R/927AE728 2013-07-07
uid                  backuper-signature (Signature with duplicity)


pub   4096R/5A87AAB8 2013-07-07
uid                  backuper-encrypt (Backup with duplicity)


/root/.gnupg/secring.gpg
------------------------
sec   4096R/927AE728 2013-07-07
uid                  backuper-signature (Signature with duplicity)
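Checking this listing by eye is easy to get wrong, so here is a sketch that automates it: it expects the public encryption key, expects the secret signature key, and fails loudly if the private encryption key ever shows up on Machine B. It assumes the GnuPG 1.x listing format shown above and the key IDs from this article; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: sanity-check Machine B's key listing (GnuPG 1.x format, as above).
# Key IDs are the ones used in this article.
check_backup_keyring() {
  # stdin: output of "gpg --list-keys; gpg --list-secret-keys"
  local listing
  listing=$(cat)
  grep -q '^pub .*/5A87AAB8 ' <<<"$listing" || { echo "missing public encryption key"; return 1; }
  grep -q '^sec .*/927AE728 ' <<<"$listing" || { echo "missing secret signature key"; return 1; }
  # The private encryption key must stay on Machine L only.
  if grep -q '^sec .*/5A87AAB8 ' <<<"$listing"; then
    echo "private encryption key found on this machine: remove it"
    return 1
  fi
  echo "keyring OK"
}
```

Usage: `{ gpg --list-keys; gpg --list-secret-keys; } | check_backup_keyring`.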



Note: if you did not import the ownertrust file, trust the private signature key manually (needed if duplicity reports untrusted-key errors):

gpg --edit-key 927AE728
  > trust
  > 5
  > save


Step 2: Configure duplicity to use Cloud Files

Install duplicity. I use the latest version, which is not included by default in Ubuntu, so I add a PPA for it and install it via apt.

sudo apt-get -y install python-software-properties && sudo add-apt-repository -y  ppa:duplicity-team/ppa &&  sudo apt-get -y update && sudo apt-get -y upgrade

sudo apt-get -y install duplicity python-paramiko

Adding Cloud Files support

This step is only required if you are going to store backups on Rackspace Cloud Files. (You'll find many more tutorials for Amazon S3.)
Storing backups on a remote server via scp or rsync is even easier: you have already done the hard part, so jump to the next step.

There are two ways of using Cloud Files. I use the new pyrax API; the old python-cloudfiles library is now deprecated. Choose what works best for you.

Option A) Using the new pyrax API

This is the official way, but as of July 2013 it requires more manual tuning.

sudo apt-get -y install python-pip python-dev build-essential
yes | sudo pip install pyrax && yes | sudo pip uninstall keyring
sudo apt-get -y install duplicity python-paramiko gnupg


As of July 2013, the backend needed for pyrax (the one for the cfpyrax+http:// scheme) is missing from duplicity, so we need to add it ourselves. A backend is a 'module' that tells duplicity how to work with a storage such as scp, rsync, S3, etc.
(remember we are root)

cd /tmp
wget https://bugs.launchpad.net/duplicity/+bug/1179322/+attachment/3735776/+files/pyraxbackend.py
sudo chown root:root pyraxbackend.py
sudo mv pyraxbackend.py /usr/share/pyshared/duplicity/backends/


sudo ln -s /usr/share/pyshared/duplicity/backends/pyraxbackend.py /usr/lib/python2.7/dist-packages/duplicity/backends/pyraxbackend.py


python -m compileall /usr/lib/python2.7/dist-packages/duplicity/backends


(On my Machine B the backends directory was under python2.7/dist-packages. On yours, you can run `sudo find / -name backends` to find where to link to.)

These steps enable duplicity to understand the cfpyrax+http:// scheme.
Note that it uses HTTPS even though the scheme reads just http.

Option B) Using the deprecated python-cloudfiles API

sudo apt-get -y install python-stdeb
sudo pypi-install python-cloudfiles
sudo apt-get -y install duplicity python-paramiko


These installs enable duplicity to understand the cf+http:// scheme.
Note that it uses HTTPS even though the scheme reads just http.

Step 3: Script for making the backups

On Machine B, we set up a cron task that runs daily. It runs as root and uses duplicity to make a backup and copy it to Cloud Files (or to the storage Machine S).

A basic script for Cloud Files could be:


#!/bin/bash
CLOUD_CONTAINER="bob_backup"
# required for Cloud Files support
export CLOUDFILES_USERNAME=my_username
export CLOUDFILES_APIKEY=4534534543543sd43434546456
export CLOUDFILES_REGION="ORD"

# required for duplicity: both hold the passphrase of the signing key
export PASSPHRASE="passphrase for the sign key"
export SIGN_PASSPHRASE="passphrase for the sign key"

# full backup when the last one is older than 15 days, incremental otherwise; 250 MB volumes
options="--full-if-older-than 15D --volsize 250 --exclude-other-filesystems --sign-key 927AE728 --encrypt-key 5A87AAB8"
duplicity $options /var/log cfpyrax+http://${CLOUD_CONTAINER}
unset PASSPHRASE
unset SIGN_PASSPHRASE
unset CLOUDFILES_APIKEY

Note how duplicity is instructed to use the two keys, and how you pass the passphrase of the signing key. This is safe because restoring the data requires the private encryption key AND its passphrase, and neither is on Machine B.
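The script above still keeps the API key and the sign passphrase in plain text inside the script. One common alternative is to move the `export` lines into a separate file readable only by root and source it. The sketch below refuses to source that file unless its permissions are 600 or 400; the function name and the file path in the usage line are hypothetical, and it relies on GNU `stat -c %a`. It does not check the file's owner, so make sure the file belongs to root as well.

```shell
#!/bin/bash
# Sketch: source credentials from a root-only file instead of hard-coding
# them in the backup script. The helper name is hypothetical.
load_secrets() {
  local file="$1" mode
  # GNU stat prints the octal permissions, e.g. "600"; empty if missing.
  mode=$(stat -c %a "$file" 2>/dev/null) || mode=""
  case "$mode" in
    600|400) ;;   # readable by the owner only: OK
    *) echo "refusing to load $file: permissions must be 600 or 400" >&2
       return 1 ;;
  esac
  # Expected content: lines such as  export CLOUDFILES_APIKEY=...
  . "$file"
}
```

The backup script could then start with `load_secrets /root/.duplicity-secrets || exit 1` in place of the hard-coded `export` lines.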

Each time it runs, duplicity generates at least 3 files (one or more data volumes, a manifest and a signatures file). These files will appear in your Cloud Files container.

As we are using the Cloud Files pyrax API, we use a cfpyrax+http:// URI. Change the URI scheme to cf+http:// for the old API.
If you back up to a server via scp or rsync, change this remote URI accordingly.

For Amazon S3, read this.
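Since switching storage only changes the target URL, it can help to build that URL in one place. This is a sketch: the URL forms are duplicity's schemes for the storages mentioned in this article (plus `file://` for a local disk), while the helper name and its argument order are hypothetical. Check duplicity's man page for how scp paths are resolved (relative to the login directory or absolute) before relying on the scp form.

```shell
#!/bin/bash
# Sketch: build the duplicity target URL for the storages mentioned above.
# The URL forms are duplicity's schemes; the helper itself is hypothetical.
target_url() {
  case "$1" in
    cfpyrax) echo "cfpyrax+http://$2" ;;   # $2 = container (pyrax backend, option A)
    cf)      echo "cf+http://$2" ;;        # $2 = container (python-cloudfiles, option B)
    s3)      echo "s3+http://$2" ;;        # $2 = bucket name
    scp)     echo "scp://$2@$3/$4" ;;      # $2 = user, $3 = host, $4 = remote directory
    file)    echo "file://$2" ;;           # $2 = absolute local path
    *)       echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```

In the backup script this would become `duplicity $options /var/log "$(target_url cfpyrax "$CLOUD_CONTAINER")"`.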

To keep your backups from growing too much, add something like this at the end of the script:

# Delete duplicity backups older than 30 days (without --force, duplicity only lists them).
duplicity remove-older-than 30D --force --sign-key 927AE728 --encrypt-key 5A87AAB8 cfpyrax+http://${CLOUD_CONTAINER}
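To run the script daily from cron, as described at the start of this step, one option on Ubuntu is a file in /etc/cron.d, whose lines carry a user field (root) between the schedule and the command. This sketch assumes the script was saved as /root/bin/duplicity-backup.sh and that 02:30 is a quiet time on Machine B; both are hypothetical. The file name avoids dots because Debian-style cron ignores /etc/cron.d files whose names contain them.

```shell
#!/bin/bash
# Sketch: schedule the backup script daily via /etc/cron.d.
# Hypothetical: script path /root/bin/duplicity-backup.sh, run at 02:30.
install_backup_cron() {
  local cron_dir="${1:-/etc/cron.d}"
  # Format: minute hour day-of-month month day-of-week USER command
  cat > "$cron_dir/duplicity-backup" <<'EOF'
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
30 2 * * * root /root/bin/duplicity-backup.sh >> /var/log/duplicity-backup.log 2>&1
EOF
  chmod 644 "$cron_dir/duplicity-backup"
}
```

Run `install_backup_cron` as root once; the log file collects duplicity's output for the regular checks below.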

Verify the encryption.

To check that everything went well, we can ask duplicity for the status of the backup. We can do it from Machine B with:
(remember to export all the Cloud Files variables first, as in the previous script)

duplicity collection-status --sign-key 927AE728 --encrypt-key 5A87AAB8 cfpyrax+http://${CLOUD_CONTAINER}

It will list all your backups and, comfortingly, "No orphaned or incomplete backup sets found".
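If the check runs unattended (for example from cron), it helps to turn that message into a pass/fail result. This is a minimal sketch that only looks for the sentence quoted above; the function name is hypothetical, and it does not parse the rest of the collection-status report.

```shell
#!/bin/bash
# Sketch: turn the collection-status output into a pass/fail result.
status_ok() {
  # stdin: output of "duplicity collection-status ..."
  # grep without -q reads all of its input, so duplicity never hits a closed pipe.
  grep 'No orphaned or incomplete backup sets found' > /dev/null
}
```

Usage: `duplicity collection-status --sign-key 927AE728 --encrypt-key 5A87AAB8 cfpyrax+http://${CLOUD_CONTAINER} | status_ok || echo "backup check failed on $(hostname)"`.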

Testing the recovery

We need the private encryption key and its passphrase. Remember that we kept them on our private Machine L. If you lose them, you won't be able to recover your backups.

Move to your local machine, where the private keys are available, and install duplicity (and the Cloud Files support from Step 2).

To restore, run duplicity with the restore command. You will be prompted for a passphrase; this time, use the encryption passphrase.

The command is

 duplicity [restore] [options] source_url target_dir

but 'restore' is optional: duplicity knows that we are restoring because the remote URL comes before a local directory. When the URL is the last parameter, duplicity does a backup.
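The argument-order rule can be illustrated with a tiny helper. This is a hypothetical sketch written for this explanation, not duplicity's own code: it only checks which of the two positional arguments looks like a URL (contains `://`).

```shell
#!/bin/bash
# Sketch (hypothetical helper, not duplicity's code): the position of the
# URL decides the action when no explicit command is given.
guess_action() {
  # $1, $2 = the two positional arguments passed to duplicity
  case "$1" in
    *://*) echo restore ;;            # remote URL first, local dir second
    *)     case "$2" in
             *://*) echo backup ;;    # local dir first, remote URL last
             *)     echo unknown ;;
           esac ;;
  esac
}
```

So `guess_action cfpyrax+http://bob_backup /tmp/restored_files` prints restore, while `guess_action /var/log cfpyrax+http://bob_backup` prints backup.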

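#!/bin/bash
# note: to run this script you need the PRIVATE encryption key (5A87AAB8) in your keyring.
# On Machine L it is already there, since we generated the keys there; on any other machine,
# import it first (e.g. exported with: gpg --export-secret-keys -a 5A87AAB8)
# and you must know the passphrase for the encryption key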

DST_FOLDER=/tmp/restored_files 
mkdir -p $DST_FOLDER

CLOUD_CONTAINER="bob_backup"
#required for CLOUD FILES SUPPORT 

export CLOUDFILES_USERNAME=my_username
export CLOUDFILES_APIKEY=4534534543543sd43434546456
export CLOUDFILES_REGION="ORD" 
# no passphrase provided, so we'll be asked interactively

options="--sign-key 927AE728 --encrypt-key 5A87AAB8 --volsize 250"
duplicity $options cfpyrax+http://${CLOUD_CONTAINER} $DST_FOLDER

unset CLOUDFILES_APIKEY

Another useful command is verify (also run on Machine L):

duplicity verify [options] source_url target_dir

Step 4: Finishing

Just remember to keep your encryption key and passphrase safe, and to check your backups regularly.

Sources

http://27smiles.com/2010/04/07/securely-backup-of-vps-with-duplicity-and-gpg/
http://spin.atomicobject.com/2012/06/14/encrypted-offsite-backups-with-duplicity/
http://www.debian-administration.org/articles/209
For integration with Cloud Files:

http://www.uno-code.com/?q=node/184
http://blog.chmouel.com/2011/01/06/backup-with-duplicity-on-rackspace-cloudfiles-including-uk-script/

5 comments:

  2. Thanks for this.

    As of duplicity 0.6.23 the pyrax backend has replaced the old cloudfiles. Your destination should look like

    cf+http://[my container]

    not

    cfpyrax+http://[my container]
