Backup Encryption

Originally published by SysAdmin Magazine, March 2007

Contents

      Introduction
      Backup scripts with encryption
      Encryption with Amanda
      Which algorithm?
      Performance
      Key Management
      Conclusion
      References

Introduction

Hardly a week goes by without a story in the news about a company leaking important data by losing its backup tapes. Whether it happens through malicious theft, opportunistic snatching, or accidental misplacement, there is a huge cost to business when data is lost. When the data contains sensitive information about members of the public, possibly including bank account and credit card numbers, the cost can be severe indeed. The stigma of having to notify clients that you've lost their data is extremely damaging in itself.

For example, in June 2005 Citigroup was forced to issue a media statement admitting that tapes containing personal information on 3.9 million customers had been lost while they were being carried to another site.

In July 2006 Chase staff accidentally discarded five computer tapes containing personal information and financial records of 2.6 million Circuit City credit card account holders. Even though the tapes were in a locked box, the company was forced to release a public statement and to notify and compensate all of those customers.

And this threat isn't new at all. In 1977 a disgruntled system administrator at Imperial Chemical Industries (ICI) in the UK stole a set of backup tapes and attempted to extort a large amount of money from the company for their return. He was eventually captured.

Despite the obvious and well-known risks, I've found that few companies actually use encryption to protect their backups. And this is quite surprising considering how easy it is to do, and just how costly it is to lose data. In this article I'll show you how to add encryption to your backups - and some of the pitfalls to watch out for.

Backup scripts with encryption

There are a couple of good open source tools which will do encryption nicely. Probably the best one is OpenSSL, which supports a wide range of ciphers and is very easy to add into existing backup scripts. Quite simply, the command:
    openssl enc -aes-256-cbc -salt -pass pass:123456
will encrypt standard input with the password "123456" and write the result to standard output. Note that the output stream will be 8-bit binary unless you specify the -a option for base64 output encoding. The -salt option should always be used so that a salt is included in the key derivation. You really shouldn't specify the key, or password, on the command line as in the example above, because any user will be able to see it by running the 'ps' command. For automated scripts it is better to put the key into a file which is readable only by root and use the file:filename construct, as in the following backup example:
    tar cvf - / | \
       openssl enc -aes-256-cbc -salt -pass file:/root/.backup_key >/dev/rmt0
To decrypt with OpenSSL, use the -d option:
    openssl enc -d -aes-256-cbc -pass file:/root/.backup_key </dev/rmt0 | \
       tar xvf -
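
The key file used in these examples has to be created somehow. The following is only a sketch of one way to do it, assuming a reasonably recent OpenSSL that provides the rand subcommand. It writes into a temporary directory as a stand-in for /root/.backup_key, and the variable names are made up for illustration.

```shell
# Scratch directory standing in for /root (assumption for this sketch).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
KEY_FILE="$workdir/backup_key"    # stand-in for /root/.backup_key

# Make sure the file is never readable by other users, even briefly.
umask 077

# 32 random bytes, base64-encoded onto a single line, which is the
# form that "-pass file:..." expects (it reads the first line only).
openssl rand -base64 32 > "$KEY_FILE"

# Read-only for the owner from now on.
chmod 400 "$KEY_FILE"
ls -l "$KEY_FILE"
```

On a real system the script would be run as root with KEY_FILE pointing at the real location, and a copy of the key would be stored somewhere safe before the first backup is written.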
GPG is also a good tool for encryption, and a number of encrypting backup systems prefer to use GPG.

GPG is well-known as a public-key, or asymmetric, encryption tool. This allows the data to be encrypted with one key while requiring a separate key for the decryption. Unless you have very specific requirements for your backups you probably won't need this functionality; for most backups symmetric encryption is best.

To encrypt with GPG in symmetric mode, use commands similar to the following example:

    GPG_KEY=/root/.backup_key
    tar cvf - / | \
      gpg --batch --disable-mdc --symmetric --cipher-algo AES256 \
        --passphrase-fd 3 3<$GPG_KEY >/dev/rmt0
Decrypting is similar, but with the --decrypt option:
    GPG_KEY=/root/.backup_key
    gpg --batch --quiet --no-mdc-warning --decrypt --passphrase-fd 3 \
      3<$GPG_KEY </dev/rmt0 | tar xvf -
Note how the key file is read in through file descriptor 3. This way neither the key nor the location of the key file appears on the gpg command line, which users would be able to see with the ps command.

Encryption with Amanda

Amanda is one of the best open-source backup management packages available. It is very flexible, scalable, and very easy to customise and extend. The latest version (I used Amanda 2.5.1) has extensions and plugins to perform encryption with either the client or the server doing the hard work of encrypting the data.

Assuming you've already got Amanda installed and set up to run your backups, adding encryption for backups (and decryption for restores) is very easy to do. On the Amanda server, configure a dumptype in /etc/amanda/DailySet1/amanda.conf (DailySet1 is the name for the backup set) to include an encryption plugin:

    define dumptype client-encrypt-ossl {
        global
        program "GNUTAR"
        comment "no compression and client symmetric encryption with OpenSSL"
        compress none
        encrypt client
        client_encrypt "/usr/sbin/amcrypt-ossl"
        client_decrypt_option "-d"
    }
The options are quite self-explanatory. This specifies that the backup set will use encryption and that encryption will be performed on the client system using /usr/sbin/amcrypt-ossl, which is a simple shell script calling upon openssl to perform the encryption. Decryption is also performed on the client using the same script but with the "-d" option added.

Also on the Amanda server, change the dumptype for the directory on the client to be backed up, in /etc/amanda/DailySet1/disklist:

    gorgon.crypt.gen.nz /home client-encrypt-ossl
This will tell Amanda to use the client-encrypt-ossl dumptype when backing up /home on the backup client system gorgon.crypt.gen.nz.

On the client to be backed up, check that /usr/sbin/amcrypt-ossl is installed (it comes with Amanda v2.5 releases), and then configure an encryption key, or passphrase, for the backup:

    echo "amanda_encryption_key_86299993456" >/var/lib/amanda/.am_passphrase
    chown amandabackup:disk /var/lib/amanda/.am_passphrase
    chmod 700 /var/lib/amanda/.am_passphrase
Naturally, you should make up your own key to put into the .am_passphrase file, and remember to store a copy in a safe place. Now you're ready to run the backup. Check that everything is fine on the server:
    $ amcheck DailySet1
    Amanda Tape Server Host Check
    -----------------------------
    Holding disk /var/dumps/amanda: 2090172 KB disk space available, using 1987772 KB
    slot 3: read label `DailySet1-03', date `X'
    NOTE: skipping tape-writable test
    Tape DailySet1-03 label ok
    Server check took 0.316 seconds

    Amanda Backup Client Hosts Check
    --------------------------------
    Client check: 1 host checked in 0.449 seconds, 0 problems found

    (brought to you by Amanda 2.5.1p2)
    $ 
and then run
    amdump DailySet1
to perform the backup with encryption. The results will be emailed to the administrator specified in /etc/amanda/DailySet1/amanda.conf.

Whenever you change your backups, always test that they still work. In the case of setting up encryption you need to test that the data on tape is actually encrypted, and make sure that you can read it back as well!
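
Outside of Amanda, the same two checks can be sketched with plain OpenSSL. This is only an illustration under some assumptions: it uses a temporary directory and a file in place of the tape device, and the sample data and passphrase are placeholders. Newer OpenSSL releases may also print a warning about the key derivation, which does not affect the test.

```shell
# Scratch area standing in for the real data and the tape device.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/data" "$workdir/restore"
echo "payroll record 0001" > "$workdir/data/sample.txt"
( umask 077; echo "example_passphrase_only" > "$workdir/key" )

# Encrypted backup, written to a file instead of /dev/rmt0.
tar cf - -C "$workdir" data | \
    openssl enc -aes-256-cbc -salt -pass "file:$workdir/key" \
    > "$workdir/backup.enc"

# Check 1: known plaintext must not be visible in the backup.
if grep -q "payroll record" "$workdir/backup.enc"; then
    echo "FAIL: plaintext found in backup"
else
    echo "OK: backup does not contain the plaintext"
fi

# Check 2: the backup must decrypt and restore to identical data.
openssl enc -d -aes-256-cbc -pass "file:$workdir/key" \
    < "$workdir/backup.enc" | tar xf - -C "$workdir/restore"
if cmp -s "$workdir/data/sample.txt" "$workdir/restore/data/sample.txt"; then
    echo "OK: restored file matches the original"
else
    echo "FAIL: restored file differs"
fi
```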

You can change the cipher used for encryption by editing the encryption script /usr/sbin/amcrypt-ossl on the Amanda client system. The standard release uses aes-256-cbc, which will be fine for most purposes.

Which algorithm?

It is important to choose the encryption algorithm, or cipher, carefully. Some commercial backup systems may not let you choose the cipher, but OpenSSL offers a wide choice of algorithms.

The version of OpenSSL I used for testing (v0.9.8a) supports all of these:

  aes-128-cbc    aes-128-ecb    aes-192-cbc    aes-192-ecb    aes-256-cbc
  aes-256-ecb    base64         bf             bf-cbc         bf-cfb
  bf-ecb         bf-ofb         cast           cast-cbc       cast5-cbc
  cast5-cfb      cast5-ecb      cast5-ofb      des            des-cbc
  des-cfb        des-ecb        des-ede        des-ede-cbc    des-ede-cfb
  des-ede-ofb    des-ede3       des-ede3-cbc   des-ede3-cfb   des-ede3-ofb
  des-ofb        des3           desx           rc2            rc2-40-cbc
  rc2-64-cbc     rc2-cbc        rc2-cfb        rc2-ecb        rc2-ofb
  rc4            rc4-40
Of course, some of these like base64 really aren't for encryption.

GPG supports a smaller range of ciphers. Version 1.4.2 supports the following:

  3des, cast5, blowfish, aes, aes192, aes256, twofish
I recommend AES-256-CBC as a good option. AES is a fast algorithm, and the 256-bit key size is long enough to be secure against attacks for a good few years while still being small enough to be fast. CBC designates the mode used to apply AES to a data stream (remember, AES is a block cipher and only processes discrete blocks of information). Don't use the ECB block mode. In ECB, identical blocks of input always encrypt to identical blocks of output, so it just isn't very strong in situations such as backups where the blocks of data at the start of the unencrypted stream are fairly predictable.
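
The weakness of ECB is easy to see with a small experiment. The sketch below is an illustration rather than a rigorous test: it encrypts 64 identical blocks of data in both modes and counts how many distinct 16-byte blocks appear in each output. The passphrase, file names and the distinct_blocks helper are made up for the example.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
echo "demo_passphrase_only" > "$workdir/key"

# 1024 bytes of identical data: 64 AES blocks that are all the same.
dd if=/dev/zero bs=1024 count=1 2>/dev/null | tr '\0' 'A' > "$workdir/plain"

# Count the distinct 16-byte blocks in a file
# (od prints 16 bytes per output line by default).
distinct_blocks() {
    od -An -v -tx1 "$1" | sort -u | wc -l | tr -d ' '
}

# Encrypt the same input with the same key in both modes.
for mode in ecb cbc; do
    openssl enc "-aes-256-$mode" -salt -pass "file:$workdir/key" \
        < "$workdir/plain" > "$workdir/out.$mode"
done

ecb_blocks=$(distinct_blocks "$workdir/out.ecb")
cbc_blocks=$(distinct_blocks "$workdir/out.cbc")
echo "distinct ciphertext blocks: ECB=$ecb_blocks CBC=$cbc_blocks"
```

With ECB only a handful of distinct blocks should appear (the salt header, the repeated data block and the padding block), while with CBC practically every block is different. The repetition in the ECB output is exactly the kind of structure an attacker can exploit.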

You should avoid plain DES because it has a relatively short 56-bit key length. Triple-DES is suitable for use in backups, but it is quite slow, as will be shown in the next section.

Performance

Of course, you don't get anything for free. Encrypting large volumes of data can have quite a high impact on CPU usage depending on which algorithm you use, and performance impacts can be one of the major issues preventing companies from implementing encryption.

I performed some basic tests with OpenSSL using a number of ciphers to encrypt 8 gigabytes of data on a 1.0GHz Intel server with the output just going to /dev/null. The encryption times are given in the following chart.

From these results, you can see that triple-DES is really quite slow, and that AES-128 and AES-256 perform very well in comparison.

It can be hard to predict just how much effect encryption will have on a system. If the output device, such as a tape drive, is slow then the encryption process will get slowed down and you may not notice the extra load. If you are writing to a high-speed device, such as a dump area on a disk, then the CPU load from the encryption process could be very high. It is always best to run some load tests beforehand.
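
A rough load test along these lines can be sketched in a few lines of shell. This is not the script used for the results above; the sample size, the list of ciphers and the file names are assumptions chosen to keep the run short, and the timing only has one-second resolution.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
echo "benchmark_passphrase_only" > "$workdir/key"

# Sample data; raise SIZE_MB for more meaningful timings.
SIZE_MB=${SIZE_MB:-32}
dd if=/dev/zero of="$workdir/sample" bs=1048576 count="$SIZE_MB" 2>/dev/null

: > "$workdir/results"
for cipher in aes-128-cbc aes-256-cbc des-ede3-cbc; do
    start=$(date +%s)
    # Output goes to /dev/null so only the encryption cost is measured.
    openssl enc "-$cipher" -salt -pass "file:$workdir/key" \
        < "$workdir/sample" > /dev/null
    end=$(date +%s)
    echo "$cipher ${SIZE_MB}MB $((end - start))s" >> "$workdir/results"
done

cat "$workdir/results"
```

For a realistic picture, run it on the actual backup server with a sample that is large enough to take tens of seconds per cipher, and watch the CPU load while it runs.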

If server performance is an issue, you can look at hardware which can perform the encryption for you. A number of PCI-bus SCSI controllers can do this, and there are also dedicated devices, such as the CryptoAccelerator PCI Card, which can offload the encryption work from the CPU.

There are also some devices which can connect between a host and a storage device which perform encryption, such as NeoScale's CryptoStor Tape devices. And of course, there are a number of tape drives which can perform encryption from vendors such as IBM and Sun. But be careful when using hardware appliances, and always plan for what steps you will need to take if that encrypting device stops working - make sure you can extract and reuse the encryption keys.

Also be aware that encrypting data will affect how well it can be compressed, whether you are using software or hardware compression. The ideal encryption algorithm will generate an output stream which is perfectly random, and cannot be compressed. So, if you want compression with encryption, it is best to perform compression first and then apply encryption to the compressed data.
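
In a backup script this simply means putting the compressor ahead of openssl in the pipeline, for example tar, then gzip, then openssl enc. The sketch below compares the two orderings on some highly compressible sample data. The data, passphrase and file names are placeholders for illustration.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
echo "demo_passphrase_only" > "$workdir/key"

# Highly compressible sample: 20000 identical lines of text.
yes "sample log line for the backup test" | head -n 20000 > "$workdir/sample"

# Right order: compress first, then encrypt.
gzip -c < "$workdir/sample" | \
    openssl enc -aes-256-cbc -salt -pass "file:$workdir/key" \
    > "$workdir/gz_then_enc"

# Wrong order: encrypt first, then try to compress the ciphertext.
openssl enc -aes-256-cbc -salt -pass "file:$workdir/key" \
    < "$workdir/sample" | gzip -c > "$workdir/enc_then_gz"

size_plain=$(wc -c < "$workdir/sample" | tr -d ' ')
size_good=$(wc -c < "$workdir/gz_then_enc" | tr -d ' ')
size_bad=$(wc -c < "$workdir/enc_then_gz" | tr -d ' ')
echo "original:              $size_plain bytes"
echo "compress then encrypt: $size_good bytes"
echo "encrypt then compress: $size_bad bytes"
```

The compress-then-encrypt output should be a small fraction of the original, while the encrypt-then-compress output stays about as large as the input because the ciphertext gives gzip nothing to work with.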

Another factor to be aware of is how well the encryption algorithm can cope with errors in the backup media. When using AES with CBC (Cipher Block Chaining) mode, a single bit-error in the encrypted stream will make the entire 128-bit block containing the error indecipherable (AES always works on 128-bit blocks, whatever the key size) and will flip the corresponding bit in the following block. After those two blocks the decoder will recover and continue decrypting the data, although the two bad blocks could severely affect how well the system can recover files from the decrypted stream.
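
The effect of a media error can be simulated by flipping one bit in an encrypted file and counting how many bytes of the decrypted output change. This is only a sketch: the sample data, passphrase and the chosen byte offset are arbitrary, and it relies on the 16-byte salt header that openssl enc writes before the ciphertext when -salt is used.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
echo "demo_passphrase_only" > "$workdir/key"

# 1024 bytes of sample plaintext (64 AES blocks of 16 bytes each).
dd if=/dev/zero bs=1024 count=1 2>/dev/null | tr '\0' 'A' > "$workdir/plain"

openssl enc -aes-256-cbc -salt -pass "file:$workdir/key" \
    < "$workdir/plain" > "$workdir/cipher"

# Flip the lowest bit of one ciphertext byte to simulate a media error.
# Offset 99 = 16-byte salt header + 5 whole blocks + 3 bytes.
offset=99
old=$(od -An -tu1 -j "$offset" -N 1 "$workdir/cipher" | tr -d ' \n')
new=$((old ^ 1))
printf "\\$(printf '%03o' "$new")" | \
    dd of="$workdir/cipher" bs=1 seek="$offset" conv=notrunc 2>/dev/null

# Decrypt the damaged stream and count the bytes that differ.
openssl enc -d -aes-256-cbc -pass "file:$workdir/key" \
    < "$workdir/cipher" > "$workdir/damaged"
changed=$(cmp -l "$workdir/plain" "$workdir/damaged" | wc -l | tr -d ' ')
echo "plaintext bytes changed by a 1-bit error: $changed of 1024"
```

The damage stays confined to a short stretch of the output and the rest of the stream decrypts normally. Running cmp -l on its own shows exactly which byte positions were affected.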

Key Management

You must bear in mind that with encryption the key is everything. If you write encrypted backups for a month or so and then lose the key you were using, all of that backup data is useless. Don't even think about trying to decode it without the key. So in this respect you must keep your key data just as secure as you would your backup media. Print out the key data on paper and store it in the company safe - and maybe an offsite secure location as well. If you have a very long key file, then save it to a number of different types of media (CD-ROM, USB drive, floppy disk) and store them somewhere safe.

Keys should be changed on a regular basis, and when employees and contractors leave the company. But be aware that losing a key will cause the loss of the backups made with that key. Always keep old keys in safe storage.
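
A key change can be scripted so that the old key is always archived before the new one takes its place. The following is a minimal sketch only. It works in a temporary directory as a stand-in for /root, and the archive directory and file naming scheme are hypothetical, not part of any tool described here.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
KEY_FILE="$workdir/backup_key"     # stand-in for /root/.backup_key
ARCHIVE_DIR="$workdir/old_keys"    # hypothetical archive of retired keys

umask 077
mkdir -p "$ARCHIVE_DIR"
echo "old_example_passphrase" > "$KEY_FILE"   # pretend this is the current key

# 1. Archive the current key under the date it was retired.
stamp=$(date +%Y%m%d)
retired="$ARCHIVE_DIR/backup_key.retired-$stamp"
cp -p "$KEY_FILE" "$retired"

# 2. Generate the new key next to the old one, then swap it in.
openssl rand -base64 32 > "$KEY_FILE.new"
mv "$KEY_FILE.new" "$KEY_FILE"

ls -l "$KEY_FILE" "$ARCHIVE_DIR"
```

The retired key files still need to go into safe storage along with a note of which tapes were written with which key, otherwise the archive on the server is the only copy.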

Always plan for what you're going to do if the worst happens, such as if the data centre burns down. Will you be able to get hold of the keys and quickly recover your backups?

Conclusion

Encrypting backups is a very simple thing to do, and should be mandatory if your organisation is sending tapes offsite. If you are using commercial backup tools, like Netbackup, there should also be encryption options which you should investigate and make use of.

In this article, I've shown how to configure encrypted backups using OpenSSL and GPG encryption tools - if your backups are driven by plain shell scripts you should have no problems incorporating encryption. I've also shown you how to configure Amanda to use encryption - which is very straightforward in the latest versions of Amanda.

Be aware that some other types of backups, for example those which create a bootable recovery system like mksysb on AIX and Ignite on HP-UX, may not be able to encrypt their data - so take special care when using these to make backups. On Linux, mkcdrec is a very useful backup system which can create a bootable recovery CD and encrypt the filesystem data that is written onto the CD image - it prompts the user for the key when it performs recovery functions.

References