www.bundesbrandschatzamt.de
Babblings about Systems Administration.

Amanda backup and AWS S3

After having my DDS streamer at home running for a while again with AMANDA, the Advanced Maryland Automatic Network Disk Archiver, I considered using it at work as well. Always looking for cost optimizations, our Vertica clusters looked like good candidates. Like most DBAs I fear losing data. Following the safe route, I use a vbr hardcopy backup followed by an rsync to a dedicated backup EBS volume. From those volumes we take snapshots via the Amazon EBS Snapshot Lifecycle.

It works, and as nothing is written to the drives during snapshot time, you can safely assume that you get healthy snapshots. But where there is light, there is shadow:

With a multi-node cluster the EBS volumes add up and get quite expensive. Sometimes you only notice problems in your database weeks later, and with no long-term backup, getting the data back can be impossible. With S3 it is easy to store your backup in a different region. That is a big safety net, even though I haven't experienced regional outages myself yet. The 2012 Christmas Eve outage is probably one of the most famous ones, and in December 2016 I have seen slow EBS volumes for myself.

Following my intro, there is not that much additional work needed to use Amazon S3.

If you run a classical setup of one Amanda server and multiple clients, you might run into limitations. Let's say you want to back up something like a Vertica cluster. If you have 10 nodes, each with 1 TB of data to back up, all of that data has to travel through one EC2 instance on its way to S3. This can easily be 10 hours of backup time, and as the cluster grows over time, your backup time grows even more. Sure, a full backup is not typical and most backup runs might finish in 2 hours or even less. Now comes the BUT: in case of a complete cluster failure you need the full time for recovering the cluster. I don't know about you, but for me that time is unacceptable. If each node runs its own Amanda server, the backup might be finished in an hour or two. Far better, and this approach scales as well.

If server and client reside on the same host, we need an easy way of recovering the server information. The solution is a separate Amanda configuration for the backup of the Amanda server itself and one for the rest of the data.
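A minimal sketch of that split, assuming the two configurations are called amanda-server and vertica-data (the second name is just an example), could look like this:

# two independent Amanda configurations on every node
#   /etc/amanda/vertica-data/    -> the actual payload data
#   /etc/amanda/amanda-server/   -> /etc/amanda itself (curinfo, index, logs)
su - amandabackup -c "amdump vertica-data"     # back up the data first
su - amandabackup -c "amdump amanda-server"    # then capture the updated catalog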

After starting an empty EC2 instance, all it needs is something like this Expect script:

#!/bin/expect -f
if { $argc == 0 } {
    puts "\nusage: $argv0 hostname\n\n"
} else {
    # extract the bucket name from the tpchanger line of the server config
    set cmd [list grep tpchanger /etc/amanda/amanda-server/amanda.conf | sed -e "s/tpchanger \"chg-multi:s3:\\(.*\\)\\/\[^\\/\]*\\/amanda-server\\/slot.*/\\1/"]
    set S3 [ exec {*}$cmd ]
    # read the S3 credentials from the same config and export them for the aws cli
    set cmd [list grep S3_ACCESS_KEY /etc/amanda/amanda-server/amanda.conf | awk "\{print \$3\}" | sed -e "s/\"//g" ]
    set ACCESS [ exec {*}$cmd ]
    set cmd [list grep S3_SECRET_KEY /etc/amanda/amanda-server/amanda.conf | awk "\{print \$3\}" | sed -e "s/\"//g" ]
    set SECRET [ exec {*}$cmd ]
    set env(AWS_ACCESS_KEY_ID) $ACCESS
    set env(AWS_SECRET_ACCESS_KEY) $SECRET
    # list the available server backups and let the user pick one
    set cmd [ list aws s3 ls s3://$S3/[lindex $argv 0]/amanda-server/ ]
    set AWS_LS [ exec {*}$cmd ]
    send_user "$AWS_LS\n\n"
    send_user "enter slot-????-mp.data:\n"
    gets stdin tarfile
    # download the chosen backup and unpack the Amanda configuration into /etc
    set cmd [ list aws s3 cp s3://$S3/[lindex $argv 0]/amanda-server/$tarfile /tmp/ ]
    exec {*}$cmd
    set cmd [ list tar -xf /tmp/$tarfile --directory=/etc --exclude=amanda-client.conf ./amanda ]
    exec {*}$cmd
    set cmd [ list tar -xf /tmp/$tarfile --directory=/etc ./amanda-security.conf ]
    exec {*}$cmd
}
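Assuming the script is saved as recover-amanda-server.exp (the name is of course up to you), recovering the server state on a fresh instance boils down to:

chmod +x recover-amanda-server.exp
./recover-amanda-server.exp $(hostname -s)    # pass the hostname that was used in the S3 path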

Now the instance knows everything about the backups it has taken in the past, and you can continue with the regular Amanda commands. That is one of the benefits of not using commercial backup software with proprietary backup formats.
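A couple of commands I typically reach for first after such a recovery (the config name matches the setup below):

amadmin amanda-server find localhost     # as amandabackup: list every dump Amanda knows about
amrecover -C amanda-server               # as root, from the directory you want to restore into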

But let's have a closer look at the configuration files!

If you create the S3 bucket manually, there is no need to give the Amanda AWS user privileges outside of that bucket.
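A minimal sketch of such a restricted policy, assuming an IAM user named amanda-backup and the bucket name used in the changer definition below (adjust both to your setup):

cat > /tmp/amanda-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::your-backup-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-backup-bucket/*"
    }
  ]
}
EOF
aws iam put-user-policy \
    --user-name amanda-backup \
    --policy-name amanda-s3-only \
    --policy-document file:///tmp/amanda-s3-policy.json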

org        "amanda-server"
# mailto     "root"
dumpuser   "amandabackup"
inparallel 1
dumporder  "sssS"
taperalgo  first
displayunit "g"
netusage 8000 Kbps
dumpcycle 1 weeks
runspercycle 7
tapecycle 10 tapes
etimeout 300
dtimeout 1800
ctimeout 30
bumpsize 20 Mb
bumppercent 20
bumpdays 1
bumpmult 4
device_output_buffer_size 1280k

autoflush yes
runtapes 1

tapedev "my_s3"
tapetype S3

maxdumpsize -1
labelstr "^amanda-server-[0-9][0-9]*$"
autolabel "amanda-server-%%%%" empty

amrecover_changer "changer"

define changer my_S3 {
  tpchanger "chg-multi:s3:your-backup-bucket/path/slot-{01..10}" # number of tapes in your "tapecycle"
  device-property "S3_BUCKET_LOCATION" "eu-west-1"
  device-property "S3_ACCESS_KEY" "foo"
  device-property "S3_SECRET_KEY" "bar"
  device-property "NB_THREADDS_BACKUP" "6"
  device-property "NB_THREADS_RECOVERY" "10"
  device-property "S3_MULTI_PART_UPLOAD" "YES"
  device-property "S3_SSL" "YES"
  changerfile "s3-statefile"
}

holdingdisk hd1 {
  comment "main holding disk"
  directory "/opt/amanda"
  use -100 Mb
  chunksize 1Gb
}

infofile "/etc/amanda/amanda-server/curinfo"
logdir   "/etc/amanda/amanda-server"
indexdir "/etc/amanda/amanda-server/index"

define interface local {
    comment "a local disk"
    use 8000 kbps
}

define application-tool app_amgtar {
    comment "amgtar"
    plugin  "amgtar"
    property "XATTRS" "YES"
}

define dumptype normal {
   global
   program "APPLICATION"
   application "app_amgtar"
   encrypt none
   compress client best
   index yes
   exclude list ".amanda.excludes"
}

define dumptype normal-archive {
   normal
   record no
   dumpcycle 0
}

define dumptype normal-uncompressed {
   normal
   compress none
}

define dumptype all {
   normal
   exclude ""
}

define dumptype all-archive {
   normal-archive
   comment "backup all. no excludes. no incremental."
   exclude ""
}

define dumptype all-archive-uncompressed {
  all-archive
  compress none
}

define dumptype all-uncompressed {
   normal-uncompressed
   exclude ""
}

define dumptype normal-uncompressed-archive {
   normal-uncompressed
   exclude ""
   record no
}

define dumptype normal-server-encrypt {
   normal
   comment "dump with server symmetric encryption"
   encrypt server
   server_encrypt "/sbin/amcrypt"
   server_decrypt_option "-d"
}

define dumptype normal-server-encrypt-archive {
   normal-server-encrypt
   exclude ""
   record no
}

define tapetype S3 {
    comment "S3 pseudo-tape"
    length 500 gigabytes
    part_size 50 gigabytes
    part_cache_type none
    blocksize 10 megabytes
}

The disklist for the server:

localhost /etc/ all-archive
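With the configuration and the disklist in place, a quick sanity check and the first run (as the amandabackup user) look like this:

amcheck amanda-server    # verifies the S3 device, holding disk and client access
amdump amanda-server     # first run; autolabel creates the S3 "tapes" on the fly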

As mentioned in my previous post, the postinstall scripts contain a bug in the generation of the encryption keys.

This will generate a proper encryption key:

# get_random_lines 65

lines=65
pad_lines=`expr $lines + 1`
block_size=`expr $pad_lines \* 60`

# read enough random bytes, base64-encode them and keep exactly $lines lines
dd bs=${block_size} count=1 if=/dev/urandom 2>/dev/null | \
    base64 | \
    head -$pad_lines | \
    tail -$lines >~amandabackup/.gnupg/am_key_new

# encrypt the key symmetrically with the amandabackup passphrase
gpg2 --homedir ~amandabackup/.gnupg \
     --no-permission-warning \
     --armor \
     --batch \
     --symmetric \
     --passphrase-file ~amandabackup/.am_passphrase \
     --output ~amandabackup/.gnupg/am_key_new.gpg \
     ~amandabackup/.gnupg/am_key_new

Keep in mind that you need a copy of that file to read your backup files!
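One way to keep such a copy is to push it to a location outside the backed-up host; the escrow bucket name here is just an example:

# the matching ~amandabackup/.am_passphrase is needed as well to unlock the key
aws s3 cp ~amandabackup/.gnupg/am_key_new.gpg \
    s3://your-key-escrow-bucket/$(hostname -s)/ --sse AES256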

A disklist using the encryption might look like this:

localhost /opt/vertica normal-server-encrypt
localhost /usr/local/data all-uncompressed
localhost /home normal-server-encrypt
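Restoring such a dump does not differ from the unencrypted case, since the decryption happens on the server side. For a single disk, something along these lines should do:

cd /var/tmp/restore
amfetchdump amanda-server localhost /opt/vertica    # writes the recovered dump into the current directory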

Amanda cycles through its tapes, which means an automated transition to Glacier via the S3 bucket lifecycle configuration is not an option. Instead,

device_property "TRANSITION-TO-GLACIER" "1"

could be used together with a script that runs after each backup run:

#!/bin/bash
# mark every tape of the just-finished run as no-reuse so Amanda never tries
# to overwrite the objects once the lifecycle rule has moved them to Glacier
LABELSTR=`amgetconf server-config labelstr | sed -e 's/[\^\$"]//g'`
TAPELIST=`amstatus server-config | egrep $LABELSTR | awk '{print $9}'`
for TAPE in ${TAPELIST}; do
  amadmin server-config no-reuse $TAPE
done
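Hooked into cron after the nightly run, the whole chain could look like this; the schedule and the script path are again only examples:

# /etc/cron.d/amanda
30 1 * * * amandabackup amdump server-config && /usr/local/bin/glacier-no-reuse.sh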