DigitalJoel

2011/08/10

Backing up your database to Amazon S3

Filed under: development — digitaljoel @ 10:17 pm

So now that you have your application running on an AWS EC2 instance, you need to back up the data somehow. In my case, it's a postgres database and I wanted to back it up into Amazon S3 within my same AWS account. I wanted to have a rolling backup for every day of the week. What I mean is that I would have a backup for Monday, and every Monday it would overwrite the previous Monday's backup. That way I would have a rolling 7-day backup without keeping a bazillion copies of the database that I have to get rid of manually. Anyway, on to the code.

I wrote a little bash script that I then put into a cron job. First, there's a touch of setup to be done. Wherever you are going to be running the job, you will need to install and configure s3cmd. It's a great little utility for hitting S3 from the command line. The very simple instructions for configuring s3cmd are on that first page and shouldn't take you more than 5 minutes. I've run it on OS X Lion and also on my AWS instance and had no issues.
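If you want a quick sanity check after installing, the whole setup is roughly this (assuming s3cmd is on your path; --configure will prompt for your AWS access key and secret key, and mb/ls let you create and then list the bucket you'll back up into):

s3cmd --configure
s3cmd mb s3://<your bucket name>
s3cmd ls s3://<your bucket name>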

Next is the bash script. Here it is.

#!/bin/bash

# Path to pg_dump; change this if it isn't on your PATH
PGDUMP=pg_dump
# Name the export after the day of the week, e.g. Monday.sql
EXPORTFILE=`date +%A`.sql
COMPRESSEDFILE=${EXPORTFILE}.tgz
BUCKET=<your bucket name>
S3CMD=~/bin/s3cmd-1.0.1/s3cmd

# Dump the database as plain SQL: clean (-c), include blobs (-b), use INSERT statements
$PGDUMP -f ./${EXPORTFILE} -cb -F p --inserts -U <your user> <your database>
# Compress the dump
tar -czf ${COMPRESSEDFILE} ${EXPORTFILE}

# Upload to S3; the day-named key overwrites last week's file for that day
$S3CMD put ./${COMPRESSEDFILE} s3://${BUCKET}

You'll need to set PGDUMP to point to your pg_dump executable if it isn't in your path. Also set S3CMD to point to wherever you installed s3cmd. If you prefer other options for pg_dump, or if you are using some other database, you can modify the $PGDUMP line to do whatever you need.
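For example, if you happened to be on MySQL instead of postgres, that one line might look something like the following (just a sketch, the mysqldump options are whatever makes sense for your setup):

# hypothetical MySQL equivalent of the pg_dump line above
mysqldump --single-transaction -u <your user> -p<your password> <your database> > ./${EXPORTFILE}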

On Monday the script will create a file named Monday.sql and a compressed archive named Monday.sql.tgz. It'll then upload Monday.sql.tgz to your S3 bucket. You could easily add another line at the end of this script to remove the exported file and the archive using


rm $EXPORTFILE $COMPRESSEDFILE

Finally, you’ll need to schedule this to be run once per day.  This can be done by running crontab -e and then using the following line in the crontab file:


0 2 * * * ~/backupdb.sh

That will run the script every morning at 2. You can change the hour to whatever fits your needs.

The final task for me is going to be creating a similar script that will run every week and keep the last 4 weeks of backup.  I’m planning to do that using %W on the date command to get the week of year and do some math using the week number in the file name to create the new file and remove the old ones.  I guess I’ll leave that as an exercise for the reader.
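If you want a head start on that one, here's a rough sketch of what I have in mind. The weekly-N file names and the modulo-4 trick are just my assumptions at this point, and rather than removing old files it simply reuses four slot names so each one gets overwritten about once a month:

#!/bin/bash
# Sketch: 4-week rolling backup. Week-of-year modulo 4 picks one of four slots.
WEEK=`date +%W`
SLOT=$(( 10#$WEEK % 4 ))   # 10# forces base 10 so weeks 08 and 09 aren't read as octal
EXPORTFILE=weekly-${SLOT}.sql
COMPRESSEDFILE=${EXPORTFILE}.tgz
BUCKET=<your bucket name>
S3CMD=~/bin/s3cmd-1.0.1/s3cmd

pg_dump -f ./${EXPORTFILE} -cb -F p --inserts -U <your user> <your database>
tar -czf ${COMPRESSEDFILE} ${EXPORTFILE}
$S3CMD put ./${COMPRESSEDFILE} s3://${BUCKET}
rm ${EXPORTFILE} ${COMPRESSEDFILE}

You could schedule it with a second crontab entry, something like 0 3 * * 0 ~/backupdb-weekly.sh to run it early Sunday morning (the script name here is just whatever you save it as).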
