Backing up your database to Amazon S3

Filed under: development — Tags: , , — digitaljoel @ 10:17 pm

So now that you have your application running on an AWS EC2 instance, you need to backup the data somehow.  In my case, it’s a postgres database and I wanted to back it up into Amazon S3 within my same AWS account.  I wanted to have a backup for every day of the week, which would roll.  What I mean is that I would have a backup for Monday, and every Monday it would overwrite the previous Monday’s backup.  That way I would have a rolling 7 day backup. but not have a bazillion copies of the database that I have to manually get rid of.  Anyway, on to the code.

I wrote a little bash script that I then put into a cron job.  First, there’s a touch of setup to be done.  Wherever you are going to be running the job, you will need to install and configure s3cmd.  It’s a great little utility for hitting s3 from the command line.  The very simple instructions for configuring s3cmd are on that first page and shouldn’t take you more than 5 minutes.  I’ve run it on OSX Lion and also on my AWS instance and had no issues.

Next, is the bash script.  Here it is.


EXPORTFILE=`date +%A`.sql
BUCKET=<your bucket name>

$PGDUMP -f ./$EXPORTFILE -cb -F p --inserts -U <your user> <your database>


You’ll need to set PGDUMP to point to your pg_dump script if it isn’t in your path. Also set S3CMD to point to wherever you installed s3cmd.  If you prefer other options for pg_dump, or if you are using some other database, you can modify the $PGDUMP line to do whatever you need.

On Monday the script will create a file named Monday.sql and a compressed archive named Monday.sql.tgz.  It’ll then upload Monday.sql.tgz to your s3 account.  You could easily add another line at the end of this script to remove the exported file and the archive using


Finally, you’ll need to schedule this to be run once per day.  This can be done by running crontab -e and then using the following line in the crontab file:

0 2 * * * ~/

That will run the script every morning at 2.  You can change the hour for whatever fits your needs.

The final task for me is going to be creating a similar script that will run every week and keep the last 4 weeks of backup.  I’m planning to do that using %W on the date command to get the week of year and do some math using the week number in the file name to create the new file and remove the old ones.  I guess I’ll leave that as an exercise for the reader.


Amazon Web Services Alarm for HTTP Server

Filed under: development — Tags: , , , — digitaljoel @ 10:53 pm

So you’ve written an app and you’re hosting it on an AWS EC2 instance.  For whatever reason you have only the one server up with no load balancer in front of it.  You want to set an alarm in AWS so that if the server goes down you’ll know right away, but how can you do it?

I wrote a simple bash script to ping a special URL in my web application.  The response from the URL is simply the text “healthcheck ok” with a 200 response code.  The script checks for that text.  If it exists in the response, then it sends a 1 up to AWS as a custom metric.  If it doesn’t, then it sends a 0.

while :
  healthcheck=`curl --connect-timeout 5 --max-time 7 --fail --insecure --silent https://localhost/healthcheck`
  if [ "healthcheck ok" = "$healthcheck" ]
  mon-put-data --metric-name HttpHealthCheck --namespace YourNamespace --dimensions "server=prod" --value $stat
  sleep 60

In order for the script to run, you’ll need to have done all the authentication setup for the AWS scripts and ensure you have a version of them that includes the mon-put-data script.  For testing, you can run the curl command on the command line.  You can do the same with mon-put-data.

In my experience, it took a few minutes for the custom metric to show up the first time I sent it.  Once it settles in you should be able to select it from the metrics in CloudWatch.  The final step is to setup the alarm.

You should be able to set an alarm to go off when the value of the metric is <= 0.  I tested it by shutting down my web server and I got the alarm notification within about a minute.

If your health check isn’t started (which you can do with $nohup ./ &) then you won’t get samples and in my test no alarm was sounded.  So, I set another alarm.  For any metric, you can set an alarm based on the value, or based on the samples.  Just choose the “samples” statistic from the drop down.  Set the alarm to go off if samples <= 0.  Also add another action and set it to go off on INSUFFICIENT_DATA, meaning that there are not enough samples, which likely means your script wasn’t started, or has failed.

Once your app is super popular, you can look at the load balancer, which I believe allows for setting alarms based on HTTP response times etc. but I think this’ll do until I get there.

Blog at