So you’ve written an app and you’re hosting it on an AWS EC2 instance. For whatever reason you have only the one server up with no load balancer in front of it. You want to set an alarm in AWS so that if the server goes down you’ll know right away, but how can you do it?
I wrote a simple bash script to ping a special URL in my web application. The response from the URL is simply the text “healthcheck ok” with a 200 response code. The script checks for that text. If it exists in the response, then it sends a 1 up to AWS as a custom metric. If it doesn’t, then it sends a 0.
#!/bin/bash
# Report 1 to CloudWatch when the health check passes, 0 otherwise, once a minute.
while :
do
    stat=0
    healthcheck=$(curl --connect-timeout 5 --max-time 7 --fail --insecure --silent https://localhost/healthcheck)
    if [ "healthcheck ok" = "$healthcheck" ]
    then
        stat=1
    fi
    mon-put-data --metric-name HttpHealthCheck --namespace YourNamespace --dimensions "server=prod" --value "$stat"
    sleep 60
done
For the script to run, you’ll need to have set up authentication for the AWS command line tools, and you’ll need a version of them that includes mon-put-data. To test, you can run the curl command by itself on the command line, and you can do the same with mon-put-data.
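If you want to try the comparison itself without a running web server, here is a minimal sketch. The function name health_value is my own invention for illustration and is not part of the script above; it just wraps the same string test and prints the value that would be sent to CloudWatch.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): map a response body
# to the metric value, 1 for healthy and 0 for anything else.
health_value() {
    if [ "healthcheck ok" = "$1" ]
    then
        echo 1
    else
        echo 0
    fi
}

# Feed it canned bodies instead of a live curl call.
health_value "healthcheck ok"     # prints 1
health_value ""                   # prints 0 (an empty body, e.g. when curl gets nothing back)
health_value "502 Bad Gateway"    # prints 0
```

Because the test is an exact string match, anything other than the expected text, including an empty response, counts as a failure.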
In my experience, it took a few minutes for the custom metric to show up the first time I sent it. Once it appears, you should be able to select it from the metrics in CloudWatch. The final step is to set up the alarm.
You should be able to set an alarm to go off when the value of the metric is <= 0. I tested it by shutting down my web server and I got the alarm notification within about a minute.
If your health check script isn’t running (you can start it with nohup ./healthcheck.sh &), no samples are sent, and in my test no alarm went off. So I set up a second alarm. For any metric, you can set an alarm based on the value or on the number of samples: just choose the “samples” statistic from the drop-down and set the alarm to go off if samples <= 0. Also add another action that fires on INSUFFICIENT_DATA, which means there are not enough samples and likely that your script was never started or has died.
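I set my alarms up in the console, but the same two alarms could probably be scripted too. What follows is a rough sketch, not what I actually ran. It assumes the newer unified aws command line tool (aws cloudwatch put-metric-alarm) rather than the mon-* scripts used above. The alarm names, the 60 second period, the Minimum statistic for the first alarm, the put_alarm and create_alarms function names, the AWS_CMD override, and the SNS topic ARN are all placeholders of mine.

```shell
#!/bin/bash
# AWS_CMD is a hypothetical override so the commands can be dry-run with "echo".
AWS_CMD="${AWS_CMD:-aws}"
# Placeholder ARN: replace with your own SNS topic.
TOPIC_ARN="${TOPIC_ARN:-arn:aws:sns:us-east-1:123456789012:your-topic}"

# put_alarm NAME STATISTIC [extra put-metric-alarm arguments...]
put_alarm() {
    local name="$1" statistic="$2"
    shift 2
    $AWS_CMD cloudwatch put-metric-alarm \
        --alarm-name "$name" \
        --namespace "YourNamespace" \
        --metric-name "HttpHealthCheck" \
        --dimensions "Name=server,Value=prod" \
        --statistic "$statistic" \
        --period 60 \
        --evaluation-periods 1 \
        --threshold 0 \
        --comparison-operator LessThanOrEqualToThreshold \
        --alarm-actions "$TOPIC_ARN" \
        "$@"
}

create_alarms() {
    # Alarm 1: the script reported 0, so the health check failed.
    put_alarm "HttpHealthCheck-down" Minimum
    # Alarm 2: no samples are arriving; also notify on INSUFFICIENT_DATA.
    put_alarm "HttpHealthCheck-silent" SampleCount \
        --insufficient-data-actions "$TOPIC_ARN"
}
```

Setting AWS_CMD=echo before calling create_alarms prints the two commands instead of sending them, which is a cheap way to check the arguments before touching your account.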
Once your app is super popular, you can look at putting a load balancer in front of it, which I believe allows for setting alarms based on HTTP response times and so on, but I think this’ll do until I get there.