This Is How You Create “In-Between” AWS CloudWatch Alarms

Here is all you need to know, for those who are *not* AWS experts but still need to complete their tasks ;)

One of my recent tasks at work was adding “in-between” alarms to our system, and considering we already have alarms, it was supposed to be an easy one to complete. However, it turns out that the straightforward solution doesn’t work in this case. Eventually, I found a tutorial with the correct solution and followed it, but I didn’t find it as beginner-friendly as I needed.

Therefore, once I fully understood the solution, I wrote this tutorial. It will teach you how to solve the problem, including the thinking process, with emphasis on the path from what you already know to what you need in order to create your “in-between” alarm.

Why Do I Need Alarms?

In order to answer that, let’s look at an example from real life: I used to work at a McDonald’s restaurant near a movie theatre. While it isn’t rare that a restaurant will be out of a certain item on the menu, you will never hear “sorry, we don’t have McDonald’s legendary french fries at the moment”.

How come? Because besides having years of data that help them prepare for busy days and events, they also have procedures for handling a mid-day shortage of supplies, so they can restock without you noticing that anything happened.

Now, consider this scenario: Let’s assume it’s a weekend during a summer break, and that it’s also the first day out of lockdown and there is a kids’ movie premiere. It doesn’t get any busier than that! This McDonald’s branch is going to have more traffic than usual, and they did not prepare for it.

Is it smart to wait for the moment when there are zero potato bags left, to start looking for more? No, because that will impact the customers. It takes time to call a (less busy) neighboring branch, arrange a pick-up, and make the fries, and during that time there will be no fries to sell.

So let’s say McDonald’s operates an alarm to notify the manager when the number of potato bags is less than or equal to 2. In that case, the manager has about an hour to get more bags, which is just the right amount of time.

Let’s see some code

We will start by creating the first alarm that we all agree should exist: Notify the staff when the amount of potato bags is equal to or below threshold = 2.

Here are the properties of the CloudWatch alarm

Now, let’s assume that our McDonald’s branch is GraftonStreet and the metric that monitors the amount of potato bags in our branch is called PotatoBags.

Our alarm creation in the code will look something like the following:
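As a rough sketch of what such an alarm definition might contain, here is the red alarm written as a plain TypeScript object, so it runs without the AWS CDK installed. The property names follow the CloudFormation alarm properties in camelCase; the BasicAlarmProps type, the buildRedAlarmProps helper, and the evaluationPeriods value are assumptions made for illustration, not the original code.

```typescript
// Hypothetical type: a subset of the CloudWatch alarm properties, in camelCase.
interface BasicAlarmProps {
  alarmName: string;
  metricName: string;
  namespace: string;
  dimensions: { name: string; value: string }[];
  period: number; // in seconds
  statistic: string;
  comparisonOperator: string;
  threshold: number;
  evaluationPeriods: number;
}

const BRANCH_NAME = "GraftonStreet";
const METRIC_NAME = "PotatoBags";
const THRESHOLD = 2;

// Builds the "red" alarm: fire when the minimum number of bags
// seen in a 5-minute period is less than or equal to 2.
function buildRedAlarmProps(): BasicAlarmProps {
  return {
    alarmName: "potatoBagsRedAlarm",
    metricName: METRIC_NAME,
    namespace: "AWS/Logs",
    dimensions: [{ name: "BranchName", value: BRANCH_NAME }],
    period: 5 * 60,
    statistic: "Minimum",
    comparisonOperator: "LessThanOrEqualToThreshold",
    threshold: THRESHOLD,
    evaluationPeriods: 1, // assumption: alarm after a single breaching period
  };
}

const redAlarm = buildRedAlarmProps();
```

In a real stack, an object like this would be passed to the alarm construct instead of being kept as a standalone value.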

Of course, the alarm can also be created manually in the AWS console, in the CloudWatch section.

So, we have an alarm that fires whenever the number of potato bags in the PotatoBags metric is equal to or below 2.

That is a nice solution, but it is not casualty free. Sure, it would work if you have an available employee to send. In that case, calling the neighboring branch, driving there, picking up the bags, and making a bunch of fries can be done in an hour.

But what if you don’t have one? What if the kids’ movie just ended and everyone is on their way to your McDonald’s? The branch is about to get so busy, that sending someone now will still impact the customers!

It may sound like a small thing, but for big corporate businesses, it is a huge loss to be down or out of service, even for a few minutes.

Therefore, an even better solution is to be notified when there are 4 bags left and again when there are 2 bags left. The first notification gives two hours’ notice, which is just what is needed. This is the power of an “in-between” alarm: you can calmly wait for the peak to pass and then send one of the employees.

So What Is The Problem? Just Add Another Alarm

Unfortunately, the solution is not that straightforward. Here are two intuitive solutions that I thought of, and why they don’t work.

Incorrect intuitive solution #1: Add another alarm with a higher value threshold to notify

I started my journey by adding another alarm with a higher threshold and thinking to myself “wow, that is an easy task!”

It looked something like this:

Besides the threshold and the name, this alarm is identical to the previous one, which means that we’ll get notified when we reach 4 bags! But what would happen when we reach 2 bags?

For every value equal to or below 2, we will get notified twice: once by the red alarm and once by the yellow alarm. That might technically cover “get a yellow alarm for 2–4 bags and a red alarm for 0–2 bags”, but it’s not good enough. It’s kind of like having a traffic light that shows both red and green when it’s your turn to move. This solution is not ideal.
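The overlap is easy to see in a small simulation. This is only a sketch of the comparison logic, not of CloudWatch itself; the firingAlarms helper and the constant names are hypothetical.

```typescript
// Both alarms use "less than or equal to threshold"; only the threshold differs.
const RED_THRESHOLD = 2;
const YELLOW_THRESHOLD = 4;

// Returns the names of the alarms that would be in ALARM state for a bag count.
function firingAlarms(bags: number): string[] {
  const firing: string[] = [];
  if (bags <= YELLOW_THRESHOLD) firing.push("yellow");
  if (bags <= RED_THRESHOLD) firing.push("red");
  return firing;
}

const atFiveBags = firingAlarms(5);  // no alarm
const atThreeBags = firingAlarms(3); // only yellow
const atOneBag = firingAlarms(1);    // yellow AND red: the double notification
```

What we want instead is for the yellow alarm to stay quiet once the red one takes over.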

Incorrect intuitive solution #2: Add another alarm with an “in-between” ComparisonOperator and two thresholds

After realizing that my first solution didn’t work well enough once we reach the red zone, I understood that I needed an alarm for “in-between” values: the yellow alarm should not fire when the value is 2 or below. Well, if I need an “in-between” alarm, I’ll just use an “in-between” comparison operator!

At this point, I also thought to myself “wow, that’s an easy task!”

So I went to check the allowed values for the ComparisonOperator, and the options are: GreaterThanOrEqualToThreshold | GreaterThanThreshold | GreaterThanUpperThreshold | LessThanLowerOrGreaterThanUpperThreshold | LessThanLowerThreshold | LessThanOrEqualToThreshold | LessThanThreshold

Did you notice the problem? There is no InBetweenThresholds! How can that be?! Isn’t it a basic need?!

Well, there is an option for LessThanLowerOrGreaterThanUpperThreshold, but it is the exact opposite of what we need! Ok… maybe that’s a lead! Oh no, there isn’t any option to integrate Not(...) into the ComparisonOperator, so that’s not the direction.

The Correct Solution: Metric Math

Metric math enables you to query multiple CloudWatch metrics and use math expressions to create new time series based on these metrics.

Here is our new and improved yellow alarm:
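A minimal sketch of what a metric math version of the yellow alarm might look like, again as a plain TypeScript object rather than a real CDK construct. The two types, the "inBetween" id and its label, and the alarm-level comparisonOperator, threshold, and evaluationPeriods values are assumptions for illustration; the metric, thresholds, and expression follow the rest of this tutorial.

```typescript
// Hypothetical types: a subset of the alarm and MetricDataQuery properties.
interface MetricDataQuery {
  id: string;
  label?: string;
  expression?: string;
  metricStat?: {
    metric: {
      metricName: string;
      namespace: string;
      dimensions: { name: string; value: string }[];
    };
    period: number; // in seconds
    stat: string;
  };
  returnData: boolean;
}

interface MetricMathAlarmProps {
  alarmName: string;
  comparisonOperator: string;
  threshold: number;
  evaluationPeriods: number;
  metrics: MetricDataQuery[];
}

const BRANCH_NAME = "GraftonStreet";
const METRIC_NAME = "PotatoBags";
const LOWER_CASE_METRIC_NAME = METRIC_NAME.charAt(0).toLowerCase() + METRIC_NAME.slice(1);
const LOWER_THRESHOLD = 2;
const UPPER_THRESHOLD = 4;

const yellowAlarm: MetricMathAlarmProps = {
  alarmName: "potatoBagsYellowAlarm",
  // Assumption: the expression yields 1 inside the yellow zone, so alarm on ">= 1".
  comparisonOperator: "GreaterThanOrEqualToThreshold",
  threshold: 1,
  evaluationPeriods: 1,
  metrics: [
    {
      // Metric #1: the original metric, used only as input to the expression.
      id: LOWER_CASE_METRIC_NAME,
      metricStat: {
        metric: {
          metricName: METRIC_NAME,
          namespace: "AWS/Logs",
          dimensions: [{ name: "BranchName", value: BRANCH_NAME }],
        },
        period: 5 * 60,
        stat: "Minimum",
      },
      returnData: false,
    },
    {
      // Metric #2: the math expression the alarm is based on.
      id: "inBetween",
      label: "Potato bags in the yellow zone",
      expression: `IF(${LOWER_CASE_METRIC_NAME} > ${LOWER_THRESHOLD} AND ${LOWER_CASE_METRIC_NAME} <= ${UPPER_THRESHOLD}, 1, 0)`,
      returnData: true,
    },
  ],
};
```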

Some of these properties already existed in the basic alarm and simply moved to a different location; others are new, added so that we can use metric math and reach the alarm structure that solves our problem.

From basic alarm to using metric math: Let’s talk about the differences between the two alarms

1. Adding a second threshold

const LOWER_THRESHOLD = 2
const UPPER_THRESHOLD = 4

2. Created a metrics array that contains two MetricDataQueries

metrics: [...]
  • If you specify the Metrics parameter, you cannot specify MetricName, Dimensions, Period, Namespace, Statistic, ExtendedStatistic, or Unit. That is why the MetricName, Dimensions, Period, Namespace, Statistic properties were moved into the metrics array.
  • An alarm MetricDataQuery is composed of:

3. Metric #1 is the original metric we monitor:

Let’s talk in more depth about id, metricStat, and returnData:

MetricDataQuery id

Note that the MetricDataQuery id has validation limits. First, the valid characters are letters, numbers, and underscore. Second, the first character must be a lowercase letter. Third, the minimum length is 1 and the maximum length is 255.

A way to ensure this is by using RegEx: /^[a-z][A-Za-z_0-9]{0,254}$/ (one lowercase letter followed by up to 254 more valid characters).

If the original metric name you monitor starts with an uppercase letter, you can create a valid id by lowercasing only the first character:

let LOWER_CASE_METRIC_NAME = METRIC_NAME.charAt(0).toLowerCase() + METRIC_NAME.slice(1);

You don’t want to put the whole name in lowercase because you may need the uppercase letters for camelCase.
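Putting the id rules together, here is a small sketch of a validator and a converter. The names METRIC_ID_PATTERN, toMetricId, and isValidMetricId are hypothetical; the pattern encodes the rules listed above (a lowercase first letter, then letters, digits, or underscores, for a total length of 1 to 255).

```typescript
// One lowercase letter, then up to 254 letters, digits, or underscores.
const METRIC_ID_PATTERN = /^[a-z][A-Za-z0-9_]{0,254}$/;

// Lowercases only the first character, so camelCase humps are preserved.
function toMetricId(metricName: string): string {
  return metricName.charAt(0).toLowerCase() + metricName.slice(1);
}

// Checks a candidate id against the validation rules.
function isValidMetricId(id: string): boolean {
  return METRIC_ID_PATTERN.test(id);
}

const potatoBagsId = toMetricId("PotatoBags");
const potatoBagsIdIsValid = isValidMetricId(potatoBagsId);
```

Note that toMetricId only fixes the first character; a name containing other characters, such as a hyphen, would still fail isValidMetricId.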

MetricDataQuery metricStat

Notice that metricStat has properties that are similar to the ones in the basic alarm:

Below is a snippet from the basic alarm we saw at the beginning. The metricName, namespace, and dimensions properties are the ones that now live inside metric: {…}.

new cloudwatch.Alarm(this, "potatoBagsRedAlarm", {
  ...
  metricName: `${METRIC_NAME}`,
  namespace: "AWS/Logs",
  dimensions: [
    {
      name: "BranchName",
      value: `${BRANCH_NAME}`
    }
  ],
  period: Duration.minutes(5).toSeconds(),
  statistic: "Minimum",
  ...
})

MetricDataQuery returnData

Notice that we are defining returnData: false.

When you create an alarm based on a metric math expression, specify True for this value for only the one math expression that the alarm is based on. Therefore, you must specify False for ReturnData for all the other metrics and expressions used in the alarm.

As you will see in a bit, this one is not the metric the alarm is based on. However, the metric the alarm is based on uses this metric, and this is why the metric is here, but the return data is false.
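The "exactly one returnData: true" rule can be checked before deploying. The sketch below is a hypothetical helper, not part of any AWS library; it only looks at the id and returnData fields of each query.

```typescript
// Minimal view of a metric data query: only the fields this check needs.
interface QueryFlag {
  id: string;
  returnData: boolean;
}

// Returns the id of the single query the alarm is based on,
// or throws if zero or several queries have returnData set to true.
function alarmBaseQueryId(queries: QueryFlag[]): string {
  let count = 0;
  let baseId = "";
  for (const query of queries) {
    if (query.returnData) {
      count += 1;
      baseId = query.id;
    }
  }
  if (count !== 1) {
    throw new Error(`expected exactly one query with returnData: true, found ${count}`);
  }
  return baseId;
}

const baseQuery = alarmBaseQueryId([
  { id: "potatoBags", returnData: false }, // input metric
  { id: "inBetween", returnData: true },   // the expression the alarm watches
]);
```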

4. Metric #2 is the one math expression that the alarm is based on:

Id and label

These are yours to choose; they don’t affect how the alarm behaves (that is, as long as you choose a valid id!)

Expression

expression:
  `IF(${LOWER_CASE_METRIC_NAME} > ${LOWER_THRESHOLD} AND
  ${LOWER_CASE_METRIC_NAME} <= ${UPPER_THRESHOLD}, 1, 0)`,

We are using an IF expression, which in our case is:

IF(potatoBags <= 4 AND potatoBags > 2, 1, 0)

Notice that potatoBags is the id we used in the previous metric! The one where we said returnData: false. This expression says that if the potatoBags metric is greater than 2 and less than or equal to 4, return 1 (true); otherwise, return 0 (false).
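To check the boundaries of the expression, here is the same condition mirrored in TypeScript and applied to a few bag counts. This is only a local sketch of the logic, not how CloudWatch evaluates metric math; the inBetween function name is hypothetical.

```typescript
const LOWER_THRESHOLD = 2;
const UPPER_THRESHOLD = 4;

// Mirrors IF(potatoBags > 2 AND potatoBags <= 4, 1, 0) for a single data point.
function inBetween(potatoBags: number): 0 | 1 {
  return potatoBags > LOWER_THRESHOLD && potatoBags <= UPPER_THRESHOLD ? 1 : 0;
}

// One result per bag count from 0 to 5.
const results = [0, 1, 2, 3, 4, 5].map(inBetween);
// results is [0, 0, 0, 1, 1, 0]: only 3 and 4 bags fall in the yellow zone.
```

With 2 bags the expression returns 0, so the yellow alarm stays quiet exactly where the red alarm takes over.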

ReturnData

returnData: true

because this is the metric our alarm is based on.

How Can I Test My Alarms?

Only in the middle of working on this task did I realize that I didn’t know how to test alarms in general, not just the metric math ones.

If you have a basic alarm, one way to test it is by switching the ComparisonOperator to the opposite one. As a result, assuming your data is mostly green (OK), the green values will be viewed as red and that will fire your alarm.

Another way to test is to put metric data through the AWS CLI. Let’s say I want to see how my alarms react to 3 bags of potatoes. In order to do that, I’ll be using the following command:

aws cloudwatch put-metric-data --metric-name PotatoBags --namespace "AWS/Logs" --unit Count --value 3 --dimensions BranchName=GraftonStreet

Notice that we didn’t specify an account or a region. That is because they are already configured. If they are not, or are not configured correctly (AKA “I wrote this command, why don’t I see anything in CloudWatch?”), run aws configure and you will see something like this:

AWS Access Key ID [*******ABCD]: *leave empty, press Enter*
AWS Secret Access Key [********AbCd]: *leave empty, press Enter*
Default region name [None]: eu-west-1 *enter the correct region*
Default output format [None]: *leave empty, press Enter*

That’s it!

In this blog-post, we talked about the importance of alarms and their creation in CloudWatch. More particularly, we learned how to create an “in-between” alarm and how to test it.

With that, we saved the day and no one will ever have to suffer from a lack of french fries! Obviously, being a software developer is a very important job; some would say it’s life-saving!

I hope you enjoyed reading this article and learned something new! Want to learn more? Click here and read about Yak Shaving — a developer’s nightmare!

I would love to hear your thoughts, here are the ways to contact me:
Facebook: https://www.facebook.com/cupofcode.blog/
Instagram: https://www.instagram.com/cupofcode.blog/
Email: cupofcode.blog@gmail.com

I write about things that interest me as a software engineer, and I find interest in various subjects :) || Come visit at www.cupofcode.blog