Chaos Monkey is a tool built by Netflix that randomly terminates instances within your infrastructure. The goal is to help ensure that your applications keep running when instances fail.
By introducing “automated failure” into your infrastructure, you force your DevOps, IT, and development teams to plan for failure, which is one of the key mantras of cloud computing. By forcing instance failures at known, convenient times, your team is on hand to react if your application does not handle the failure well.
Recently, we announced a new action called “Terminate EC2 Instances”. Using this new action, you can easily implement your own version of Chaos Monkey. Today, we’ll walk you through this process.
If you haven’t already done so, sign up for a Skeddly account. You can try this out for free with our 30-day trial.
First, we’ll need some AWS credentials to use for our Chaos Monkey.
Once you have your AWS credentials registered, you can create your Chaos Monkey action.
With these values, we’ll scan the AWS account for EC2 instances that have an EC2 tag named “skeddly:chaos-monkey” with a value of “yes”. Of the EC2 instances found, 1 is always kept from termination. Of the remaining instances, at least one will be terminated, and each of the others will be terminated with a 50% probability.
You can play with the probability percentage to have more or fewer instances terminated. You can also set this value to 0%, in which case only the minimum number of EC2 instances (1 in this case) is terminated during each execution.
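The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not Skeddly's actual implementation; the function name and its parameters (keep_minimum, terminate_minimum, probability) are hypothetical names chosen for the example.

```python
import random


def choose_instances_to_terminate(instance_ids, keep_minimum=1,
                                  terminate_minimum=1, probability=0.5,
                                  rng=None):
    """Pick which tagged instances to terminate (illustrative sketch only)."""
    if rng is None:
        rng = random.Random()

    # Shuffle a copy so the protected and forced picks are random.
    candidates = list(instance_ids)
    rng.shuffle(candidates)

    # Always leave at least keep_minimum instances untouched.
    max_terminations = max(len(candidates) - keep_minimum, 0)
    eligible = candidates[:max_terminations]

    # The first terminate_minimum eligible instances are always terminated.
    forced = eligible[:terminate_minimum]

    # Each of the others is terminated with the given probability.
    optional = eligible[terminate_minimum:]
    return forced + [i for i in optional if rng.random() < probability]


# Hypothetical instance IDs, for illustration only.
tagged = ["i-aaa", "i-bbb", "i-ccc", "i-ddd"]
print(choose_instances_to_terminate(tagged))
```

With a probability of 0.0, this sketch returns exactly one instance, which matches the 0% case described above. With a single tagged instance, it returns nothing, because that instance is the one being protected.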
Warning: we are not creating a final AMI image in this example. This assumes that your EC2 instances do not contain any non-recoverable data. If you would like to create a final AMI image, select “Yes” in the “Create Final AMI Image” field.
Here are my settings:
We need our AWS credentials to allow Skeddly to work in our AWS account. We do this by attaching an IAM policy with the required permissions to our AWS credentials.
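As a rough illustration, an IAM policy for this kind of action might look like the one generated below. The action list here is an assumption based on what the action does (finding instances and terminating them, plus creating an image if you opt for a final AMI); it is not taken from Skeddly's documentation, so use the policy that Skeddly shows for your action as the authoritative version. The function name is hypothetical.

```python
import json


def build_chaos_monkey_policy(create_final_ami=False):
    """Build an example IAM policy document (assumed permissions)."""
    # Assumption: describing and terminating instances is the minimum needed.
    actions = ["ec2:DescribeInstances", "ec2:TerminateInstances"]
    if create_final_ami:
        # Assumption: creating a final AMI also needs ec2:CreateImage.
        actions.append("ec2:CreateImage")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


# Print the policy as JSON, ready to paste into the IAM console.
print(json.dumps(build_chaos_monkey_policy(), indent=2))
```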
Since we want to only terminate instances with certain tags, we need to add those tags to our Auto Scaling group.
The new tag will only be added to newly launched EC2 instances, so you’ll need to manually terminate the existing instances in your group so that replacement instances with the tag are created.
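If you prefer to script this step rather than use the AWS console, the tag can be added with boto3, roughly as sketched below. This assumes boto3 is installed and configured with credentials that are allowed to tag Auto Scaling groups; the group name "my-web-asg" and both function names are placeholders for illustration. Setting PropagateAtLaunch to True is what causes new instances in the group to receive the tag.

```python
def build_chaos_monkey_tag(group_name):
    """Build the tag definition for an Auto Scaling group."""
    return {
        "ResourceId": group_name,
        "ResourceType": "auto-scaling-group",
        "Key": "skeddly:chaos-monkey",
        "Value": "yes",
        # Copy the tag onto every instance the group launches from now on.
        "PropagateAtLaunch": True,
    }


def tag_auto_scaling_group(group_name):
    """Apply the tag to the group (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the builder works without it

    client = boto3.client("autoscaling")
    client.create_or_update_tags(Tags=[build_chaos_monkey_tag(group_name)])


# "my-web-asg" is a placeholder; substitute your own group name.
print(build_chaos_monkey_tag("my-web-asg"))
```

Calling tag_auto_scaling_group("my-web-asg") would apply the tag to the group itself; instances that are already running still need to be replaced, as described above.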
Once the new instances are created, you can move on to step #5.
At this point, you can wait for your scheduled execution time to see your EC2 instances terminate. You can also trigger the action manually for more instant gratification.
More information about Chaos Monkey can be found at: