Exception Perceptions: Gremlin + Chaos Engineering
On this episode of Exception Perceptions, Tammy Butow, Principal SRE at Gremlin and creator of the popular O’Reilly Chaos Engineering Bootcamp, helps us organize our thoughts on Chaos Engineering. Watch the episode, and read more of Tammy’s suggested practical ways to perform Chaos Engineering. Then go and get all of her Chaos Engineering resources.
Practice Automated Chaos Engineering from Day 1
One of the many strengths of Chaos Engineering with Gremlin is that the Gremlin team created these tools with developer and operator workflows in mind. Installation is fast, and Gremlin is built for safety, reliability, and security.
When an engineer joins your team, they should be able to quickly and easily understand what Chaos Engineering experiments are already scheduled to run by following these four steps:
1. Install the Gremlin agent in your development environment.
2. View the upcoming scheduled attacks using the UI or API.
3. Learn the tools used to monitor and observe the upcoming Chaos Engineering experiments.
4. Run additional Chaos Engineering experiments in your own local environment.
There’s so much you can configure and automate in Gremlin, but here are four tips I’ve adopted to make it work best in typical development workflows.
4 Tips for Chaos Engineering In Your Development Workflow
The Gremlin App is where most developers start exploring Chaos Engineering with Gremlin. The next step is often the Gremlin CLI or Gremlin API. The Gremlin CLI certainly helps with Chaos Engineering during local development and testing, but the real workflow savings come from the Gremlin API and automation tooling (e.g., scheduling and integration with your own application).
Tip 1: Focus your Gremlin Attack organization and naming on developer understanding
You should be able to easily see what Chaos Engineering attacks are scheduled to run at any time. You can use the Gremlin App, Gremlin CLI, and Gremlin API to do this.
Gremlin App: View scheduled attacks
Gremlin API: View scheduled attacks
curl -X GET "https://api.gremlin.com/v1/schedules/active" \
  -H "Authorization: $bearertoken" \
  -H "accept: application/json"
Tip 2: Use monitoring and observability tools for active Chaos Engineering experiments
A range of tools can help you observe what happens while a Chaos Engineering experiment is running. These include:
Sentry.io
htop
top
tshark
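As a lightweight complement to the tools listed above, the following is a minimal sketch of a sampler you could leave running in a terminal while an experiment is active. It assumes a Linux host, where the first field of /proc/loadavg is the 1-minute load average; the function name `sample_load` and its arguments are hypothetical and not part of Gremlin.

```shell
# Print "<time> <1-minute load average>" a fixed number of times.
# $1 = number of samples
# $2 = seconds to wait between samples
# $3 = file to read (assumption: defaults to Linux's /proc/loadavg)
sample_load() {
  samples="$1"
  interval="$2"
  loadfile="${3:-/proc/loadavg}"
  i=0
  while [ "$i" -lt "$samples" ]; do
    # The first whitespace-separated field is the 1-minute load average.
    read -r load1 _rest < "$loadfile" || return 1
    printf '%s %s\n' "$(date +%H:%M:%S)" "$load1"
    i=$((i + 1))
    # Do not sleep after the final sample.
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, `sample_load 30 1` would print one reading per second for roughly the length of a 30-second attack, which makes it easy to see whether the load changes when the experiment starts and stops.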
Tip 3: Share the Chaos Engineering experiments you have already run in your PRs
To assist your code reviewer, include in your PR description the Chaos Engineering experiments you have run in your development and testing environments for your code change.
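One way to keep these notes consistent across PRs is to generate them from a small helper. The sketch below is only an illustration: the function name `chaos_pr_note` and the layout of the note are hypothetical, not a Gremlin feature or a required format, so adapt the fields to whatever your reviewers find useful.

```shell
# Print a plain-text summary of one experiment for a PR description.
# $1 = attack type, $2 = target, $3 = length in seconds, $4 = what you observed
chaos_pr_note() {
  printf '%s\n' "Chaos Engineering experiments run for this change:"
  printf '%s\n' "- Attack: $1"
  printf '%s\n' "- Target: $2"
  printf '%s\n' "- Length: $3 seconds"
  printf '%s\n' "- Observed: $4"
}
```

Calling `chaos_pr_note cpu "random dev host" 30 "<what you observed>"` prints a five-line note that can be pasted into the PR description.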
Tip 4: Automate your Chaos Engineering
When you decide on a new Chaos Engineering experiment, you should then automate it using the Gremlin API. To learn more about using the API, read the Gremlin API docs and try out the Gremlin API playground.
Here is an example of a Gremlin CPU attack:
# Add 1 core of CPU load to a random host for 30 seconds
curl --header "Content-Type: application/json" \
  --header "Authorization: $bearertoken" \
  https://api.gremlin.com/v1/attacks/new \
  --data '
{
  "command": { "type": "cpu", "args": ["-c", "1", "--length", "30"] },
  "target": { "type": "Random" }
}'
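To make the request above easier to reuse in scripts, it can be wrapped in a small function that takes the core count and length as parameters. This is a minimal sketch, assuming the same endpoint, payload shape, and `$bearertoken` variable shown above; the function names and the `DRY_RUN` variable are hypothetical conveniences, not part of the Gremlin API.

```shell
# Build the JSON body for a CPU attack against a random host.
# $1 = number of cores, $2 = attack length in seconds
build_cpu_attack_payload() {
  cores="$1"
  length="$2"
  # Accept digits only so the generated JSON stays well formed.
  for value in "$cores" "$length"; do
    case "$value" in
      ''|*[!0-9]*)
        echo "cores and length must be whole numbers" >&2
        return 1
        ;;
    esac
  done
  printf '{"command": {"type": "cpu", "args": ["-c", "%s", "--length", "%s"]}, "target": {"type": "Random"}}\n' \
    "$cores" "$length"
}

# Send the attack, or only print the payload when DRY_RUN=1 (hypothetical switch).
run_cpu_attack() {
  payload=$(build_cpu_attack_payload "$1" "$2") || return 1
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$payload"
    return 0
  fi
  curl --header "Content-Type: application/json" \
    --header "Authorization: $bearertoken" \
    https://api.gremlin.com/v1/attacks/new \
    --data "$payload"
}
```

Running `DRY_RUN=1 run_cpu_attack 1 30` prints the request body without contacting the API, which is a convenient check before scheduling the call from CI or cron; `run_cpu_attack 1 30` sends the same request as the curl example.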
Now that you know all of my best practices, sign up for a Gremlin Free account.
What Chaos Engineering tips have you found useful? Please share with me and the community on the Chaos Engineering Slack.