IMPLEMENTING CHAOS ENGINEERING
A typical chaos experiment process begins with identifying and prioritizing the business functions where resilience is needed and defining a steady state for each. The chaos team then identifies failure scenarios, selects the key metrics to monitor, defines the blast radius of the experiment, gets buy-in from the business, communicates the plan to stakeholders, and schedules a game day for resilience testing.
On game day, the team conducts the experiment and then performs a blameless incident analysis and postmortem. From the findings, the chaos team identifies the actions and resilience patterns required for resolution. As a next step, it coordinates with the team concerned to test and validate the resilience pattern in a test or preproduction environment before applying the changes to production, and then plans the next game day.
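The experiment loop described above (verify the steady state, inject a failure within a bounded blast radius, re-check the steady state, roll back) can be illustrated with a minimal Python sketch. It is not tied to any particular chaos tool: the names ChaosExperiment, run_experiment, and FakeService are hypothetical, and the "service" is an in-memory stand-in rather than a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ChaosExperiment:
    """Hypothetical container for one experiment definition."""
    name: str
    steady_state: Callable[[], bool]       # True while the business function is healthy
    inject_failure: Callable[[str], None]  # applies the failure to a single target
    rollback: Callable[[], None]           # undoes every injected failure
    blast_radius: List[str]                # the only targets the experiment may touch


def run_experiment(exp: ChaosExperiment) -> Dict[str, object]:
    """Run one experiment and return a small report for the postmortem."""
    report: Dict[str, object] = {
        "name": exp.name,
        "blast_radius": list(exp.blast_radius),
    }
    # Never inject failures into a system that is already unhealthy.
    if not exp.steady_state():
        report["outcome"] = "aborted"
        return report
    try:
        for target in exp.blast_radius:
            exp.inject_failure(target)
        held = exp.steady_state()
    finally:
        # Roll back even if injection or measurement raised an error.
        exp.rollback()
    report["outcome"] = "held" if held else "violated"
    report["recovered"] = exp.steady_state()
    return report


class FakeService:
    """Toy stand-in for a business function backed by redundant replicas."""

    def __init__(self, replicas: List[str]) -> None:
        self._all = set(replicas)
        self.healthy = set(replicas)

    def kill(self, replica: str) -> None:
        self.healthy.discard(replica)

    def restore(self) -> None:
        self.healthy = set(self._all)

    def is_serving(self) -> bool:
        # Assumed steady state: at least one replica is still healthy.
        return len(self.healthy) > 0


if __name__ == "__main__":
    service = FakeService(["replica-a", "replica-b"])
    experiment = ChaosExperiment(
        name="kill one replica",
        steady_state=service.is_serving,
        inject_failure=service.kill,
        rollback=service.restore,
        blast_radius=["replica-a"],
    )
    print(run_experiment(experiment))
```

In this sketch, an outcome of "violated" is the signal that a resilience pattern (for example, added redundancy) is needed. A real experiment would replace the in-memory check with a measurement of the monitored business metrics.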
Many commercial and open-source tools are available for conducting chaos experiments. Organizations can choose among them based on their IT environment and internal capabilities.
Businesses should use chaos engineering to build resilience and, through it, deliver more reliable services to customers.