A consortium of industry leaders and researchers united in the common cause of understanding and coping with the immense levels of complexity involved in the operation of critical digital services.
In an homage to the WWII phrase SNAFU (situation normal, all f***ed-up), CSEL’s SNAFU consortium recognizes that in the world of business-critical digital services, the complexity and scale of the system make difficulties, issues, and outages normal events. Skilled and talented individuals working at the sharp end (closest to system operations) often take on the role of SNAFU catchers. These people have, either formally or informally, made it part of their work to monitor for weak signals of impending issues and take action (often collaborating with other specialists) to ensure continued system operation.
In Cycle 1, collaboration with our partners on real breakdowns they had experienced led to the identification and discussion of six themes in coping with complexity:
- Capturing the value of anomalies through postmortems
- Blame versus sanction in the aftermath of anomalies
- Controlling the costs of coordination during anomaly response
- Supporting work through improved visualizations
- The strange loop quality of anomalies
- Dark debt
The Stella Report discusses these findings in greater detail.
Cycle 2, now underway, focuses on the costs of coordination.
Reach out to the consortium team if your company is interested in joining Cycle 3.