Incident Response
Responding to and cataloging high-severity incidents is a critical skill for any software organization to develop.
Incident Response is part of our Engineering Career Leveling Framework. Explore next steps in your career from this industry-standard model.
Curated Learning Resources
- Incident Management Handbook: A comprehensive resource from the GitLab team covering many aspects of incident response, including key roles and responsibilities, runbooks, and tracking and communication.
- If Dr House did DevOps: Differential Diagnosis (DDx) is a useful framework for software engineers responding to incidents. It is based on the process used by Dr. House and his team in the TV series, where they huddle around a whiteboard to list symptoms and possible causes, then prioritize the causes. The steps are to rule out simple, common explanations, gather data, list possible causes, and prioritize that list. Treating symptoms along the way can help uncover the root cause of the incident. DDx supports decision-making in a stressful situation and helps train less-experienced engineers in incident response. It is also more fun to think of incident response as solving a mystery rather than just responding to a snafu.
- GitLab's Incident Classification System: The GitLab team's system for classifying the severity and urgency of incidents.
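
To make the differential diagnosis steps described above more concrete, here is a minimal sketch of a DDx "whiteboard" in Python. It is an illustration only and is not taken from the linked article: the `Hypothesis` class, the `prioritize` function, the sample causes, and the likelihood-per-minute scoring rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One possible cause written on the whiteboard (hypothetical model)."""
    cause: str
    likelihood: float          # rough guess between 0 and 1
    check_cost_minutes: float  # rough time needed to confirm or rule out
    ruled_out: bool = False


def prioritize(hypotheses):
    """Return the open hypotheses, most likely and cheapest to check first.

    The score (likelihood per minute of checking) is an assumed heuristic,
    not a rule from the DDx article.
    """
    open_items = [h for h in hypotheses if not h.ruled_out]
    return sorted(
        open_items,
        key=lambda h: h.likelihood / max(h.check_cost_minutes, 1.0),
        reverse=True,
    )


# Symptoms observed during a made-up incident.
symptoms = ["p99 latency up 5x", "error rate unchanged"]

# Possible causes listed by the team, with rough estimates.
board = [
    Hypothesis("recent deploy regression", 0.5, 5),
    Hypothesis("database connection pool exhausted", 0.3, 10),
    Hypothesis("upstream provider degradation", 0.1, 2),
]

# First pass: decide what to check first.
first_pass = [h.cause for h in prioritize(board)]

# New data rules out the deploy, so the list is re-prioritized.
board[0].ruled_out = True
second_pass = [h.cause for h in prioritize(board)]

print(first_pass)
print(second_pass)
```

The point of the sketch is the loop rather than the scoring: list causes, check the most promising one, record what was ruled out, and re-prioritize as new data arrives.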