Thoughts from Talk: Debugging under Fire
At our company’s lunch and learn, we watched Bryan Cantrill’s talk about Joyent’s outage in May 2014.
For background on the situation, the original postmortem for the outage is here
Slides for the talk are here
This talk is fantastic and I recommend re-watching it. It gets you into the mind of a person in the middle of an outage.
Some thoughts I had during the talk:
- On the meaning of the Three Mile Island light switch: I remember a class that covered specific design rules for control systems. Something tells me those rules were “ignored” in this particular case.
- As to the death of nuclear power - accidents like Three Mile Island do not help. At the same time, I learned safe nuclear power is possible - US Navy’s submarines!
- “Software is synthetic” - it runs on a mathematical machine.
- All software is less than 60 years old.
- Debugging is a process of asking questions and getting answers.
- NOT: “guessing for a precise answer”
- In a way, it’s reality realignment.
- “Do it right the first time” - I’m all for that!
- But…What does “right” mean? What does “first” mean?? Who defines those?? :-)
- DevOps - an unnatural joining of the two, since during a prod outage you will know who is who:
- Developers: debug, debug (to find root cause!)
- Operators: restart, restart (to get things operational!)
- In any prod outage, the best thing to do: STOP, THINK, ASK, ACT SLOWLY
- Create a culture of debugging - create code that’s easy to debug.
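To make that last point a bit more concrete, here is a minimal sketch of what I would call easy-to-debug code. Nothing in it comes from the talk: the `apply_discount` function, the `OrderError` class, and the `orders` logger name are all hypothetical, picked only for illustration. The idea is that the code answers questions for you: bad input fails close to its source, and the error and log lines carry the values you would otherwise have to guess at.

```python
import logging

# Hypothetical logger name, chosen only for this example.
logger = logging.getLogger("orders")


class OrderError(Exception):
    """Hypothetical error type that carries enough context to debug from."""


def apply_discount(order_id: str, total_cents: int, percent: int) -> int:
    """Return the discounted total in cents (illustrative function)."""
    # Validate early so a bad value fails near where it entered the system.
    if not 0 <= percent <= 100:
        raise OrderError(
            f"order {order_id}: percent must be in 0..100, got {percent!r}"
        )

    discounted = total_cents * (100 - percent) // 100

    # Log inputs and result together so a single line tells the whole story.
    logger.debug(
        "apply_discount order=%s total_cents=%d percent=%d result=%d",
        order_id, total_cents, percent, discounted,
    )
    return discounted
```

With something like this, a failure during an outage already names the order and the offending value, so the first question you ask gets an answer instead of a guess.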
Overall, I loved this talk, even the second time. Bryan’s emotions on stage are visceral and I learned from his outage.