Red Green Repeat Adventures of a Spec Driven Junkie

Thoughts from Talk: Debugging under Fire

In our company’s lunch and learn, we watched Bryan Cantrill’s talk about Joyent’s outage in May 2014.

To get background on the situation - the original postmortem for the outage is here

Slides for the talk are here

This talk is fantastic and I recommend re-watching it. It gets you into the mind of a person in the middle of an outage.

Some thoughts I had during the talk:

  • On the meaning of the Three Mile Island light switch - I remember a class that covered specific design rules for control systems. Something tells me those rules were “ignored” in this particular case.
  • As to the death of nuclear power - accidents like Three Mile Island do not help. At the same time, I learned safe nuclear power is possible - US Navy’s submarines!
  • “Software is synthetic” - it runs on a mathematical machine.
  • All software is less than 60 years old.
  • Debugging is a process of asking questions and getting answers.
    • NOT: “guessing for a precise answer”
    • In a way, it’s reality realignment.
  • “Do it right the first time” - I’m all for that!
    • But…What does “right” mean? What does “first” mean?? Who defines those?? :-)
  • DevOps - an unnatural joining of developers and operators: during a prod outage, you will know who is who:
    • Developers: debug, debug (to find root cause!)
    • Operators: restart, restart (to get things operational!)
  • In any prod outage, the best thing to do: STOP, THINK, ASK, ACT SLOWLY
  • Create a culture of debugging - create code that’s easy to debug.
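
As a small illustration of that last point (my own sketch, not something from the talk): code gets easier to debug when a failure already answers the first questions you would ask. The names `ConfigError` and `parse_port` below are hypothetical and exist only for this example.

```python
class ConfigError(ValueError):
    """Hypothetical error type that says which setting failed and why."""


def parse_port(name, raw):
    """Parse a TCP port setting, failing with enough context to debug."""
    try:
        port = int(raw)
    except (TypeError, ValueError) as exc:
        # Chain the original exception so the root cause is not lost.
        raise ConfigError(
            f"{name}: expected an integer port, got {raw!r}"
        ) from exc
    if not 1 <= port <= 65535:
        # Report the offending value and the valid range together.
        raise ConfigError(f"{name}: port {port} is outside 1-65535")
    return port
```

During an outage, an error that names the setting and shows the bad value leaves less room for guessing than a bare `ValueError`.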

Overall, I loved this talk, even the second time. Bryan’s emotions on stage are visceral and I learned from his outage.