“There are three types of lies: Lies, damned lies, and service status pages”
I had a great time at DevOps Days 2018 in Wellington earlier this week. Wellington really put on the weather (and craft beer), and the organisers ran an excellent conference. I really appreciated the decision to allow opting out of the conference T-Shirt in favour of a donation to charity. As much as I love my swag I really don’t need any more black T-Shirts with tech logos!
Jeff Smith opened the proceedings with Moving from Ops to DevOps: Centro’s Journey to the Promiseland, which had some great practical tips on measuring toil and making sure you’re asking the right questions. François Conil’s talk Monitoring that cares spoke about the end of user based monitoring, and humane, actionable alerts. I also really enjoyed Ryan McCarvill’s fascinating talk Fighting fires with DevOps on the challenges of fitting out fire trucks for Fire & Emergency NZ with a modern cloud based incident response system.
Matt Brown held a really interesting workshop on risk prioritisation from the point of view of a SRE. By calculating the expected ‘cost’ of an outage in downtime minutes per year, we can compare these against our error budget to determine if they fit our risk model.
Jessica DeVita closed out the conference with a deep dive into her research on retrospectives. After an incident we tend to put most of our attention ‘below the line’ of our systems diagrams, and far less ‘above the line’, at the people and processes operating those systems. One cannot exist without the other, so we need to strive for a culture of blamelessness and empathy in our incident response processes in order to learn about and improve the resiliency of our systems - above and below the line.
In no particular order, here’s a list of links mentioned that I still need to dive into properly, and a few books I’ve added to my reading list.
- Continuous Testing in DevOps
- Know Thy Enemy - How to prioritise and communicate risk
- Report from the SNAFUcatchers Workshop on Coping With Complexity
- Catching the Apache SNAFU
- Ironies of Automation
- Rasmussen and practical drift
- Moving Past Shallow Incident Data
- The DevOps Dictionary: CAMS
- How Complex Systems Fail
Daniel H. Pink
Most people believe that the best way to motivate is with rewards like money—the carrot-and-stick approach. That's a mistake, says Daniel H. Pink. In this provocative and persuasive new book, he asserts that the secret to high performance and satisfaction-at work, at school, and at home—is the deeply human need to direct our own lives, to learn and create new things, and to do better by ourselves and our world.
The Field Guide to Understanding 'Human Error'
The Field Guide to Understanding ’Human Error’ will help you understand a new way of dealing with a perceived 'human error' problem in your organization. It will help you trace how your organization juggles inherent trade-offs between safety and other pressures and expectations, suggesting that you are not the custodian of an already safe system.
Testing in Devops
As organisations shift to a culture of intense collaboration and rapid delivery, the expectations on testers are changing. What does testing look like in an environment with automated build and deployment pipelines? How does appetite for risk change once a product can be tested in production? Who should testers look to connect with across the organisation and how can they work together effectively to deliver quality software?