What do you do when a software system goes wrong? Chris Oldwood discusses designing for supportability.
One of the books which had a profound impact on me early on in my programming career was Writing Solid Code by Steve Maguire. In Chapter 4 (Step Through Your Code) he introduces the practice of stepping through any new code you write, in the debugger, to see the code in action so that you can check the data flow, such as loop variables, to help avoid the perennial programmer nemesis – the off-by-one error. One of the side-effects of this practice is that it forces you to think about how to make it easy to get to that point in code in a debugger. If the code is many layers deep in the application then you might be tempted to create an explicit test harness that allows you to invoke the code more easily, along with the added benefit of giving you more control over the inputs. In turn, that thought process can have an effect on the design of the code as you make it more ‘debuggable’ in the first place.
Although he didn’t use the term in his book back in 1993, this notion of shaping the code to make it easier to test is now known as ‘Design for Testability’ and has a history in the hardware world that dates back to at least the first half of the 20th century. Black Box Testing, while useful, can only get you so far in the hardware world and, as complexity grew, hardware engineers started to add extra features to help ensure the product was working correctly internally. In the software world, White Box Testing has materialized under the guise of Unit Testing, with Mocking in particular being a realization of how the desire to make code more testable can affect the design of components.
I continued to use the practice of stepping through my code in the debugger as my primary means of testing for the better part of a decade. What brought it to an end was being introduced to the newfangled practice of automated unit testing, along with the realization that the computer was so much more reliable at repetitive tasks like regression testing than a human. (More details on my eventual fall from grace and subsequent epiphany can be found in my ACCU 2017 conference talk ‘A Test of Strength’.)
Being able to easily and reliably test my code was definitely a big win, but it also had another side-effect that I hadn’t anticipated until I started working on more complex systems: supportability. I got my first glimpse of this when I discovered that a test harness I wrote to make development of a back-end scheduling engine easier was being bundled with the application, for when bugs in the front-end made it impossible to fix up the schedule. My test harness, while very raw from a GUI point of view (the sea of database IDs felt a bit like staring at The Matrix), allowed direct access to the back-end code so the schedule could be fixed up by manually driving it using the real business logic. This was considered far safer than hacking about directly in the database as it minimised the chances of corrupting the state. (Debugging through the front-end, the default practice up to that point, cost you 8 minutes just waiting for it to load before you could invoke any back-end logic.)
That experience taught me that there was more value in test harnesses than simply being able to make a developer’s life easier. As I started to interact with more support engineers, I began to see how hard their life was supporting applications and systems because they were so far removed from the developers building the system. In the three decades since that episode took place, the industry as a whole has started to empathise more with those outside the development team and has recognised that other areas such as InfoSec and Ops are also valid stakeholders in the system, and that their needs have to be listened to and addressed alongside those of the end users. This culminated in the creation of the DevOps movement and a ‘you build it, you run it’ mentality, although it has since grown much wider as the realisation dawned that only a holistic approach to building and running systems works in practice over the long term.
While perhaps somewhat easier now, in the past I have had to fight for my belief in what appears to be only informally known today as Design for Supportability. One project manager back in the late 2000s even suggested that any time spent creating custom tooling should be my own time, as it was not part of The Deliverables. When the ‘Business as usual’ (BAU) and Analysis teams discovered a testing tool I wrote to help me create custom test data sets, they openly thanked me, and then I felt that my approach, and the time spent on it, had been vindicated.
When I moved to another organization in the same industry to work on a similar system, I put supportability front and centre, letting it drive the design and architecture to such an extent that for production it ran as a bunch of distributed services, but the same code could also be hosted in a single command line tool using local instead of remote procedure calls. I called it a ‘gig-in-a-box’ because the entire distributed calculation engine was essentially running as a monolithic process which allowed us to easily debug, test, profile, and hence support the majority of the system’s codebase. We even had a formal database schema called ‘support’ so our ad hoc SQL snippets could become first class citizens.
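To make the ‘gig-in-a-box’ idea a little more concrete, here is a minimal sketch of the general technique rather than the actual system. It assumes the engine depends only on an interface, so a host can hand it either an in-process implementation or a stub that forwards calls over a transport. Every name here (`Calculator`, `LocalCalculator`, `RemoteCalculator`, `run_engine`, `job_id`) is hypothetical, and the ‘calculation’ is a placeholder.

```python
from typing import Callable, Protocol


class Calculator(Protocol):
    """Hypothetical interface that the engine code depends on."""

    def calculate(self, job_id: int) -> float: ...


class LocalCalculator:
    """Runs the business logic in-process (the single-process host)."""

    def calculate(self, job_id: int) -> float:
        # Placeholder for the real calculation.
        return job_id * 1.5


class RemoteCalculator:
    """Forwards each call over a transport supplied by the host.

    The transport is just a callable here; in production it would be
    whatever remote procedure call mechanism the services use.
    """

    def __init__(self, send: Callable[[str, int], float]) -> None:
        self._send = send

    def calculate(self, job_id: int) -> float:
        return self._send("calculate", job_id)


def run_engine(calculator: Calculator, job_ids: list[int]) -> float:
    """Engine code is identical whichever calculator it is given."""
    return sum(calculator.calculate(job_id) for job_id in job_ids)


if __name__ == "__main__":
    # The command line host wires in the local implementation, so the
    # whole engine can be stepped through in a single debugger session.
    print(run_engine(LocalCalculator(), [2, 4]))
```

The design choice being illustrated is that only the hosting code knows whether calls are local or remote, which is what lets the same codebase be debugged, tested and profiled as one process.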
For sure, wasting time on speculative requirements and gold-plating is a concern, but there are ways to make that visible and, more importantly, discover what is driving that behaviour. Any team probably already has a bunch of half-baked, stale, duplicated support scripts and tools, so formalising them by adding them to the codebase can only be a good thing, as then they will get the care and attention they deserve. Production incidents are stressful enough as it is; having a good toolkit can reduce the chances of one turning into a full-on disaster.
Chris Oldwood is a freelance programmer who started out as a bedroom coder in the 80s writing assembler on 8-bit micros. These days it’s enterprise grade technology from the comfort of his breakfast bar. He has resumed commentating on the Godmanchester duck race but continues to be easily distracted by emails and DMs.