Command, Control, and Conquer

Back in the ’90s, there was a game released called Command and Conquer, a strategy game in which you had to manage resources, build bases, train and mobilise armies, and conquer your neighbours. It was a classic that spawned spin-offs, sequels and add-ons for decades. What struck me about it, though, was how multi-skilled you had to be, especially in the later levels.

You couldn’t just be an excellent Field Marshal; you also had to manage resources, cash and other materials for the buildings and structures that let you raise your army in the first place. You had to know logistics and how long something would take to build, train and mobilise, look ahead to new locations for better access to materials, and have plans in place in case the enemy attacked before you were ready.

Essentially, you were skipping from one crisis to the next, finely balancing between success and crashing failure. It sounds a lot like any modern-day incident management situation really.

In this week’s The Lost CISO (season 2), I take a quick look at incident management and highlight four key points to remember during an incident. In case you haven’t seen it yet, here it is:

The bottom line is that, much like in the Command & Conquer game, you couldn’t plan far ahead because the environment was constantly changing, the unknowns stubbornly remained unknowns, and the (in the case of the game, literal) fog of war meant you couldn’t see more than a few steps ahead. There are, though, some keys to success.

The first key point is that having a plan is all well and good, but as my military friends regularly tell me:

no plan survives contact with the enemy

Why? Because the enemy, much like life, does random, unexpected and painful things on a regular basis. Incidents have a habit of doing the same, so if your plan is rigid, overly explicit and leaves little room to ad-lib or manoeuvre, it will fail.

Therefore, my approach has always been to build any kind of plan around four simple areas:

  • Command
  • Control
  • Communication
  • Collaboration

In other words, decide who is in charge, decide who is responsible for which areas, ensure everyone knows how to talk to each other, and ensure everyone works openly and honestly with everyone else. There may be some other details in there as well, but really, if you have these four areas covered your plans will remain flexible and effective, and you may find yourself closing incidents more quickly and efficiently.
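If it helps to make that concrete, here is a minimal, purely illustrative sketch in Python of how those four areas might be captured in a lightweight plan; the role names, fields and values are my own assumptions rather than any prescribed standard.

```python
from dataclasses import dataclass, field

@dataclass
class IncidentPlan:
    """Illustrative structure covering the four C's; every name here is hypothetical."""
    # Command: who is in charge of the incident overall
    incident_commander: str
    # Control: who owns which area during the incident
    area_owners: dict = field(default_factory=dict)
    # Communication: how everyone talks to each other
    channels: dict = field(default_factory=dict)
    # Collaboration: the working agreement everyone signs up to
    ground_rules: list = field(default_factory=list)

# Example plan, again entirely made up
plan = IncidentPlan(
    incident_commander="on-call security manager",
    area_owners={"containment": "network team", "comms": "PR lead"},
    channels={"bridge": "conference line", "chat": "#incident-bridge"},
    ground_rules=["work openly", "log every decision", "no blame during the incident"],
)
```

The detail matters far less than the shape: everything else in the plan hangs off those four answers, which is what keeps it flexible when the incident refuses to follow the script.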

With all that extra time on your hands, you can then spend some time basking under the Tiberian sun.


What ’80s pop can teach us about rocket failure and incident management


Most accidents originate in actions committed by reasonable, rational individuals who were acting to achieve an assigned task in what they perceived to be a responsible and professional manner.

(Peter Harle, Director of Accident Prevention, Transportation Safety Board of Canada and former RCAF pilot, ‘Investigation of human factors: The link to accident prevention.’ In Johnston, N., McDonald, N., & Fuller, R. (Eds.), Aviation Psychology in Practice, 1994)

I don’t just read infosec blogs or cartoons vaguely related to infosec, I also read blogs from “normal” people. One such blog is from a chap called Wayne Hale, who was a Flight Director (amongst other things) at NASA until fairly recently. As a career NASA’ite he saw NASA go from its glory days through the doldrums and back to the force it is today. There are a number of reasons I like his blog, but mostly it is because I have loved the idea of space since I was a little kid – I still remember watching the first space shuttle touch down on the telly and whooping with joy, much to my mother’s consternation and chagrin. The whole space race has captured my imagination, both as a small child and as an overweight adult. I encourage anyone to head to his blog for not only fascinating insider stories of NASA, but also the engineering behind space flight.

What Wayne’s blog frequently shows is one thing: space is hard. It is an unforgiving environment that will exploit every weakness, known and unknown, to destroy you. Even just getting into space is hard. Here is Wayne describing a particular incident the Russians had:

The Russians had a spectacular failure of a Proton rocket a while back – check out the video on YouTube of a huge rocket lifting off and immediately flipping upside down to rush straight into the ground. The ‘root cause’ was announced that some poor technician had installed the guidance gyro upside down. Reportedly the tech was fired. I wonder if they still send people to the gulag over things like that.

This seems like such a stupid mistake to make, and one that is easy to diagnose: the gyro was installed upside down by an idiot technician. Fire the technician, problem solved. But this barely scratches the surface of root cause analysis. Wayne continues:

better ask why did the tech install the gyro upside down? Were the blueprints wrong? Did the gyro box come from the manufacturer with the ‘this side up’ decal in the wrong spot? Then ask – why were the prints wrong, or why was the decal in the wrong place. If you want to fix the problem you have to dig deeper. And a real root cause is always a human, procedural, cultural, issue. Never ever hardware.

What is really spooky here is that the latter part of the above quote could so easily apply to our industry, especially the last sentence – it’s never the hardware.

A security breach could be traced back to a piece of poor coding in an application:

1. The developer coded it incorrectly. Fire the developer? Or…

2. Ascertain that the developer had never had secure coding training, and…

3. The project was delivered on tight timelines and with no margins, and…

4. As a result the developers were working 80-100 hrs a week for three months, which…

5. Resulted in errors being introduced into the code, and…

6. The errors were not found because timelines dictated no vulnerability assessments were carried out, but…

7. A cursory port scan of the application by unqualified staff didn’t highlight any issues.

It’s a clumsy example, I know, but there are clearly a number of points (funnily enough, seven) throughout the lifecycle of the environment that would have highlighted the possibility of vulnerabilities, all of which should have been acknowledged as risks, assessed, and had decisions made accordingly. Some of these may fall outside the direct bailiwick of the information security group, for instance working hours, but the impact is clearly felt with a security breach.
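To labour the point with the clumsy example above, the same “keep asking why” discipline can be sketched in a few lines of Python; the questions and findings are invented purely for illustration, but they show how each answer becomes the next question rather than the analysis stopping at step one.

```python
# A purely illustrative "keep asking why" walk-through of the example above.
# The questions and findings are invented for the sketch, not taken from a real investigation.
why_chain = [
    ("Why was the application breached?", "A coding error left it vulnerable."),
    ("Why was the error introduced?", "The developer had never had secure coding training."),
    ("Why did the error reach production?", "Timelines meant no vulnerability assessments were carried out."),
    ("Why were the timelines so tight?", "The project had no margins and the team worked 80-100 hour weeks."),
    ("Why was that accepted?", "The risks were never surfaced to anyone who could change the plan."),
]

for depth, (question, finding) in enumerate(why_chain, start=1):
    print(f"{depth}. {question} -> {finding}")

# The chain only ends at a human, procedural, cultural issue - never the hardware.
```

Notice that the chain only stops when it reaches a human, procedural or cultural issue, which is exactly Wayne’s point.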

A true root cause analysis should always go beyond just the first answer to “what happened?”. If in doubt, just recall the immortal words of Bronski Beat:

“Tell me why? Tell me why? Tell me why? Tell me why….?”