What 80’s pop can teach us about Rocket failure and incident management

image

Most accidents originate in actions committed by reasonable, rational individuals who were acting to achieve an assigned task in what they perceived to be a responsible and professional manner.

(Peter Harle, Director of Accident Prevention,Transportation Safety Board of Canada and former RCAF pilot, ‘Investigation of human factors: The link to accident prevention.’ In Johnston, N., McDonald, N., & Fuller, R. (Eds.), Aviation Psychology in Practice, 1994)

I don’t just read infosec blogs or cartoons that vaguely related to infosec, I also read other blogs from “normal” people. One such blog is from a chap called Wayne Hale who was a Fligh Director (amongst other things) at NASA until fairly recently. As a career NASA’ite he saw NASA from it’s glory days through the doldrums and back to the force it is today. There are a number of reasons I like his blog, but mostly I have loved the idea of space since I was a little kid – I still remember the first space shuttle touching down, watching it on telly, and whooping with joy much to my mother’s consternation and chagrin. The whole space race has captured my imaginaion, as a small child and an overweight adult. I encourage anyone to head to his blog for not only fascinating insider stories of NASA, but also of the engineering behind space flight.

What Wayne’s blog frequently shows is one thing; space is hard. It is an unforgiving environment that will take advantage of every weakness, known and unknown, to take advantage and destroy you. Even just getting into space is hard. Here is Wayne describing a particular incident the Russians had;

The Russians had a spectacular failure of a Proton rocket a while back – check out the video on YouTube of a huge rocket lifting off and immediately flipping upside down to rush straight into the ground. The ‘root cause’ was announced that some poor technician had installed the guidance gyro upside down. Reportedly the tech was fired. I wonder if they still send people to the gulag over things like that.

This seems like such a stupid mistake to make, and one that is easy to diagnose; the gyro was in stalled upside down by an idiot engineer. Fire the engineer, problem solved. But this barely touches the surface of root cuse analysis. Wayne coniTunes;

better ask why did the tech install the gyro upside down? Were the blueprints wrong? Did the gyro box come from the manufacturer with the ‘this side up’ decal in the wrong spot? Then ask – why were the prints wrong, or why was the decal in the wrong place. If you want to fix the problem you have to dig deeper. And a real root cause is always a human, procedural, cultural, issue. Never ever hardware.

What is really spooky here is that the latter part of the above quote could so easily apply to our industry, especially the last sentence – it’s never the hardware.

A security breach could be traced back to piece of poor coding in an application;

1. The developer coded it incorrectly. Fire the developer? or…

2. Ascertain that the Developer had never had secure coding training. and…

3. The project was delivered on tight timelines and with no margins, and…

4. As a result the developers were working 80-100 hrs a week for three months, which…

5. Resulted in errors being introduced into the code, and…

6. The errors were not found because timelines dictated no vulnerabiliy assessments were carried out, but…

7. A cursory port scan of the appliction by unqualified staff didn’t highlight any issues.

It’s a clumsy exampe I know, but there are clearly a number of points (funnily enough, seven) throughout the liufecycle of the environment that would have highlighted the possibility for vulnerabilities, all of which should have been acknowledged as risks, assessed and decisions made accordingly. Some of these may fall out of the direct bailiwick of the information security group, for instance working hours, but the impact is clearl felt with a security breach.

A true root cause analysis should always go beyond just the first response of “what happened”? If in doubt, just recall the eponymous words of Bronski Beat;

“Tell me why? Tell me why? Tell me why? Tell me why….?”

Tags: , , , , , , ,

About Thom Langford

An information security professional, award winning security blogger and industry commentator. Available as a speaking head and presenter on topics relating to information security, risk management and compliance.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: