Deep Dive Podcast: Global IT outage - Are we relying on too few big tech companies?
A more proactive approach to managing tech failures is critical because it is impossible to prevent them.
CNA's weekly news podcast takes a deep dive into issues that people talk about at dining tables and along the office corridors. Hosted by Steven Chia and Crispina Robert.
One bad software update by cybersecurity firm CrowdStrike unleashed global chaos last week with flights cancelled, hospital systems down and banking applications going offline. How did a routine update become a full-blown crisis, and will we see more in the future?
On the Deep Dive podcast, Steven Chia and Crispina Robert put these questions to Gaurav Keerthi, head of advisory and emerging business at Ensign InfoSecurity, and Benjamin Ang, head of Digital Impact Research at the S Rajaratnam School of International Studies.

Here's an excerpt from the podcast:
Steven Chia:
From (a) technical point of view, did this also happen because there weren't enough checks in place? (Don’t) we have to ensure that the system is working well, or whatever goes into the system is done in a way that does not allow things like that to happen? Was something missing in this case?
Benjamin Ang:
There are a lot of theories that the quality assurance wasn't tight enough for this particular (outage). But also, in perspective, there are thousands of these (updates) going out every day, every week, every month. And so far, 99.9 per cent of (the) time it doesn't cause outages.
Steven:
But is that acceptable?
Benjamin:
I think we need to think about (how) it's going to happen again. It's just by the sheer number and the sheer connectivity we have. We are going to need to have that resilience. I know people make fun of the handwritten boarding pass but that's very resilient, that they actually can do it.
Crispina Robert:
Ben, on that point you were making about the written boarding pass, one of the big questions our audience asked was, why isn't there some sort of proper backup for airlines, for example? Why is it that, only when something like this happens, then everybody's scrambling to figure out how do we get people on the plane? There were people who missed their flights too, right? Should high-key businesses really double down on an alternative?
Benjamin:
It would be good to have things like resilience drills, like we have fire drills, emergency drills, so that we practise it. But also realise that we are functioning in a world that is way faster than human beings are able to.
Crispina:
Yes. How fast can you write out a boarding pass, right?
Gaurav Keerthi:
I know we make fun of the manual boarding passes. I'm not a big fan of the manual workaround either. There could be digital systems in place to take over, but you also must accept that these take time to kick in.
This is what we call a failover process. If your primary system fails, what's your backup plan? And sometimes it takes maybe half an hour, an hour, two hours, to set up. If your car breaks down on the way to the office, you can still get to the office, but you will be late. Maybe it's 10 minutes late, maybe it's half an hour late, whether you take a taxi or a bus, but you will get in.
And so I think in the airport's case, it did take a little bit of time. It was one to two hours, but very quickly things got back (online). And there was a transition period where some people unfortunately missed their flight. But I think if you step back and look at the wider ecosystem, the lights stayed on. The phone still worked, hospitals were still functioning.