Best News Website or Mobile Service
WAN-IFRA Digital Media Awards Worldwide 2022


Deep Dive Podcast: Global IT outage - Are we relying on too few big tech companies?

A more proactive approach to managing tech failures is critical because it is impossible to prevent them.


CNA's weekly news podcast takes a deep dive into issues that people talk about at dining tables and along the office corridors. Hosted by Steven Chia and Crispina Robert.

One bad software update by cybersecurity firm CrowdStrike unleashed global chaos last week, with flights cancelled, hospital systems down and banking applications going offline. How did a routine update become a full-blown crisis, and will we see more in the future?

On the Deep Dive podcast, Steven Chia and Crispina Robert put these questions to Gaurav Keerthi, head of advisory and emerging business at Ensign InfoSecurity, and Benjamin Ang, head of Digital Impact Research at the S Rajaratnam School of International Studies.

United Airlines employees wait by a departures monitor displaying a blue error screen, also known as the “Blue Screen of Death” inside Terminal C in Newark International Airport, after United Airlines and other airlines grounded flights due to a worldwide tech outage caused by an update to CrowdStrike's "Falcon Sensor" software which crashed Microsoft Windows systems, in Newark, New Jersey, U.S., July 19, 2024. REUTERS/Bing Guan

Here's an excerpt from the podcast: 

Steven Chia:
From (a) technical point of view, did this also happen because there weren't enough checks in place? (Don’t) we have to ensure that the system is working well, or whatever goes into the system is done in a way that does not allow things like that to happen? Was something missing in this case?

Benjamin Ang:
There are a lot of theories that the quality assurance wasn't tight enough for this particular (outage). But also, in perspective, there are thousands of these (updates) going out every day, every week, every month. And so far, 99.9 per cent of (the) time it doesn't cause outages.

Steven:
But is that acceptable?

Benjamin:
I think we need to think about (how) it's going to happen again. It's just (because of) the sheer number (of updates) and the sheer connectivity we have. We are going to need to have that resilience. I know people make fun of the handwritten boarding pass, but that's very resilient; they can actually do it.

Crispina Robert:
Ben, on that point you were making about the written boarding pass, one of the big questions our audience asked was, why isn't there some sort of proper backup for airlines, for example? Why is it that only when something like this happens does everybody scramble to figure out how do we get people on the plane? There were people who missed their flights too, right? Should high-key businesses really double down on an alternative?

Benjamin:
It would be good to have things like resilience drills, like we have fire drills and emergency drills, so that we practise it. But also realise that we are functioning in a world that is way faster than human beings are able to (function in).

Crispina:
Yes. How fast can you write out a boarding pass, right?

Gaurav Keerthi:
I know we make fun of the manual boarding passes. I'm not a big fan of the manual workaround either. There could be digital systems in place to take over, but you also must accept that these take time to kick in.

This is what we call a failover process. If your primary system fails, what's your backup plan? And sometimes it takes maybe half an hour, an hour, two hours, to set up. If your car breaks down on the way to the office, you can still get to the office, but you will be late. Maybe it's 10 minutes late, maybe it's half an hour late, whether you take a taxi or a bus, but you will get in.

And so I think in the airport's case, it did take a little bit of time. It was one to two hours, but very quickly things got back (online). And there was a transition period where some people unfortunately missed their flights. But I think if you step back and look at the wider ecosystem, the lights stayed on. The phones still worked, hospitals were still functioning.

Find more episodes of Deep Dive here.

A new episode of Deep Dive drops every Friday. Follow the podcast on Apple or Spotify for the latest updates.

Have a great topic for us? Drop the team an email at cnapodcasts [at] mediacorp.com.sg  

Source: CNA/jj