Global disruptions: What was the IT outage?
An IT outage is when tech services or systems go down and can't be used because of hardware or software issues or, in more serious instances, cyberattacks.
The recent IT outage on 19 July caused disruptions across various industries and companies worldwide. Airlines grounded flights, financial transactions came to a halt, and media companies faced interrupted broadcasts.
The problem was caused by a faulty update that the US cybersecurity company CrowdStrike pushed to its security software running on Microsoft Windows machines. CrowdStrike took responsibility, explaining that it was due to a “defect found in a single content update”, not a cyberattack or security incident. A working fix was rolled out a few hours later.[1]
Ripple effect: Who and what has been affected by the IT outage?
The recent major IT outage had a wide-ranging effect, disrupting services across various sectors around the world. The extent of the impact reflects how heavily companies and institutions rely on software, cloud services, and internet connectivity to manage their operations. When these systems face disruption, the ripple effect can be felt far and wide, affecting:
- Banks: Online banking services, ATMs, and transaction processing systems came to a standstill. Customers faced significant inconvenience, and financial transactions were delayed.
- Supermarkets: Point of Sale (POS) systems, inventory management, and online grocery delivery services were all disrupted. This led to long lines, stock shortages, and delivery delays.
- Hospitals: Critical healthcare services were affected. Electronic health records (EHR), appointment scheduling systems, and diagnostic tools that rely on the internet were rendered inoperable, potentially jeopardising patient care.
- Pharmacies: Prescription management systems, online ordering, and inventory tracking systems were disrupted. This caused delays in dispensing medication and reduced access to pharmaceuticals.
- Stock exchange: Trading platforms experienced downtime, halting stock trading activities and leading to financial losses and market instability.
- Schools: Online learning platforms, administrative software, and digital resources became inaccessible. This disrupted education and communication between teachers, students, and parents.
- Ferries: Ticket booking systems, scheduling software, and communication networks were impacted. This caused delays and cancellations in ferry services, affecting both passengers and cargo transport.
The outage underscored just how vulnerable essential services and industries are to IT disruptions. Many businesses also reported that card payments were down, severely affecting their operations.
This incident showed the crucial importance of reliable technology infrastructure. In our interconnected world, dependable IT systems are a necessity for smooth business operations, keeping customers happy, and protecting revenue.
Without strong and resilient technology, businesses risk major operational and financial problems. This stresses the importance of reliable and secure IT solutions.
Regarding the global IT outage that caused widespread disruption last week, Dojo’s Vice President of Engineering, Robert Howes, commented:
“The issue in this case, while not directly caused by the Windows operating system, is related to a widely used security layer that by design runs in the most secure part of the operating system. It’s from here that an error can, and unfortunately in this case did, cause a full system shutdown.”
When asked why the issue became so far-reaching, Howes added:
“While businesses may have a varying dependency on the IT systems that they use to run their day-to-day operations, these systems still have a common underlying need to run on top of a foundational operating system – the same system that includes the aforementioned security layer. CrowdStrike, who provide this security software, is estimated to be used in over half of Fortune 100 companies[2]. So with many security-conscious businesses investing in this software to boost their resilience to security threats, it’s no surprise that the resulting issue became extensive.”
“The prevalence of Windows machines, combined with the popularity of CrowdStrike for threat prevention, resulted in an estimate of over 8.5m machines being impacted – on which all manner of business infrastructure is globally run.”
Behind the breakdown: What are the most common causes of IT outages?
While IT, or internet, outages can happen for many reasons, the most common causes include:
- Hardware issues: Hardware refers to the physical components of a computer or network, such as servers, routers, and cables. Outages can stem from new hardware installations or from ageing components reaching their end of life. Other frequent problems include router failures, cable cuts, and power outages.
- Software issues: Software includes the programs and operating systems that run on hardware such as servers and routers. Software issues include bugs, glitches, corrupted files, misconfigurations, and compatibility problems. Using outdated software or hardware also raises the risk of application failure and system outages.
- Human error: Unplanned downtime is frequently caused by accidental mistakes or negligence where a person is involved.
According to Howes: “Two big causes for outages will be security-related incidents and those triggered as a result of pushing software or infrastructure changes.”
When asked what businesses can do to mitigate the risk of disruption during outages or network downtime, Howes outlined three core areas:
1. Investment in identifying potential single points of failure and upfront design to consider how to eliminate these with redundancy measures.
“It is important to consider all third party dependencies in this analysis as this is often focused on internally built components with a bias towards over trusting third parties – particularly where these are reputable and highly established.”
2. Robust change management processes are also key here where strong governance can minimise the likelihood and blast radius of an incident.
“As with above, there is often a bias towards internally built systems and it is important to properly scope change management with third party updates – particularly if they relate to lower level foundational infrastructure, as we saw with the most recent outage.”
3. Investing in comprehensive and established incident management runbooks.
“While we want to mitigate the likelihood of an incident, how prepared a business is to respond quickly and comprehensively can greatly reduce the impact of an incident when it does arise.”
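To make the idea of limiting the “blast radius” of a change more concrete, here is a minimal Python sketch of a staged rollout. It is an illustration only, not a description of how CrowdStrike, Dojo, or any other vendor deploys updates. The function name `phased_rollout`, the `RolloutResult` type, the stage sizes, and the `apply_update` callback are all hypothetical; the callback is assumed to return True when a machine is still healthy after the update.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RolloutResult:
    completed: bool      # True only if every stage passed its health check
    hosts_touched: int   # how many machines received the update


def phased_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed stage sizes
) -> RolloutResult:
    """Apply an update in growing stages and halt after the first unhealthy stage."""
    touched = 0
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated once this stage ends.
        target = min(len(hosts), max(1, round(len(hosts) * fraction)))
        stage = hosts[touched:target]
        results = [apply_update(host) for host in stage]
        touched += len(stage)
        if not all(results):
            # Stop here so the remaining hosts never receive the faulty change.
            return RolloutResult(completed=False, hosts_touched=touched)
    return RolloutResult(completed=True, hosts_touched=touched)


# Hypothetical fleet and a faulty update that breaks every machine it touches.
fleet = [f"host-{i}" for i in range(1000)]
faulty = phased_rollout(fleet, lambda host: False)
healthy = phased_rollout(fleet, lambda host: True)
```

With these assumed stage sizes, the faulty update reaches only the first 10 of 1,000 machines before the rollout halts, while a healthy update reaches the whole fleet.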
Standing strong: How Dojo compares
While many businesses struggled during the recent global IT outage, Dojo’s card machines remained resilient and reliable. Our system's robust architecture ensured uninterrupted payment services for all our customers, from cafes and bars to shops and beyond. This reliability is thanks to several key factors:
- Advanced technology: Dojo's infrastructure is built on cutting-edge technology, providing an industry-leading uptime of 99.99% with dual Wi-Fi and 4G connectivity.
- Proactive monitoring: Our team continually and actively monitor systems to identify and address issues before they impact our customers.
- Transparent communication: We keep our customers in the loop with regular updates and clear status pages at Dojo Status.
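For a sense of scale, an uptime percentage can be converted into the downtime it allows. The short Python sketch below does that arithmetic. The function name `allowed_downtime_minutes` is made up for illustration, and the measurement windows (a 365-day year and a 30-day month) are assumptions, since the period over which the 99.99% figure is measured is not stated here.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float = 365) -> float:
    """Return the downtime, in minutes, that a given uptime percentage permits."""
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * (1 - uptime_percent / 100)


# Assumed measurement windows: a 365-day year and a 30-day month.
per_year = allowed_downtime_minutes(99.99)
per_month = allowed_downtime_minutes(99.99, period_days=30)
```

Under those assumptions, 99.99% uptime works out to roughly 52.6 minutes of downtime per year, or about 4.3 minutes in a 30-day month.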
Unlike other systems whose card payments were down for extended periods during this outage, Dojo’s payment products ensured seamless transactions. To better understand why, Robert Howes, VP of Engineering at Dojo, shed light on the robust strategies and principles that meant Dojo’s customers could continue to take payments throughout the disruption:
“Although July’s IT outage only affected businesses that used the specific software on Windows machines, Dojo always looks to take measures that mitigate against any potential disruption for the businesses that we provide payments for.
“This is a testament to the culture fostered by our teams on being intentional about eliminating single points of failure and investing in a highly redundant architecture, alongside a considered rollout of changes across our technology real estate, that prevented this from developing into a service outage.”
When asked which measures Dojo has taken to achieve this, Howes commented:
“Although there is always a chance of an incident, we consistently review our strategies and infrastructure to mitigate the impact of any that do occur and ensure they’re resolved swiftly. From multi-carrier 4G SIMs in our terminals to embracing a multi-cloud strategy, our infrastructure has been built from the ground up with resilience in mind and looking at redundancy wherever possible.
“On top of our investment in robust quality engineering principles and test coverage, we leverage symptom-based monitoring, phased rollouts of our changes and comprehensive on-call rotas to mitigate and contain the scope of any incident.
“This mindset is baked into the Dojo DNA and very much born from our belief that it is a privilege to run payments for our 150k+ UK SME, independent, and enterprise businesses, and recognition that interruption to our service can cause significant disruption and even threaten livelihoods.
“As part of this, our Technology team partners with Risk, Compliance, and Security as we believe that collaboration across these lenses leads us to the strongest resilience posture.”
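The redundancy described above, where a second connection takes over when the first fails, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Dojo's actual terminal software: `send_with_failover`, `AllChannelsFailed`, and the example channels `wifi` and `mobile_carrier_a` are hypothetical names, and a real payment terminal would also need timeouts, retries, and safeguards against sending the same payment twice.

```python
from typing import Callable, Sequence


class AllChannelsFailed(Exception):
    """Raised when every redundant channel has been tried without success."""


def send_with_failover(
    payload: str,
    channels: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each channel in priority order and return (channel_name, response)."""
    errors = []
    for name, send in channels:
        try:
            return name, send(payload)
        except ConnectionError as exc:
            # Record the failure and fall through to the next channel.
            errors.append(f"{name}: {exc}")
    raise AllChannelsFailed("; ".join(errors))


# Hypothetical channels for illustration only.
def wifi(payload: str) -> str:
    raise ConnectionError("router unreachable")


def mobile_carrier_a(payload: str) -> str:
    return f"accepted {payload}"


used, response = send_with_failover(
    "payment-123", [("wifi", wifi), ("4g-carrier-a", mobile_carrier_a)]
)
```

In this example the Wi-Fi channel fails, so the request goes out over the mobile connection instead. The caller sees an error only if every channel in the list fails.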
Promise of performance: Dojo’s commitment to you
In the middle of global network failures, Dojo continued to process payments and bookings for small and medium-sized businesses as well as enterprises without interruption. Our 150,000+ business customers remained unaffected by the global outage, thanks to our powerful technology and transparent communication.
We prioritise both exceptional customer service and reliable, robust payment solutions that offer the peace of mind merchants need in times of instability.
With an industry-leading uptime of 99.99% and dual Wi-Fi and 4G connectivity, Dojo lets customers keep accepting card payments as normal, even when the unexpected happens.
To learn more about how Dojo can support your business, reach out or book a demo. While you’re at it, check out our blog for insights on how to streamline and grow your business, and see our impact in action.