Unexpected Lessons Learned From the CrowdStrike Event

How your organization can leverage the disruptive CrowdStrike update to become more resilient.

Chip Stewart, Director–Security, Privacy, and Risk Consulting, RSM US

July 25, 2024

COMMENTARY

In the wake of global IT issues caused by a defect in a content update for CrowdStrike's Falcon sensor, many organizations found themselves executing business continuity plans (BCPs), recovering systems, and restoring from backups. In the throes of these activities, it's easy to overlook how closely they resemble the playbook for ransomware recovery, and to miss how organizations of all sizes can use this event to identify gaps in their ability to respond to and recover from ransomware or other disruptive cyberattacks.

It's important to acknowledge the scale of this disruption: an estimated 8.5 million Windows devices were incapacitated across all sectors and organizational sizes. Small businesses, global conglomerates, government agencies, hospitals, and critical infrastructure were all met with the dreaded blue screen of death.

Even those who were not CrowdStrike customers felt the impact, with flights delayed and canceled, gas stations and grocery stores unable to complete transactions, and critical services like police and fire dispatch delayed.

However, there are lessons that every organization can take away from this event to help improve their ability to respond to a cyberattack.

Detection

The mean time to detect, or MTTD, is a metric in cyber operations describing how long it takes between when an incident begins and when the organization identifies that something has happened. This metric has been trending down for the past several years, partially because incidents resulting in ransomware deployment are glaringly and quickly apparent. This speedy detection contrasts with incidents where a sophisticated threat actor is siphoning private data from a network, which typically takes longer to discover.

The CrowdStrike event was just as obvious. Computers across the network became unavailable, displaying a cryptic message about a failed driver. With this event, we also have clarity on the start time, with organizations experiencing blue screens from around 04:09 UTC.

Organizations should evaluate how long their teams took to detect the outage and how quickly they could reasonably speculate on the root cause. These metrics matter when threat actors deploy ransomware on a network.
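As a rough illustration of the MTTD metric described above, the following minimal Python sketch averages the gap between when each incident began and when it was detected. The function name and the incident records are hypothetical; the 04:09 UTC start comes from this event, while the detection times are invented purely for the example.

```python
from datetime import datetime, timedelta, timezone

def mean_time_to_detect(incidents):
    """Average the gap between incident start and detection.

    `incidents` is a list of (began, detected) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((detected - began for began, detected in incidents), timedelta())
    return total / len(incidents)

utc = timezone.utc
# Hypothetical records: only the 04:09 UTC start time is taken from the event.
incidents = [
    (datetime(2024, 7, 19, 4, 9, tzinfo=utc), datetime(2024, 7, 19, 4, 39, tzinfo=utc)),
    (datetime(2024, 3, 2, 22, 0, tzinfo=utc), datetime(2024, 3, 3, 1, 0, tzinfo=utc)),
]

mttd = mean_time_to_detect(incidents)
print(mttd)  # 1:45:00
```

Recording a start time and a detection time for each incident, including this one, is what makes the metric comparable from one event to the next.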

Response

As IT teams confirmed the cause of the system issues, organizations scrambled to begin restoring systems. Many struggled with incomplete asset inventories, partially managed devices, and no reliable way to prioritize recovery activities. Some found themselves locked out of the password vaults needed to restore critical systems. Others could not scale up quickly enough to reimage the laptops of remote users scattered across the country.

These challenges mirror those expected during a ransomware incident and highlight the importance of maintaining an accurate accounting of IT assets that informs prioritization during recovery. Also, recovery plans must consider the operating environment and support reconstituting services in time frames that align with business objectives.

Organizations should evaluate the effectiveness of their response plans during this event, including their ability to prioritize systems that support critical functions and develop or test the granular recovery plans necessary to expedite the reconstitution of these services. They should also determine where there were gaps in asset management and the underlying causes of those discrepancies.
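One way to picture how an asset inventory can inform recovery prioritization is the minimal Python sketch below. Everything in it is an assumption made for illustration: the `Asset` fields, the asset names, and the ordering rule (critical functions first, then shortest recovery time objective) are hypothetical, not a prescribed method.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    supports_critical_function: bool
    rto_hours: int   # recovery time objective agreed with the business
    managed: bool    # False if the device is only partially managed

def recovery_order(assets):
    """Critical-function assets first, then by shortest recovery time objective."""
    return sorted(assets, key=lambda a: (not a.supports_critical_function, a.rto_hours))

def inventory_gaps(assets):
    """Names of assets that are not fully managed and need follow-up."""
    return [a.name for a in assets if not a.managed]

# Hypothetical inventory entries.
assets = [
    Asset("hr-laptop-114", False, 72, True),
    Asset("dispatch-console", True, 1, True),
    Asset("password-vault", True, 2, False),
    Asset("file-share", False, 24, True),
]

for asset in recovery_order(assets):
    print(asset.name)
print(inventory_gaps(assets))
```

In this sketch the password vault ranks near the top of the recovery order and also appears as an inventory gap, which is the kind of conflict worth finding before an incident rather than during one.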

Business Continuity

As IT teams worked around the clock to restore systems and get users back online, many organizations were forced to execute their business continuity plans to keep mission-critical functions running. Organizations frequently confuse BCPs with disaster recovery plans (DRPs): a DRP covers restoring IT systems, while a BCP covers how the business keeps operating while those systems are down. Those that had planned only for system recovery were left unable to execute mission-critical functions during this event.

Organizations frequently experience the same challenge during a ransomware event, with no plan to reconstitute the capabilities that support mission-critical functions. In several high-profile ransomware attacks affecting health departments across the United States (including one during my tenure as the chief information security officer for the State of Maryland), seemingly simple administrative tasks, such as issuing death certificates, became impossible.

To prepare for ransomware events or other cyber disruptions, organizations should conduct a business impact analysis (BIA) and integrate the outputs into comprehensive BCPs, reducing the risk of protracted business disruption from a ransomware incident.

Supply Chain and Vendor Risk

More than any other event in recent history, this one has highlighted the risk that cyber events pose to supply chains, with financial transactions stalled, logistics companies unable to deliver goods, and hospitals unable to replenish lifesaving supplies.

In 2021, Kronos, a human capital and workforce management SaaS provider, experienced substantial downtime due to a ransomware event, preventing employees from being paid and stopping work activities at thousands of organizations around the globe. With many organizations relying on their partners to recover quickly from a cyber incident, few were prepared to sustain operations in the event of an incident affecting their supply chain.

Organizations should plan for cyber incidents that disrupt their supply chains as part of their business continuity plans and ensure their partners do the same.

Improving Resilience From This Event

For organizations affected by the CrowdStrike disruption, there is a unique opportunity to reflect, through the lens of a ransomware incident, on what went well and what to focus on improving. While the disruption certainly should not be minimized, the reality of a ransomware incident includes exorbitant costs, weeks to months of downtime, regulatory challenges, potential lawsuits, and numerous other adverse effects that represent long-term organizational damage.

For organizations that experienced indirect impact through partners in their supply chain, there is an opportunity to confirm that their supply chain diversification and contingency planning are adequate.

For organizations lucky enough to have not experienced any adverse impact from this event, it's crucial to recognize that this could have happened to your organization just as easily, and it's better to be prepared than to be lucky.

For everyone, this is an opportunity to reflect on what you should be doing to improve your organization's resilience.

About the Author

Chip Stewart is a director in RSM's Security, Privacy, and Risk Consulting practice and the former CISO for the state of Maryland, where his charge was to build a statewide security program from the ground up. In this role, he championed the creation of the MD-ISAC, an advanced cyber-threat intelligence center focused on helping all levels of government in Maryland proactively protect their networks from cyber threats by providing them with real-time information.