Taking Steps to Strengthen Digital Defenses
The worldwide tech outage of July 19, 2024, caused by a faulty CrowdStrike software update that affected approximately 8.5 million Windows machines, has sent ripples through the tech industry. The incident is a stark reminder of the vulnerabilities inherent in an increasingly digital world and of the need for robust safeguards.
Scrutinizing Vendor Practices and Software Updates
Neil MacDonald, a Gartner vice president, emphasizes that IT leaders should hold vendors deeply integrated within IT systems to a "very high standard" of development, release quality, and assurance. This is particularly crucial for security vendors like CrowdStrike.
"Any security vendor has a responsibility to do extensive regression testing on all versions of Windows before an update is rolled out," MacDonald states. This incident underscores the importance of implementing rigorous vetting processes for IT vendors, including detailed inquiries about their software development practices, testing procedures, and release protocols.
Amy Farrow, chief information officer of IT automation and security company Infoblox, reinforces this approach: "Incidents like this remind all of us in the CIO community of the importance of ensuring availability, reliability and security by prioritizing guardrails such as deployment and testing procedures and practices."
Rethinking Automatic Updates
While automatically accepting software updates has become the norm and is often recommended for security reasons, the CrowdStrike incident has led many to reconsider this approach. Paul Davis, a field chief information security officer at software development platform maker JFrog, suggests that thorough testing of packages, upgrades, and new features should still be a priority, especially for updates with potentially high impact.
Davis recommends implementing a staged rollout process for critical systems, starting with thorough testing in a controlled environment before any company-wide deployment.
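A staged rollout of the kind Davis describes can be sketched in a few lines. The example below is only an illustration under stated assumptions: the ring names, the ring sizes, the failure threshold, and the helper names (`ring_for`, `should_advance`) are hypothetical and do not come from Davis, JFrog, or any vendor's tooling. It assigns each machine to a rollout ring deterministically and lets an update advance only when the current ring has reported enough healthy machines.

```python
import hashlib

# Hypothetical rings: (name, cumulative fraction of the fleet covered).
RINGS = [("canary", 0.01), ("early", 0.10), ("broad", 1.00)]


def host_bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1] using a hash."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def ring_for(host_id: str) -> str:
    """Return the first ring whose cumulative cutoff covers this host."""
    bucket = host_bucket(host_id)
    for name, cutoff in RINGS:
        if bucket < cutoff:
            return name
    return RINGS[-1][0]  # rounding edge case: fall back to the last ring


def should_advance(healthy: int, failed: int,
                   max_failure_rate: float = 0.001,
                   min_hosts: int = 20) -> bool:
    """Allow the next ring only after enough hosts report in and the
    observed failure rate stays at or below the threshold."""
    total = healthy + failed
    if total < min_hosts:
        return False  # not enough evidence yet
    return failed / total <= max_failure_rate


if __name__ == "__main__":
    for host in ("laptop-001", "kiosk-17", "db-primary"):
        print(host, "->", ring_for(host))
    print("advance past canary:", should_advance(healthy=500, failed=0))
```

Hashing the host ID keeps ring membership stable from one update to the next, so the same small group of machines always sees a change first. A real deployment would more likely pick canary machines deliberately, for example by excluding critical servers, rather than by hash alone.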
Leveraging AI for Enhanced Security
Complex IT systems strain the limits of human oversight, and that is driving interest in applying artificial intelligence to security and release processes. Jack Hidary, chief executive of AI and quantum company SandboxAQ, points out that "humans are not very good at catching errors in thousands of lines of code. We need AI trained to look for the interdependence of new software updates with the existing stack of software."
Disaster Recovery and Business Continuity
Gartner's MacDonald likens an incident that renders computers unusable to a natural disaster, emphasizing the need for robust disaster recovery plans. Chirag Mehta, a cybersecurity analyst at Constellation Research, recommends setting up a "clean room," or an environment isolated from other systems, to bring critical systems back online during an outage.
Victor Zyamzin, chief business officer of security company Qrator Labs, underscores the importance of backups that are not only taken but regularly tested: "Another suggestion for companies, and we've been saying that again and again for decades, is that you should have some backup procedure applied, running and regularly tested."
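One simple way to make "regularly tested" concrete is to restore a backup into a scratch location and compare it with the source. The sketch below assumes file-based backups that have already been restored to a directory; the function name `verify_restore` and the directory layout are hypothetical, and it does not reflect any particular backup product or Qrator Labs practice.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_restore(source_dir, restored_dir) -> List[str]:
    """List relative paths that are missing from the restored copy
    or whose contents differ from the source."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or file_digest(src) != file_digest(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every source file was found in the restored copy with identical contents. Running a check like this on a schedule, and alerting when the list is not empty, turns the backup test into a routine rather than something discovered during an outage.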
Contract Reviews and Insurance Coverage
Reviewing vendor contracts and insurance coverage is another crucial step. MacDonald suggests looking for clauses indicating that vendors must provide reliable and stable software. "That's where you may have an advantage to say, if an update causes an outage, is there a clause in the contract that would cover that?" he notes.
Peter Halprin, a partner with law firm Haynes Boone focused on cyber insurance, highlights the importance of insurance in providing companies with bottom-line protection against cyber risks, including business income losses associated with outages.
Diversifying IT Infrastructure
The CrowdStrike incident, which affected only Windows-based systems, has prompted many organizations to reconsider their reliance on a single operating system. Mehta points out that Apple's Mac operating system and Linux don't allow the same level of kernel access as Windows, potentially reducing vulnerability to certain types of outages.
Organizations may benefit from evaluating a mix of operating systems in their IT infrastructure, including alternatives such as Chromebooks for certain roles. As Mehta notes, "Not all of them require deeper access to things. What are you doing on your laptop that actually requires Windows?"
The CrowdStrike outage serves as a wake-up call for the entire tech industry. It highlights the need for continuous improvement in IT practices, from vendor management and software update protocols to disaster recovery planning and infrastructure diversity. As technology continues to evolve, so too must our approaches to ensuring its reliability and security. By learning from incidents like this and implementing robust safeguards, organizations can build more resilient digital ecosystems capable of withstanding future challenges.