
AI Vulnerability Discovery and the Case for Systems Security Engineering


April 2026

Author: Dr. Darren Death, ICIT Fellow


For decades, the approach to building technology has operated on an implicit assumption that security could be addressed after the fact. Organizations built systems to meet functional requirements, shipped them when they worked, and addressed security through periodic assessments, patching, and monitoring. The economics of the threat environment supported this approach. Vulnerability discovery was expensive, required specialized expertise, and moved slowly enough that most organizations could absorb the associated risk through standard remediation and governance cycles. The probability that any given flaw would be discovered and weaponized before the next patch cycle remained low enough that the cost of engineering security into a system from its inception could not be justified against the cost of addressing issues as they were found. This created an environment in which building technology that simply worked was sufficient, and the cost of building technology that was engineered to be secure was treated as an unnecessary burden on development timelines and budgets.


The capabilities demonstrated by recent advances in artificial intelligence have fundamentally altered this assessment.


The Changing Threat Landscape


Earlier this month, Anthropic disclosed that its Mythos system autonomously discovered thousands of previously unknown vulnerabilities across every major operating system and web browser, generated working exploits without human direction, and achieved a 72% exploit success rate. The system identified vulnerabilities that had persisted through decades of manual code review and automated testing, including one in OpenBSD that had been present and exploitable for 27 years.


The change is not limited to one vendor or one model. Anthropic reported that Claude Opus 4.6 found 22 Firefox vulnerabilities in February 2026. Google said in March 2026 that its Big Sleep and CodeMender tools had already demonstrated the ability to autonomously find and fix deep, exploitable vulnerabilities. DARPA reported that AI Cyber Challenge finalist systems analyzed more than 54 million lines of code and found 54 unique synthetic vulnerabilities in the final competition, and HackerOne reported in October 2025 that fully autonomous agents had already submitted more than 560 valid reports.


The defensive side of the equation does not benefit from the same acceleration that AI brings to vulnerability identification and exploitation. Patching, testing, validation, change control, vendor coordination, maintenance scheduling, and production risk decisions continue to move through real organizations with real operational constraints. The cost of discovering and exploiting vulnerabilities has collapsed while the cost of securely remediating them remains largely fixed. This gap is not static, and it will continue to widen as these capabilities become more broadly available, which the industry expects to occur within months.


The Underlying Problem


The prevailing response across the industry has been to focus on faster patching, improved detection, and increased automation within security operations. These responses are warranted given the prevalence of vulnerable software and infrastructure across typical enterprises, but they address the symptoms of the problem rather than its underlying cause.


The underlying cause is that technology has been developed for decades without treating security as a fundamental engineering requirement. NIST Special Publication 800-160 Volume 1, "Engineering Trustworthy Secure Systems," has articulated this position since 2016. The central premise of the framework is that security is an emergent property of a system. It is not something that can be tested into a product after development is complete. It must be engineered into the system from its earliest design decisions, addressed with appropriate rigor throughout the system lifecycle, and maintained through operational changes. The framework provides a comprehensive approach to systems security engineering rooted in established ISO/IEC/IEEE standards for systems and software engineering, addressing security from stakeholder protection needs through concept, development, production, utilization, support, and retirement.


The technology industry has largely not adopted this guidance. The reason is economic rather than technical. The true cost of developing a secure platform, one that treats security as a first-class engineering constraint alongside performance, reliability, and maintainability, has been systematically avoided because shipping software that meets functional requirements without rigorous security engineering was less expensive and the market did not impose meaningful consequences for doing so. The frameworks, standards, and engineering disciplines necessary to build secure systems have existed for years. They were not adopted because the cost of adoption exceeded the perceived risk of not adopting them.


AI vulnerability discovery has changed this risk calculation. When any motivated actor can direct an AI system at a codebase and surface exploitable flaws in hours, the latent risk embedded in every deferred security requirement, every architectural shortcut, and every design decision that prioritized schedule over trustworthiness becomes an active liability. The accumulated security debt across the software ecosystem is being exposed at a rate that was not anticipated when these systems were built.


Shifting Security Left as an Engineering Discipline


The concept of shifting security left has been discussed for years, primarily in the context of DevSecOps and the integration of automated scanning tools into CI/CD pipelines. While these practices provide value in identifying implementation-level defects such as buffer overflows, injection flaws, and misconfigurations, they do not address architectural weaknesses.


The vulnerabilities that Mythos-class systems are identifying include complex, chained vulnerabilities composed of multiple primitives. These are structural weaknesses that exist because security was not part of the design process. Automated scanning tools applied after development cannot identify every exploitable path through a system that was not architected to limit those paths from the outset.

A genuine shift-left approach requires starting with the problem and identifying what requires protection, the threat actors and their capabilities, and the consequences of failure. It requires making security architecture decisions during system design that limit the blast radius of any single vulnerability and prevent the exploitation of one flaw from providing an attacker with a path to broader compromise. Security requirements must be treated with the same engineering rigor applied to performance and reliability requirements. In an environment where vulnerability discovery is inexpensive and fast, the systems that will prove resilient are those that were engineered to tolerate failure rather than those that assumed their flaws would not be found.
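One way to give design-time security analysis the same rigor as other engineering requirements is to capture the threat model as structured data, so that every identified threat either traces to a verifiable requirement or is visibly open. The sketch below is illustrative only: the asset, threat descriptions, and requirement IDs are hypothetical, not drawn from NIST SP 800-160 or from the article.

```python
from dataclasses import dataclass, field

@dataclass
class Threat:
    """A single threat against an asset, identified during design."""
    description: str
    impact: str  # e.g. "data disclosure", "privilege escalation"
    mitigations: list = field(default_factory=list)  # traced engineering requirements

@dataclass
class Asset:
    """Something that requires protection, per the shift-left analysis above."""
    name: str
    threats: list = field(default_factory=list)

def untraced(assets):
    """Return threats with no mitigating engineering requirement.

    An empty result is the design-time goal: every identified threat
    traces to at least one requirement that can be verified, not assumed.
    """
    return [(a.name, t.description)
            for a in assets for t in a.threats if not t.mitigations]

# Hypothetical example: a token store with one mitigated and one open threat.
store = Asset("token-store", threats=[
    Threat("tokens read from backup media", "data disclosure",
           mitigations=["SEC-REQ-12: encrypt backups at rest"]),
    Threat("service account can read all tenants", "privilege escalation"),
])

print(untraced([store]))  # the unmitigated threat surfaces for design review
```

The point of the structure is traceability: an architecture review can fail the design while it is still cheap to change, rather than discovering the gap in production.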


The Installed Base and Accumulated Technical Debt


The discussion to this point has focused primarily on how new systems should be built. The more challenging problem is the installed base that organizations are currently operating.

Most organizations are not running modern, well-architected systems across their entire enterprise. They are operating decades of accumulated technical debt. This includes legacy platforms with undocumented dependencies, systems built on frameworks that predate modern security practices, and custom applications with questionable pedigrees. Many of these systems run on infrastructure that has been modified, migrated, and extended well beyond its original design intent. Flat network architectures with implicit trust relationships remain in place because they were established before the current threat model required zero trust approaches. Vendor appliances run embedded operating systems that have not received security updates in years.


This is the actual attack surface that most organizations must defend. AI vulnerability discovery does not distinguish between code written in 2024 and code written in 2004. It will identify exploitable flaws in both with equal efficiency, and the flaws in older systems will generally be more severe because those systems were built with fewer security constraints, less rigorous engineering practices, and architectural assumptions that the current threat environment has invalidated.


The technical debt discussion in most organizations focuses on code quality, maintainability, and performance. Security debt is rarely assessed with the same rigor, despite the fact that it represents a direct and measurable risk to the organization. This debt is embedded in every system that was built to meet functional requirements without being engineered to be secure. Every flat network segment, every over-privileged service account, every unpatched dependency in a legacy stack, and every system that cannot be taken offline for remediation because it supports a critical process without failover represents accumulated security debt. AI-driven vulnerability discovery has significantly increased the cost of carrying that debt.


Addressing this challenge requires an honest and current inventory of what is running, what it depends on, who maintains it, what its actual security properties are, and what the impact would be if it were compromised. For systems that cannot be re-engineered, the appropriate response is containment through segmentation, privilege reduction, monitoring, and isolation to ensure that the inevitable vulnerabilities in those systems do not provide pathways to broader compromise. For systems that can be replaced, the replacement must be built with the engineering discipline that 800-160 describes rather than inheriting the same shortcuts that created the current risk.
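An inventory of this kind becomes actionable when each system record carries the properties the paragraph above names, and a containment-or-replace decision is derived from them. The sketch below uses a scoring scheme and thresholds of my own invention, purely to show the shape of the triage; the field names are not a standard.

```python
from dataclasses import dataclass

@dataclass
class SystemRecord:
    """Minimal inventory record for one running system."""
    name: str
    maintained: bool        # does anyone own and patch this system?
    replaceable: bool       # can it be re-engineered on a realistic timeline?
    compromise_impact: int  # 1 (local nuisance) .. 5 (enterprise-wide)

def disposition(rec: SystemRecord) -> str:
    """Illustrative triage: re-engineer what can be rebuilt,
    contain what cannot, and keep the remainder under review."""
    if rec.replaceable:
        return "re-engineer per SP 800-160 lifecycle"
    if not rec.maintained or rec.compromise_impact >= 4:
        return "contain: segment, reduce privilege, isolate, monitor"
    return "monitor and schedule for review"

# Hypothetical legacy system: unmaintained, irreplaceable, high impact.
legacy_hmi = SystemRecord("plant-hmi", maintained=False,
                          replaceable=False, compromise_impact=5)
print(disposition(legacy_hmi))
```

Even a crude scheme like this forces the honest question the paragraph raises: for each system, is the plan replacement, containment, or an unexamined assumption that its flaws will not be found?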


Implications for Organizations and the Profession


Organizations that build or acquire software must reassess what constitutes an acceptable standard of delivery. Meeting functional requirements is no longer sufficient. Understanding the security properties of a system under adversarial conditions must be part of the acceptance criteria.

Development organizations should adopt the systems security engineering discipline that 800-160 describes, including threat modeling during design, traceable security architecture decisions, security requirements that are verified rather than assumed, and assurance evidence that is maintained throughout the lifecycle. Additionally, the same AI capabilities that have created this threat should be directed inward. LLM-based vulnerability discovery is already mature enough to apply against an organization's own code and dependencies. The concept of a permanent vulnerability operations function that runs continuous AI-driven discovery against an organization's software estate is emerging as a necessary capability. Organizations that are not using these tools to evaluate their own code should expect that others will.
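A standing vulnerability operations function of this kind could begin as a scheduled job that walks the organization's repositories and submits each to an AI analysis backend. The sketch below is a shape, not an implementation: `analyze_repo` is a placeholder for whatever model or service an organization actually uses, the finding format is assumed, and the severity threshold is arbitrary.

```python
from typing import Callable

def run_discovery_cycle(repos, analyze_repo: Callable, min_severity: float = 7.0):
    """One pass of continuous AI-driven discovery over an org's own code.

    `analyze_repo` stands in for an LLM-backed analysis service; it is
    assumed to return a list of {"id", "severity"} finding dicts.
    Findings at or above `min_severity` are queued for human triage.
    """
    triage_queue = []
    for repo in repos:
        for finding in analyze_repo(repo):
            if finding["severity"] >= min_severity:
                triage_queue.append((repo, finding["id"]))
    return triage_queue

# Stub backend for illustration; a real deployment would call a model API.
def fake_analyzer(repo):
    return [{"id": f"{repo}-001", "severity": 9.1},
            {"id": f"{repo}-002", "severity": 3.2}]

print(run_discovery_cycle(["billing-svc", "auth-gw"], fake_analyzer))
```

The loop itself is trivial; the organizational commitment it represents is not. Running it continuously means staffing the triage queue it produces, which is the workforce point the section closes on.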


Organizations that primarily consume rather than build software must ask more substantive questions of their vendors regarding security architecture, trust boundary definitions, blast radius in the event of component compromise, and the assurance evidence that exists to verify that security properties were engineered into the product rather than assessed after the fact. Vendor risk management must evolve beyond questionnaires and compliance attestations into evaluation of how software was engineered.

For the security profession, this shift requires extending beyond the reactive posture that has characterized much of the discipline. Vulnerability management, incident response, and detection engineering remain critical capabilities, but they are not sufficient on their own. The profession must extend into the engineering process where the foundational risk decisions are made. Security professionals who can help shape system architecture, translate stakeholder protection needs into engineering requirements, and assess design choices for security risk are practicing the core shift-left discipline that strengthens architectures against these threats.


The workforce dimension of this challenge also requires attention. A faster threat cycle increases the operational burden on security teams across triage, remediation, incident coordination, and technical judgment under uncertainty. Security teams operating at purely human speed will not sustain the pace that AI-driven discovery demands. Integrating AI and automation into defensive operations is becoming a baseline requirement across the profession.


Conclusion


Disclosures about vulnerabilities linked to Mythos and comparable systems are currently being released. The immediate operational challenge of managing an increased volume of critical vulnerabilities is real, and organizations should be prepared for a sustained increase in remediation activity. However, the more consequential issue is what the pattern of these discoveries reveals about how technology has been built.


Every vulnerability that AI discovers in existing software represents a design decision that did not account for adversarial conditions. The pattern is that systems have been built to work rather than engineered to be secure, and the consequences of that approach were manageable only because the cost of discovering the resulting flaws was high enough to limit their exploitation. That condition no longer holds. The organizations that invest in systems security engineering, that shift security into design and architecture as a genuine engineering discipline, and that apply the guidance of frameworks like NIST 800-160 as engineering practice rather than reference material will build systems capable of withstanding what is coming. Organizations that continue to rely exclusively on scanning, patching, and operational response will face an expanding backlog of vulnerabilities that were preventable at the point of design.


The cost of decades of building technology that meets functional requirements without being engineered for security is becoming visible. The question for every organization is whether they will address the underlying engineering problem or continue to manage the consequences of not addressing it, at a pace that is accelerating beyond what reactive measures alone can sustain.

 

This publication is written in the author’s personal capacity. Any views or opinions expressed are solely those of the author.

 

Dr. Darren Death

Darren Death is an ICIT fellow. He has extensive experience in leading enterprise efforts to secure information systems, protect privacy, and govern the responsible use of AI in alignment with federal mandates and mission-driven priorities.


About ICIT

The Institute for Critical Infrastructure Technology (ICIT) is a nonprofit, nonpartisan, 501(c)(3) think tank with the mission of modernizing, securing, and making resilient critical infrastructure that provides for people’s foundational needs. ICIT takes no institutional positions on policy matters. Rather than advocate, ICIT is dedicated to being a resource for the organizations and communities that share our mission. By applying a people-centric lens to critical infrastructure research and decision making, our work ensures that modernization and security investments have a lasting, positive impact on society. Learn more at www.icitech.org.


