AI + Pentest · 2026-03-07 · 9 min read

Codex Security: now in research preview

OpenAI introduces Codex Security, an application security agent that builds deep context about your project to identify complex vulnerabilities that other agentic tools miss, surfacing higher-confidence findings with fixes.

DNA Research Team
Research Team, DNA Cyber Security

Today OpenAI is introducing Codex Security, their application security agent. It builds deep context about your project to identify complex vulnerabilities that other agentic tools miss, surfacing higher-confidence findings with fixes that meaningfully improve the security of your system while sparing you from the noise of insignificant bugs.

Context is essential when evaluating real security risks, but most AI security tools simply flag low-impact findings and false positives, forcing security teams to spend significant time on triage. At the same time, agents are accelerating software development, making security review an increasingly critical bottleneck. Codex Security addresses both challenges. By combining agentic reasoning from frontier models with automated validation, it delivers high-confidence findings and actionable fixes so teams can focus on the vulnerabilities that matter and ship secure code faster.

Formerly known as Aardvark, Codex Security began last year as a private beta with a small group of customers. In early internal deployments, it surfaced a real SSRF, a critical cross-tenant authentication vulnerability, and many other issues, which their security team patched within hours.


Scans on the same repositories over time show increasing precision, in one case cutting noise by 84% since initial rollout. The rate of findings with over-reported severity was reduced by more than 90%, and false positive rates on detections have fallen by more than 50% across all repositories.

Starting today, Codex Security is rolling out in research preview to ChatGPT Enterprise, Business, and Edu customers via Codex web with free usage for the next month.

How Codex Security works

Codex Security leverages OpenAI's frontier models and the Codex agent. It can reduce noise and accelerate remediation by grounding vulnerability discovery, validation, and patching in system-specific context.

  • Build system context and create an editable threat model: After configuring a scan, it analyzes your repository to understand the security-relevant structure of the system and generates a project-specific threat model that can capture what the system does, what it trusts, and where it is most exposed. Threat models can be edited to keep the agent aligned with your team.
  • Prioritize and validate issues: Using the threat model as context, it searches for vulnerabilities and categorizes findings based on expected real-world impact in your system. Where possible, it pressure-tests findings in sandboxed validation environments to distinguish signal from noise. When configured with an environment tailored to your project, it can validate potential issues directly in the context of the running system, reducing false positives even further and enabling the creation of working proof-of-concepts.
  • Patch issues with full system context: Codex Security proposes fixes to the discovered issues that align with system intent and surrounding behavior. This enables patches that can improve security while minimizing regressions, making them safer to review and land.

Codex Security can also learn from your feedback over time to improve the quality of its findings. When you adjust the criticality of a finding, it can use that feedback to refine the threat model and improve precision on subsequent runs as it learns what matters in your architecture and risk posture.


Over the last 30 days, Codex Security scanned more than 1.2 million commits across external repositories in the beta cohort, identifying 792 critical findings and 10,561 high-severity findings. Critical issues appeared in under 0.1% of scanned commits.

Supporting the open source community

Open source software forms the foundation of modern systems. OpenAI has been using Codex Security to scan the open-source repositories they rely on most, sharing high-impact security findings with maintainers to help strengthen that foundation.

In conversations with maintainers, a consistent theme emerged: the challenge isn't a lack of vulnerability reports, but too many low-quality ones. Maintainers need fewer false positives and a more sustainable way to surface real security issues without creating additional triage burden.

As part of this work, OpenAI reported critical vulnerabilities to a number of widely used open-source projects including OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium. Fourteen CVEs have been assigned.

Appendix: Notable OSS vulnerabilities discovered

  • GnuTLS certtool Heap-Buffer Overflow (Off-by-One) — CVE-2025-32990
  • GnuTLS Heap Buffer Overread in SCT Extension Parsing — CVE-2025-32989
  • GnuTLS Double-Free in otherName SAN Export — CVE-2025-32988
  • GOGS Two-Factor Authentication (2FA) Bypass — CVE-2025-64175
  • GOGS Unauthenticated Bypass — CVE-2026-25242
  • Path traversal (arbitrary write) — download_ephemeral, download_children (agent) — CVE-2025-35430
  • LDAP injection (filters & DN) — LdapUserMap::new / get_unix_info / basic_auth_ldap — CVE-2025-35431
  • Unauthenticated DoS & mail abuse — resend_email_verification — CVE-2025-35432, CVE-2025-35436
  • Session not rotated on password change — User::update_user — CVE-2025-35433
  • gpg-agent stack buffer overflow via PKDECRYPT --kem=CMS (ECC KEM) — CVE-2026-24881
  • Stack-based buffer overflow in TPM2 PKDECRYPT for RSA and ECC — CVE-2026-24882
  • CMS/PKCS7 AES-GCM ASN.1 params stack buffer overflow — CVE-2025-15467
  • PKCS#12 PBMAC1 PBKDF2 keyLength overflow + MAC bypass — CVE-2025-11187
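To illustrate one of the classes listed above, path traversal leading to arbitrary file write, here is a minimal Python sketch of the vulnerable pattern and a common fix. This is a generic example of the bug class only, not the affected project's code; all names are made up:

```python
from pathlib import Path

UPLOAD_ROOT = Path("/var/app/uploads")

def save_upload_unsafe(filename: str, data: bytes) -> Path:
    # Vulnerable: a filename like "../../etc/cron.d/job"
    # escapes the upload directory entirely.
    dest = UPLOAD_ROOT / filename
    # dest.write_bytes(data)  # would write outside UPLOAD_ROOT
    return dest

def save_upload_safe(filename: str, data: bytes) -> Path:
    # Fix: resolve the final path and verify it is still
    # contained within the intended root before writing.
    dest = (UPLOAD_ROOT / filename).resolve()
    if not dest.is_relative_to(UPLOAD_ROOT.resolve()):
        raise ValueError("path traversal attempt blocked")
    # dest.write_bytes(data)
    return dest
```

The fix resolves `..` segments first and then checks containment, which is the standard defense; checking the raw string for `..` alone is easy to bypass with encodings or absolute paths.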

"As a company laser-focused on product security, NETGEAR was pleased to join the early access program, and the results exceeded expectations. Codex Security integrated effortlessly into our robust security development environment, strengthening the pace and depth of our review processes. Its findings were impressively clear and comprehensive, often giving the sense that an experienced product security researcher was working alongside us."

— Chandan Nandakumaraiah, Head of Product Security at NETGEAR
#OpenAI · #Codex Security · #AI Agent · #Vulnerability Detection · #CVE · #Open Source · #AppSec

Ready for Human + AI Security?

Experience next-generation penetration testing, where experts with 15+ years of experience combine their skills with cutting-edge AI to protect your business.

Contact us now