Codex Security: now in research preview

Today we’re introducing Codex Security, our application security agent. It builds deep context about your project to identify complex vulnerabilities that other agentic tools miss, surfacing higher-confidence findings with fixes that meaningfully improve the security of your system while sparing you from the noise of insignificant bugs.

Context is essential when evaluating real security risks, but most AI security tools simply flag low-impact findings and false positives, forcing security teams to spend significant time on triage. At the same time, agents are accelerating software development, making security review an increasingly critical bottleneck. Codex Security addresses both challenges. By combining agentic reasoning from our frontier models with automated validation, it delivers high-confidence findings and actionable fixes so teams can focus on the vulnerabilities that matter and ship secure code faster.

Formerly known as Aardvark, Codex Security began last year as a private beta with a small group of customers. In early internal deployments, it surfaced a real SSRF, a critical cross-tenant authentication vulnerability, and many other issues, which our security team patched within hours. Early deployments with external testers helped us improve how users provide relevant product context and move from onboarding to securing their code. We also significantly improved the quality of our findings over the course of the beta: scans on the same repositories over time show increasing precision, in one case cutting noise by 84% since initial rollout. We’ve reduced the rate of findings with over-reported severity by more than 90%, and false positive rates on detections have fallen by more than 50% across all repositories. These improvements help Codex Security better align reported severity with real-world risk and reduce unnecessary triage burden for security teams, and we expect the signal-to-noise ratio to continue to improve.

Starting today, Codex Security is rolling out in research preview to ChatGPT Enterprise, Business, and Edu customers via Codex web with free usage for the next month.

How Codex Security works

Codex Security leverages OpenAI’s frontier models and the Codex agent. It can reduce noise and accelerate remediation by grounding vulnerability discovery, validation, and patching in system-specific context.

Build system context and create an editable threat model: After configuring a scan, it analyzes your repository to understand the security-relevant structure of the system and generates a project-specific threat model that can capture what the system does, what it trusts, and where it is most exposed. Threat models can be edited to keep the agent aligned with your team.
Prioritize and validate issues: Using the threat model as context, it searches for vulnerabilities and categorizes findings based on expected real-world impact in your system. Where possible, it pressure-tests findings in sandboxed validation environments to distinguish signal from noise. Users can see this analysis in the validated findings. When Codex Security is configured with an environment tailored to your project, it can validate potential issues directly in the context of the running system. That deeper validation can reduce false positives even further and enable the creation of working proofs of concept, giving security teams stronger evidence and a clearer path to remediation.
Patch issues with full system context: Finally, Codex Security proposes fixes for the discovered issues that align with system intent and surrounding behavior. This enables patches that can improve security while minimizing regressions, making them safer to review and land. Users can filter findings to stay focused on the issues that matter most to their team and carry the highest security impact.
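
To make the three steps above concrete, here is a minimal Python sketch of how a threat model could inform the ordering of findings. It is an illustration only: `ThreatModel`, `Finding`, `prioritize`, and every field name are hypothetical and are not part of Codex Security or any published API. The sketch assumes that validated findings should come first, then findings on surfaces the threat model marks as exposed, then higher severities.

```python
from dataclasses import dataclass, field

# All names below are hypothetical; this is not the Codex Security API.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class ThreatModel:
    """Editable description of what the system does, trusts, and exposes."""
    description: str
    trust_boundaries: list[str] = field(default_factory=list)
    exposed_surfaces: list[str] = field(default_factory=list)


@dataclass
class Finding:
    title: str
    surface: str             # component where the issue was found
    severity: str            # one of the keys of SEVERITY_RANK
    validated: bool = False  # True if reproduced in a sandboxed environment


def prioritize(findings: list[Finding], model: ThreatModel) -> list[Finding]:
    """Order findings: validated first, then exposed surfaces, then severity."""
    def sort_key(f: Finding) -> tuple[bool, bool, int]:
        return (
            not f.validated,                          # False sorts before True
            f.surface not in model.exposed_surfaces,  # exposed surfaces first
            SEVERITY_RANK[f.severity],                # critical before low
        )
    return sorted(findings, key=sort_key)


# Example data (invented for illustration).
model = ThreatModel(
    description="Multi-tenant web API",
    trust_boundaries=["public internet to API gateway"],
    exposed_surfaces=["api"],
)
findings = [
    Finding("Verbose debug log", "worker", "low"),
    Finding("SSRF in URL fetcher", "api", "high", validated=True),
    Finding("Weak hash in test fixture", "tests", "medium"),
]
ordered = [f.title for f in prioritize(findings, model)]
print(ordered)
```

Editing `exposed_surfaces` in this sketch changes the ordering without touching the findings themselves, which mirrors the idea of an editable threat model steering what gets surfaced first.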

Codex Security can also learn from your feedback over time to improve the quality of its findings. When you adjust the criticality of a finding, it can use that feedback to refine the threat model and improve precision on subsequent runs as it learns what matters in your architecture and risk posture.
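
One hypothetical way to picture this feedback loop is a small store of severity corrections that is consulted on later runs. The sketch below is an assumption for illustration, not a description of how Codex Security records or applies feedback; `SeverityFeedback`, its methods, and the category strings are invented names.

```python
# Hypothetical sketch; not how Codex Security stores or applies feedback.
LEVELS = ["low", "medium", "high", "critical"]


class SeverityFeedback:
    """Remembers the reviewer's latest severity correction per finding category."""

    def __init__(self) -> None:
        self._offsets: dict[str, int] = {}

    def record(self, category: str, reported: str, corrected: str) -> None:
        # Store how many levels the reviewer moved the finding
        # (negative means it was downgraded).
        self._offsets[category] = LEVELS.index(corrected) - LEVELS.index(reported)

    def adjust(self, category: str, reported: str) -> str:
        # Apply the remembered offset, clamped to the valid severity range.
        idx = LEVELS.index(reported) + self._offsets.get(category, 0)
        return LEVELS[max(0, min(idx, len(LEVELS) - 1))]


feedback = SeverityFeedback()
# A reviewer downgrades a finding on an internal-only endpoint from high to medium.
feedback.record("internal-only endpoint", reported="high", corrected="medium")

# On a later run, findings in the same category are shifted by one level.
later = feedback.adjust("internal-only endpoint", "critical")
untouched = feedback.adjust("public endpoint", "high")
print(later, untouched)
```

A per-category offset is the simplest possible design; a real system would presumably fold the feedback into the threat model itself, as the paragraph above describes.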

Codex Security is designed to operate at scale and surface the highest-confidence findings with easy-to-accept patches. Over the last 30 days, it scanned more than 1.2 million commits across external repositories in our beta cohort, identifying 792 critical findings and 10,561 high-severity findings. Critical issues appeared in under 0.1% of scanned commits, showing that the system can identify security-impacting issues in large volumes of code while minimizing noise to reviewers.
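
The "under 0.1%" figure can be checked from the numbers in the paragraph above. The short Python calculation below treats 1.2 million as a lower bound on the commit count (the text says "more than") and divides findings by commits; because a single commit could carry more than one finding, the result is an upper bound on the share of commits affected.

```python
# Figures taken from the paragraph above.
commits = 1_200_000  # lower bound: "more than 1.2 million commits"
critical = 792
high = 10_561

# Findings per scanned commit, an upper bound on the share of affected commits.
critical_rate = critical / commits
high_rate = high / commits

print(f"critical: {critical_rate:.3%}")  # critical: 0.066%
print(f"high:     {high_rate:.3%}")      # high:     0.880%
```

At roughly 0.066%, the critical rate sits below the stated 0.1% threshold, and a larger true commit count would only lower it further.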


"As a company laser-focused on product security, NETGEAR was pleased to join the early access program, and the results exceeded expectations. Codex Security integrated effortlessly into our robust security development environment, strengthening the pace and depth of our review processes. Its findings were impressively clear and comprehensive, often giving the sense that an experienced product security researcher was working alongside us."

— Chandan Nandakumaraiah, Head of Product Security at NETGEAR and Member of CVE Board

Supporting the open source community

Open source software forms the foundation of modern systems, including our own. We've been using Codex Security to scan the open-source repositories we rely on most and sharing the high-impact security findings we identify with maintainers to help strengthen that foundation.

In our conversations with maintainers, a consistent theme emerged: the challenge isn’t a lack of vulnerability reports, but too many low-quality ones. Maintainers told us they need fewer false positives and a more sustainable way to surface real security issues without creating additional triage burden. These conversations helped shape how we’re supporting the open source community with Codex Security. Rather than generating large volumes of speculative findings, we are building a system that prioritizes high-confidence issues that maintainers can act on quickly.

As part of this work, we reported critical vulnerabilities to a number of widely used open-source projects, including OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium. Fourteen CVEs have been assigned, with dual reporting on two; we've shared some examples in the Appendix.

We recently started onboarding an initial cohort of open-source maintainers into Codex for OSS, our program to support the ecosystem with free ChatGPT Pro and Plus accounts, code review, and Codex Security. Projects like vLLM have already used Codex Security to find and patch issues as part of their normal workflow.

We plan to expand the program in the coming weeks so more maintainers have a direct path to better security, stronger review workflows, and support for the open-source work the ecosystem depends on. If you’re an open-source maintainer and interested, please get in touch.

Get started

We’ll be rolling out Codex Security access to ChatGPT Enterprise, Business, and Edu customers over the coming days. Check out our docs to learn more about setting up Codex Security for your team.

Appendix

Examples of high-impact OSS vulnerabilities discovered by Codex Security:

GnuTLS certtool Heap-Buffer Overflow (Off-by-One) - CVE-2025-32990
GnuTLS Heap Buffer Overread in SCT Extension Parsing - CVE-2025-32989
GnuTLS Double-Free in otherName SAN Export - CVE-2025-32988
GOGS 2FA bypass - CVE-2025-64175
GOGS unauth bypass - CVE-2026-25242
Path traversal (arbitrary write) - download_ephemeral, download_children (agent) - CVE-2025-35430
LDAP injection (filters & DN) - LdapUserMap::new / get_unix_info / basic_auth_ldap - CVE-2025-35431
Unauthenticated DoS & mail abuse - resend_email_verification - CVE-2025-35432, CVE-2025-35436
Session not rotated on password change - User::update_user - CVE-2025-35433
Disabled TLS verification - Elasticsearch client - CVE-2025-35434
DoS: division by zero - /api/streams/depth/.../{split} - CVE-2025-35435
gpg-agent stack buffer overflow via PKDECRYPT --kem=CMS (ECC KEM) - CVE-2026-24881
Stack-based buffer overflow in TPM2 PKDECRYPT for RSA and ECC due to missing ciphertext length validation - CVE-2026-24882
CMS/PKCS7 AES-GCM ASN.1 params stack buffer overflow - CVE-2025-15467
PKCS#12 PBMAC1 PBKDF2 keyLength overflow + MAC bypass - CVE-2025-11187
