Introducing GPT-5.2-Codex

OpenAI News

Today we’re releasing GPT‑5.2-Codex, the most advanced agentic coding model yet for complex, real-world software engineering. GPT‑5.2-Codex is a version of GPT‑5.2 further optimized for agentic coding in Codex, including improvements on long-horizon work through context compaction, stronger performance on large code changes like refactors and migrations, improved performance in Windows environments, and significantly stronger cybersecurity capabilities.

As our models continue to advance along the intelligence frontier, we’ve observed that these improvements also translate to capability jumps in specialized domains such as cybersecurity. For example, just last week, a security researcher using GPT‑5.1-Codex-Max with Codex CLI found and responsibly disclosed a vulnerability in React that could lead to source code exposure.

GPT‑5.2-Codex has stronger cybersecurity capabilities than any model we’ve released so far. These advances can help strengthen cybersecurity at scale, but they also raise new dual-use risks that require careful deployment. While GPT‑5.2-Codex does not reach a ‘High’ level of cyber capability under our Preparedness Framework, we’re designing our deployment approach with future capability growth in mind.

We're releasing GPT‑5.2-Codex today in all Codex surfaces for paid ChatGPT users, and working towards safely enabling access to GPT‑5.2-Codex for API users in the coming weeks. In parallel, we’re piloting invite-only trusted access to upcoming capabilities and more permissive models for vetted professionals and organizations focused on defensive cybersecurity work. We believe that this approach to deployment will balance accessibility with safety.

Pushing the frontier on real-world software engineering

GPT‑5.2-Codex builds on GPT‑5.2’s strengths in professional knowledge work and GPT‑5.1-Codex-Max’s frontier agentic coding and terminal-using capabilities. GPT‑5.2-Codex now delivers better long-context understanding, more reliable tool calling, improved factuality, and native compaction, making it a more dependable partner for long-running coding tasks while remaining token-efficient in its reasoning.
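The post does not explain how native compaction works inside GPT‑5.2-Codex or Codex. Purely as an illustration of the general idea, here is a minimal sketch, under stated assumptions, of one common way an agent harness can keep a long session within a token budget: once the history exceeds the budget, older turns are folded into a single summary while the most recent turns stay verbatim. `Turn`, `estimate_tokens`, `summarize`, and `compact` are hypothetical names; the word-count token estimate and the first-line summarizer are placeholders for a real tokenizer and a model-written summary.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # e.g. "user", "assistant", "tool", or "summary" (hypothetical roles)
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: crude stand-in for a real tokenizer, one token per whitespace-separated word.
    return len(text.split())


def summarize(turns: list[Turn]) -> Turn:
    # Placeholder summarizer: keeps a truncated first line of each old turn.
    # A real agent would more likely ask the model itself to write this summary.
    lines = [f"{t.role}: {(t.text.splitlines() or [''])[0][:40]}" for t in turns]
    return Turn("summary", "Earlier context (compacted):\n" + "\n".join(lines))


def compact(history: list[Turn], budget: int, keep_recent: int = 4) -> list[Turn]:
    """Fold older turns into one summary once the history exceeds `budget` tokens.

    The most recent `keep_recent` turns are kept verbatim so the agent retains
    the details of the step it is currently working on.
    """
    total = sum(estimate_tokens(t.text) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # under budget, or nothing old enough to compact
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    return [summarize(old)] + recent


if __name__ == "__main__":
    history = [Turn("user", f"step {i} " + "word " * 50) for i in range(10)]
    compacted = compact(history, budget=200)
    print(len(compacted), compacted[0].role)
```

One design caveat of this sketch: the compacted history can still exceed the budget if the recent turns alone are large, so a production harness would need a fallback for that case.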

GPT‑5.2-Codex achieves state-of-the-art performance on SWE-Bench Pro and Terminal-Bench 2.0, benchmarks designed to test agentic performance on a wide variety of tasks in realistic terminal environments. It is also much more effective and reliable at agentic coding in native Windows environments, building on capabilities introduced in GPT‑5.1-Codex-Max.

With these improvements, Codex is more capable at working in large repositories over extended sessions with full context intact. It can more reliably complete complex tasks like large refactors, code migrations, and feature builds — continuing to iterate without losing track, even when plans change or attempts fail.

In SWE-Bench Pro, a model is given a code repository and must generate a patch to solve a realistic software engineering task. Terminal-Bench 2.0 is a benchmark for testing AI agents in real terminal environments; tasks include compiling code, training models, and setting up servers.

Stronger vision performance enables GPT‑5.2-Codex to more accurately interpret screenshots, technical diagrams, charts, and UI surfaces shared during coding sessions.

Codex can take design mocks and quickly translate them to functional prototypes, and you can pair with Codex to take these prototypes to production.

Design mock

Prototype generated by GPT-5.2-Codex

Advancing the cyber frontier

When charting performance on one of our core cybersecurity evaluations over time, we see a sharp jump in capability starting with GPT‑5-Codex, another large jump with GPT‑5.1-Codex-Max, and now a third jump with GPT‑5.2-Codex. We expect that upcoming AI models will continue on this trajectory. In preparation, we are planning and evaluating as though each new model could reach ‘High’ levels of cybersecurity capability, as measured by our Preparedness Framework. While GPT‑5.2-Codex has not yet reached a ‘High’ level of cyber capability, we are preparing for future models that cross that threshold. Due to the increased cyber capabilities, we have added safeguards at the model layer and in the product, which are outlined in the system card.

The Professional Capture-the-Flag (CTF) eval measures how often the model can solve advanced, multi-step real-world challenges (requiring professional-level cybersecurity skills) in a Linux environment.

Real-world cyber capabilities

Modern society runs on software, and its reliability depends on strong cybersecurity—keeping critical systems in banking, healthcare, communications, and essential services online, protecting sensitive data, and ensuring people can trust the software they rely on every day. Vulnerabilities can exist long before anyone knows about them, and finding, validating, and fixing them often depends on a community of engineers and independent security researchers equipped with the right tools.

On December 11, 2025, the React team published three security vulnerabilities affecting apps built with React Server Components. What made this disclosure notable was not only the vulnerabilities themselves, but how they were uncovered.

Andrew MacPherson, a principal security engineer at Privy (a Stripe company), was using GPT‑5.1-Codex-Max with Codex CLI and other coding agents to reproduce and study a different critical React vulnerability disclosed the week prior, known as React2Shell (CVE-2025-55182). His goal was to evaluate how well the model could assist with real-world vulnerability research.

He initially attempted several zero-shot analyses, prompting the model to examine the patch and identify the vulnerability it addressed. When that did not yield results, he shifted to a higher-volume, iterative prompting approach. When neither approach succeeded, he guided Codex through standard defensive security workflows: setting up a local test environment, reasoning through potential attack surfaces, and using fuzzing to probe the system with malformed inputs. While attempting to reproduce the original React2Shell issue, Codex surfaced unexpected behaviors that warranted deeper investigation. Over the course of a single week, this process led to the discovery of previously unknown vulnerabilities, which were responsibly disclosed to the React team.
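The fuzzing step in this workflow can be illustrated with a minimal mutation fuzzer. This is a sketch of the general technique only: the post does not describe the tooling actually used in the React research, and the target `parse_header`, its planted bug, the mutation operators, and the seed input are all hypothetical and unrelated to React Server Components. The sketch also assumes that a `ValueError` from the target means the malformed input was cleanly rejected, so only other exception types are reported as findings.

```python
import random
from collections.abc import Callable


def parse_header(data: bytes) -> tuple[str, int]:
    """Hypothetical toy target that parses records shaped like b"NAME:LEN:PAYLOAD"."""
    name, length, payload = data.split(b":", 2)  # ValueError if fewer than two colons
    n = int(length)  # ValueError if LEN is not an integer
    if n != len(payload):
        raise ValueError("length mismatch")
    if name[0] == ord("_"):  # planted bug: IndexError when NAME is empty
        raise ValueError("reserved name")
    return name.decode("ascii"), n  # UnicodeDecodeError (a ValueError subclass) on non-ASCII


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, delete, or insert."""
    buf = bytearray(data)
    op = rng.choice(("overwrite", "delete", "insert"))
    pos = rng.randrange(len(buf) + 1)
    if op == "overwrite" and pos < len(buf):
        buf[pos] = rng.randrange(256)
    elif op == "delete" and pos < len(buf):
        del buf[pos]
    else:
        buf.insert(pos, rng.choice(b":0A\x00"))  # bytes likely to matter to this toy format
    return bytes(buf)


def fuzz(target: Callable[[bytes], object], seeds: list[bytes],
         iterations: int = 5000, seed: int = 0) -> dict[str, bytes]:
    """Return one crashing input per exception type other than ValueError."""
    rng = random.Random(seed)
    corpus = list(seeds)
    crashes: dict[str, bytes] = {}
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            target(candidate)
        except ValueError:
            continue  # clean rejection of malformed input: expected, not a finding
        except Exception as exc:  # anything else is unexpected behavior worth a look
            crashes.setdefault(type(exc).__name__, candidate)
        else:
            corpus.append(candidate)  # accepted inputs seed further mutations
    return crashes


if __name__ == "__main__":
    for name, example in fuzz(parse_header, [b"a:3:xyz"]).items():
        print(name, example)
```

Running this against the seed `b"a:3:xyz"` should surface an `IndexError` for an input whose NAME field is empty. In a real investigation, an unexpected exception like this is only a starting point: it still has to be reproduced, triaged, and assessed for security impact by hand.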

This demonstrates how advanced AI systems can materially accelerate defensive security work in widely used, real-world software. At the same time, capabilities that help defenders move faster can also be misused by bad actors.

As agentic systems become more capable in cybersecurity-relevant tasks, we are making it a core priority to ensure these advances are deployed responsibly—pairing every gain in capability with stronger safeguards, tighter access controls, and ongoing collaboration with the security community.

Empowering cyberdefense through trusted access

Security teams can run into restrictions when attempting to emulate threat actors, analyze malware to support remediation, or stress test critical infrastructure. We are developing a trusted access pilot to remove that friction for qualifying users and organizations and enable trusted defenders to use frontier AI cyber capabilities to accelerate cyberdefense.

Initially, the pilot program will be invite-only for vetted security professionals with a track record of responsible vulnerability disclosure and organizations with a clear professional cybersecurity use case. Qualifying participants will get access to our most capable models for defensive use cases to enable legitimate dual-use work.

If you’re a security professional or part of an organization doing ethical security work like vulnerability research or authorized red-teaming, we invite you to express interest in joining and to share feedback on what you’d like to see from the program.

Conclusion

GPT‑5.2-Codex represents a step forward in how advanced AI can support real-world software engineering and cybersecurity—helping developers and defenders tackle complex, long-horizon work, and strengthening the tools available for responsible security research.

By rolling GPT‑5.2-Codex out gradually, pairing deployment with safeguards, and working closely with the security community, we’re aiming to maximize defensive impact while reducing the risk of misuse. What we learn from this release will directly inform how we expand access over time, as both the software and cyber frontiers continue to advance.
