Introducing upgrades to Codex

OpenAI News

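Today we are releasing GPT-5-Codex, a version of GPT-5 further optimized for agentic coding in Codex. GPT-5-Codex was trained with a focus on real-world software engineering work; it handles quick, interactive coding sessions as well as long, complex tasks carried out independently. Its code review capability can catch critical bugs before code ships. GPT-5-Codex is available everywhere you use Codex: it is the default model for cloud tasks and code review, and developers can choose to use it for local tasks through Codex CLI and the IDE extension.

Since we launched Codex CLI in April (open source at https://github.com/openai/codex) and Codex web in May (https://openai.com/index/introducing-codex/), Codex has steadily evolved into a more effective coding collaborator. Two weeks ago, we unified Codex into a single product experience connected through your ChatGPT account, so you can move work between your local environment and the cloud without losing context. Codex now works wherever you develop: the terminal, the IDE, the web, GitHub, and even the ChatGPT iOS app. Codex is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans.

With these updates, Codex moves closer to what we have been aiming for all along: a teammate that understands your context, works alongside you, and reliably takes on work for your team.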


GPT-5-Codex

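GPT-5-Codex is a version of GPT-5 optimized specifically for agentic software engineering in Codex. It was trained on complex, real-world engineering tasks such as building full projects from scratch, adding features and tests, debugging, performing large-scale refactors, and conducting code reviews. It is more steerable, follows the instructions in AGENTS.md (http://agents.md/) more closely, and produces higher-quality code: just tell it what you need, with no long instructions on style or code cleanliness.

  • SWE-bench Verified: Previously, including at the GPT-5 launch, we reported results on only 477 SWE-bench Verified tasks because some tasks could not run on our infrastructure. That has since been fixed, and we now report results on all 500 tasks.
  • Code refactoring tasks: Our code refactoring evaluation consists of refactor-style tasks from large, established repositories, covering Python, Go, and even OCaml. For example, one Gitea PR (https://github.com/go-gitea/gitea/commit/fd7d83ace60258acf7139c4c787aa8af75b7ba8c) changes 232 files and 3,541 lines to thread a ctx variable through the application logic.

GPT-5-Codex dynamically adjusts how long it thinks based on task complexity. It combines two key skills: pairing with developers in interactive sessions, and sustained, independent execution of long, complex tasks. As a result, Codex responds faster on small, well-defined requests or in chat, and spends more time thinking on complex tasks such as large refactors. In testing, GPT-5-Codex worked independently for more than 7 hours, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation.

On OpenAI employee traffic, for the bottom 10% of user turns ranked by model-generated tokens, GPT-5-Codex uses 93.7% fewer tokens than GPT-5. For the top 10%, the most complex turns, GPT-5-Codex spends twice as long reasoning, editing, testing, and iterating.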


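Code review capabilities

GPT-5-Codex was trained specifically to conduct code reviews and find critical flaws. When reviewing, it navigates the codebase, reasons through dependencies, and runs code and tests to validate correctness. We evaluated code review performance on recent commits from popular open-source repositories, with experienced software engineers rating each review comment for correctness and importance. The results show that comments from GPT-5-Codex are less likely to be incorrect or irrelevant and focus more on critical issues, which saves user attention.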


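Front-end task performance

GPT-5-Codex is a reliable partner on front-end tasks. Besides creating attractive desktop apps, it also shows a significant improvement in human preference when building mobile websites. When working in the cloud, it can look at the images or screenshots you provide, visually inspect its progress, and show you screenshots of its work.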


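Where to use it

GPT-5-Codex is purpose-built for Codex CLI, the Codex IDE extension, the Codex cloud environment, and GitHub workflows, and supports versatile tool use. Unlike GPT-5, which is a general-purpose model, we recommend using GPT-5-Codex only for agentic coding tasks in Codex or Codex-like environments.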


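Updates to Codex

We recently made several updates to Codex that make it a better pair programmer, including a rebuilt Codex CLI and the new Codex IDE extension.

Codex CLI

Codex CLI is open source, and community feedback has been essential to its development. Based on that feedback, we rebuilt the CLI around agentic coding workflows so that the model becomes a more capable and reliable partner. You can now attach and share images (screenshots, wireframes, diagrams) in the CLI to build shared context on design decisions and get exactly the result you want. On complex tasks, Codex tracks a to-do list, and it integrates tools such as web search and MCP, with more accurate tool use overall.

The terminal UI has also been upgraded: tool calls and diffs are better formatted and easier to follow. Approval modes are simplified to three: read-only (explicit approval required), auto (full workspace access, with approval required outside the workspace), and full access (can read any file and run commands with network access). Conversation state can also be compacted, which makes long sessions easier to manage.

For more information, see the Codex CLI quickstart: https://developers.openai.com/codex/cli

The Codex IDE extension

Codex comes to the environment you already work in, including VS Code, Cursor, and other VS Code forks. The extension lets you preview local changes and edit code seamlessly. With Codex in your IDE, you can write shorter prompts and get faster results, because it can use your open files and selected code as context.

The extension also lets you move work smoothly between the cloud and your local environment. You can create new cloud tasks, track tasks in progress, and review completed tasks, all inside your editor. To apply finishing touches, you can open a cloud task in your IDE, and Codex keeps the context. See the quickstart for details: https://developers.openai.com/codex/ide

Codex cloud

In addition to the CLI, the new IDE extension and GitHub integration bring the Codex cloud agent closer to developer workflows: you can delegate tasks directly from your editor or GitHub without switching away.

We keep improving cloud infrastructure performance. By caching containers, we have cut the median completion time for new tasks and follow-ups by 90%. Codex can also set up its own environment automatically by scanning for common setup scripts and executing them, and with configurable network access it can run commands such as pip install to fetch dependencies as needed.

As in the CLI and IDE extension, you can use images to share front-end design specs or explain UI bugs. While building, Codex can launch a browser, look at what it built, iterate, and attach screenshots to the task and the GitHub PR. See the docs for details: https://developers.openai.com/codex/cloud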


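Code review

Codex now includes code review capabilities trained to catch critical flaws. Unlike static analysis tools, it matches the intent of a PR against the actual diff, reasons over the entire codebase and its dependencies, and executes code and tests to validate behavior. Only the most thorough human reviewers put this much effort into every PR; Codex fills that gap, helping teams find problems earlier, reduce reviewer load, and ship with more confidence.

Once enabled, Codex automatically reviews PRs in a GitHub repo as they move from draft to ready and posts its analysis on the PR. If it recommends changes, you can ask Codex to implement them in the same thread. You can also request a review explicitly by mentioning “@codex review” in a PR, and add guidance such as “@codex review for security vulnerabilities” or “@codex review for outdated dependencies”. Quickstart: https://developers.openai.com/codex/cloud/code-review

At OpenAI, Codex now reviews the vast majority of PRs and catches hundreds of issues every day, often before a human review begins. This has been key to letting the Codex team move fast with confidence.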


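How developers are using Codex

Companies including Cisco Meraki, Duolingo, Ramp, Vanta, Virgin Atlantic, and Gap are using Codex.

Tres Wong-Godfrey, Tech Lead at Cisco Meraki, said:

“I needed to update a codebase maintained by another team to ship a new feature. ... With Codex, I handed off the refactoring and test generation and focused on other priorities. It produced high-quality, fully tested code that I could hand back quickly, keeping the feature on schedule with the risk under control.”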


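Building safe and trustworthy AI agents

We build Codex with a focus on protecting code and data from exfiltration and on preventing misuse. By default, whether locally or in the cloud, Codex runs in a sandboxed environment with network access disabled (see https://platform.openai.com/docs/codex/agent-network), which helps ensure it cannot take harmful actions on your computer and reduces the risk of prompt injection from untrusted sources.

Codex asks for permission before potentially dangerous actions and is trained to run commands to verify its output. Developers can customize security settings to match their risk tolerance. In the cloud, network access can be limited to trusted domains; in the CLI and IDE extension, developers can approve commands to run with full access, or allow the agent to use web search and connect to MCP servers. This expands the agent's capabilities but also increases risk; see the security guide: https://developers.openai.com/codex/security

We always encourage developers to review the agent's work before making changes or deploying. Codex provides citations, terminal logs, and test results with each task to support that review. Although Codex code review helps reduce the risk of dangerous issues reaching production, whether the code was written by humans or by agents, we recommend using Codex as an additional reviewer, not as a replacement for human review.

Consistent with GPT-5, we treat GPT-5-Codex as a High capability model in the biological and chemical domains and have implemented the corresponding safeguards. For more on our assessments and safety approach, see the system card addendum: https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex/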


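Pricing and availability

Codex is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. Usage scales with the plan: Plus, Edu, and Business plans suit a few focused coding sessions each week, while the Pro plan supports a full workweek across multiple projects.

Business plans can purchase additional credits so developers can go beyond their included limits; Enterprise plans provide a shared credit pool billed by actual developer usage. For more on usage limits, see: https://developers.openai.com/codex/pricing

For developers using Codex CLI with an API key, we plan to make GPT-5-Codex available in the API soon.


Codex is becoming the coding partner we have always envisioned: faster, more reliable, and deeply integrated into the tools you already use. We look forward to seeing what you build with it, and we will keep improving Codex to make it an even better teammate for your most ambitious projects.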



Today, we’re releasing GPT‑5-Codex—a version of GPT‑5 further optimized for agentic coding in Codex. GPT‑5-Codex was trained with a focus on real-world software engineering work; it’s equally proficient at quick, interactive sessions and at independently powering through long, complex tasks. Its code review capability can catch critical bugs before they ship. GPT‑5-Codex is available everywhere you use Codex—it’s the default for cloud tasks and code review, and developers can choose to use it for local tasks via Codex CLI and the IDE extension.


Since we first launched Codex CLI in April and Codex web in May, Codex has steadily evolved into a more effective coding collaborator. Two weeks ago, we unified Codex into a single product experience connected by your ChatGPT account, enabling you to move work seamlessly between your local environment and the cloud without losing context. Codex now works where you develop—in your terminal or IDE, on the web, in GitHub, and even in the ChatGPT iOS app. Codex is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans.


With these updates, Codex moves closer to what we’ve been building toward all along—a teammate that understands your context, works alongside you, and reliably takes on work for your team.


GPT‑5-Codex




GPT‑5-Codex is a version of GPT‑5 further optimized for agentic software engineering in Codex. It’s trained on complex, real-world engineering tasks such as building full projects from scratch, adding features and tests, debugging, performing large-scale refactors, and conducting code reviews. It’s more steerable, adheres better to AGENTS.md instructions, and produces higher-quality code—just tell it what you need without writing long instructions on style or code cleanliness.





SWE-bench Verified: Historically, including at the time of the GPT‑5 launch, we reported results on 477 SWE-bench Verified tasks because some tasks couldn’t run in our infrastructure. We’ve since fixed this and now report on all 500 tasks.

Code refactoring tasks: Our code refactoring evaluation contains refactor-style tasks from large, established repositories and includes tasks in Python, Go, and even OCaml. One example task is a pull request from Gitea (https://github.com/go-gitea/gitea/commit/fd7d83ace60258acf7139c4c787aa8af75b7ba8c) that changes 232 files and 3,541 lines to thread a ctx variable through the application logic.
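
To make "threading a ctx variable through the application logic" concrete, here is a minimal sketch of the refactoring pattern. It is written in Python for illustration only (the Gitea change itself is in Go), and every name in it (Context, load_repo, handle_request) is hypothetical rather than taken from the Gitea codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical request-scoped context (request ID, caller, and so on)."""
    request_id: str
    user: str


# Before the refactor: the helper has no access to request-scoped data.
def load_repo_before(name: str) -> str:
    return f"repo:{name}"


# After the refactor: ctx is an explicit first parameter, so every caller
# in the chain has to be updated to accept it and pass it along.
def load_repo(ctx: Context, name: str) -> str:
    return f"repo:{name} (request {ctx.request_id}, user {ctx.user})"


def handle_request(ctx: Context, name: str) -> str:
    # The handler forwards ctx instead of relying on global state.
    return load_repo(ctx, name)


if __name__ == "__main__":
    print(handle_request(Context(request_id="r-1", user="alice"), "gitea"))
```

The change to each function is mechanical, but it touches every signature and call site between the entry point and the lowest-level helper, which is why a refactor of this kind can span hundreds of files.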



GPT‑5-Codex adapts how much time it spends thinking more dynamically based on the complexity of the task. The model combines two essential skills for a coding agent: pairing with developers in interactive sessions, and persistent, independent execution on longer tasks. That means Codex will feel snappier on small, well-defined requests or while you are chatting with it, and will work for longer on complex tasks like big refactors. During testing, we've seen GPT‑5-Codex work independently for more than 7 hours at a time on large, complex tasks, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation.


On OpenAI employee traffic, we see that for the bottom 10% of user turns sorted by model-generated tokens (including hidden reasoning and final output), GPT‑5-Codex uses 93.7% fewer tokens than GPT‑5. Conversely, for the top 10%, GPT‑5-Codex thinks more, spending twice as long reasoning, editing and testing code, and iterating.






GPT‑5-Codex has been trained specifically for conducting code reviews and finding critical flaws. When reviewing, it navigates your codebase, reasons through dependencies, and runs your code and tests in order to validate correctness. We evaluated code review performance on recent commits from popular open-source repositories. For each commit, experienced software engineers evaluated review comments for correctness and importance. We find that comments by GPT‑5-Codex are less likely to be incorrect or unimportant, reserving more user attention for critical issues.








GPT‑5-Codex is a reliable partner on front-end tasks. In addition to creating aesthetic desktop apps, GPT‑5-Codex also shows significant improvements in human preference evaluations when creating mobile websites. When working in the cloud, it can look at images or screenshots you provide as input, visually inspect its progress, and display screenshots of its work to you.


GPT‑5-Codex was purpose-built for Codex CLI, the Codex IDE extension, the Codex cloud environment, and working in GitHub, and also supports versatile tool use. Unlike GPT‑5, which is a general-purpose model, we recommend using GPT‑5-Codex only for agentic coding tasks in Codex or Codex-like environments.


Updates to Codex




We also recently made some updates to make Codex a better pair programmer, with a revamped Codex CLI and the new Codex IDE extension.


Codex CLI



Codex CLI is open-source, and community feedback over the last few months has been invaluable in shaping its evolution. With this feedback, we’ve rebuilt Codex CLI around agentic coding workflows to harness our models into more capable and reliable partners. You can now attach and share images—screenshots, wireframes, and diagrams—right in the CLI to build shared context on design decisions and get exactly what you want. When doing more complex work, Codex now tracks progress with a to-do list, and includes tools like web search and MCP for connecting to external systems, with more accurate tool use overall.


The terminal UI has also been upgraded: tool calls and diffs are better formatted and easier to follow. Approval modes are simplified to three levels: read-only with explicit approvals, auto with full workspace access but requiring approvals outside the workspace, and full access with the ability to read files anywhere and run commands with network access. It also supports compacting conversation state to make longer sessions easier to manage.
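
The three approval levels can be read as a small decision table. The sketch below is one possible model of that table, not Codex CLI's implementation: the enum values, the Action fields, and the rule that network use in auto mode requires approval are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ApprovalMode(Enum):
    # Labels are illustrative; they are not actual CLI flags or config values.
    READ_ONLY = "read-only"
    AUTO = "auto"
    FULL_ACCESS = "full-access"


@dataclass(frozen=True)
class Action:
    kind: str                  # "read", "write", or "run"
    in_workspace: bool         # does the action stay inside the workspace?
    uses_network: bool = False


def needs_approval(mode: ApprovalMode, action: Action) -> bool:
    """Return True if this sketch would ask the user before acting."""
    if mode is ApprovalMode.FULL_ACCESS:
        # Full access: read files anywhere, run commands with network access.
        return False
    if mode is ApprovalMode.READ_ONLY:
        # Assumption: reads never prompt; anything else needs explicit approval.
        return action.kind != "read"
    # AUTO: free rein inside the workspace, approval outside it.
    # Assumption: network use also counts as leaving the workspace.
    return (not action.in_workspace) or action.uses_network


if __name__ == "__main__":
    edit = Action(kind="write", in_workspace=True)
    for mode in ApprovalMode:
        print(mode.value, needs_approval(mode, edit))
```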


Check out the Codex CLI quickstart (https://developers.openai.com/codex/cli) to learn more.










The Codex IDE extension











Codex meets you where you already work, including in your IDE. The IDE extension brings the Codex agent into VS Code, Cursor, and other VS Code forks, so that you can seamlessly preview local changes and edit code with Codex. When you use Codex in your IDE, you can write shorter prompts and get faster results because Codex can use context like the files you’ve opened or the code you’ve selected.


The Codex IDE extension also lets you move work smoothly between the cloud and your local environment. You can create new cloud tasks, track in‑progress work, and review completed tasks without leaving your editor. To make finishing touches, you can open cloud tasks in your IDE, and Codex maintains context. Learn more about how to get the most out of the IDE extension in the quickstart (https://developers.openai.com/codex/ide).


Codex cloud



In addition to Codex CLI, the new IDE extension and GitHub integration bring the Codex cloud agent closer to developer workflows so you can now delegate tasks without switching away from your editor or GitHub.


Behind the scenes, we’ve also been continuously improving cloud infrastructure performance. By caching containers, we’ve slashed the median completion time for new tasks and follow-ups by 90%. Codex also now automatically sets up its own environment by scanning for common setup scripts and executing them, and with configurable internet access can run commands like pip install to fetch dependencies as needed at runtime.


As in the CLI and IDE extension, you can now use images to share frontend design specs or explain UI bugs. As it builds for you, Codex can spin up its own browser, look at what it built, iterate, and attach a screenshot of the result to the task and GitHub PR. For more details, check out the docs (https://developers.openai.com/codex/cloud).










Code review



Codex also now includes code review capabilities trained to catch critical flaws. Unlike static analysis tools, it matches the stated intent of a PR to the actual diff, reasons over the entire codebase and dependencies, and executes code and tests to validate behavior. Only the most thorough human reviewers put this level of effort into every PR they review, so Codex fills the gap—helping teams find problems earlier, reduce reviewer load, and ship with more confidence.


Once turned on for a GitHub repo, Codex automatically reviews PRs as they move from draft to ready, posting its analysis on the PR. If it recommends edits, you can stay in the same thread and ask Codex to implement them. You can also explicitly ask for a review by mentioning “@codex review” in a PR, and give it extra guidance like “@codex review for security vulnerabilities” or “@codex review for outdated dependencies”. Check out the quickstart (https://developers.openai.com/codex/cloud/code-review) to learn how to set up code review for your repos.
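
The review request is an ordinary PR comment, so it could in principle be scripted. The sketch below builds (but does not send) a request to GitHub's REST endpoint for issue comments, which also covers pull request conversations. The owner, repo, PR number, and token are placeholders, and whether Codex responds to comments posted through the API in the same way as comments typed in the web UI is an assumption this article does not confirm.

```python
import json
import urllib.request


def build_review_request(owner: str, repo: str, pr_number: int,
                         token: str, guidance: str = "") -> urllib.request.Request:
    """Build a POST request that comments '@codex review [guidance]' on a PR."""
    body = "@codex review" + (f" {guidance}" if guidance else "")
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_review_request("octo", "demo", 42, "TOKEN",
                               "for security vulnerabilities")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually post the comment: urllib.request.urlopen(req)
```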


At OpenAI, Codex now reviews the vast majority of our PRs, catching hundreds of issues every day—often before a human review begins. It’s been key to letting the Codex team move fast with greater confidence.










How developers are using Codex




Cisco Meraki, Duolingo, Ramp, Vanta, Virgin Atlantic, Gap



“I needed to update a codebase owned by another team for a feature release. [...] With Codex, I offloaded the refactoring and test generation while focusing on other priorities. It produced high-quality, fully tested code that I could quickly hand back — keeping the feature on schedule without adding risk.”
Tres Wong-Godfrey, Tech Lead at Cisco Meraki






Building safe and trustworthy AI agents




We’re building Codex with a focus on protecting code and data from exfiltration, and guarding against misuse. By default, Codex runs in a sandboxed environment with network access disabled, whether locally or in the cloud. This helps ensure Codex can’t take harmful actions on your computer, and reduces the risk of prompt injections from untrusted sources.


Codex can ask for permission before potentially dangerous actions and is trained to run commands to verify its outputs. Developers can customize security settings to match their risk tolerance. In the cloud, you can limit network access to trusted domains. In the CLI and IDE extension, developers can approve commands to run with full access or allow the agent to use web search and connect to MCP servers. This expands the agent’s capabilities but also increases risk; learn more about how to securely operate and manage Codex at https://developers.openai.com/codex/security.
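
Limiting network access to trusted domains amounts to an allowlist check on each outbound request. The sketch below shows one way such a check could look; it is not how Codex enforces the setting, and the domain list and the is_allowed helper are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real one would come from the environment's settings.
TRUSTED_DOMAINS = frozenset({"pypi.org", "files.pythonhosted.org"})


def is_allowed(url: str, trusted: frozenset = TRUSTED_DOMAINS) -> bool:
    """Allow a URL only if its host is a trusted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Match on the parsed hostname, never on a substring of the whole URL,
    # so that "https://evil.example/pypi.org" is rejected.
    return any(host == d or host.endswith("." + d) for d in trusted)


if __name__ == "__main__":
    for candidate in ("https://pypi.org/simple/requests/",
                      "https://evil.example/pypi.org"):
        print(candidate, is_allowed(candidate))
```

Comparing against the parsed hostname with an exact or dot-prefixed suffix match avoids two common mistakes: accepting a trusted name that appears only in the path, and accepting look-alike hosts such as notpypi.org.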


We always encourage developers to review the agent’s work before making changes or deploying to production. Codex provides citations, terminal logs, and test results with each task to help with this. While Codex code reviews help reduce the risk of dangerous issues being deployed to production, whether created by humans or agents, we always recommend using Codex as an additional reviewer—not a replacement for human reviews.


Consistent with our approach to GPT‑5, we have decided to treat GPT‑5-Codex as High capability in the Biological and Chemical domains, and have implemented safeguards to minimize the associated risks. Read more about our assessments and robust safety approach in the system card addendum (https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex/).


Pricing and availability




Codex is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. Usage scales with your plan: Plus, Edu and Business seats can cover a few focused coding sessions each week, while Pro can support a full workweek across multiple projects.


Business plans can purchase credits to enable developers to go beyond their included limits, while Enterprise plans provide a shared credit pool so you only pay for what your developers use. Learn more about usage limits at https://developers.openai.com/codex/pricing.


For developers using Codex CLI via API key, we plan to make GPT‑5-Codex available in the API soon.


Codex is becoming the coding partner we’ve always envisioned—one that’s faster, more reliable, and deeply integrated into the tools you already use. We’re excited to see what you build with it and will keep improving Codex to make it an even better teammate for your most ambitious projects.



