Strengthening cyber resilience as AI capabilities advance


Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyberdefense as well as new dual-use risks that must be managed carefully. For example, performance on capture-the-flag (CTF) challenges has improved from 27% with GPT‑5 in August 2025 to 76% with GPT‑5.1-Codex-Max in November 2025.


We expect that upcoming AI models will continue on this trajectory; in preparation, we are planning and evaluating as though each new model could reach ‘High’ levels of cybersecurity capability, as measured by our Preparedness Framework. By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects. This post explains how we think about safeguards for models that reach these levels of capability, and how we aim to ensure they meaningfully help defenders while limiting misuse.


As these capabilities advance, OpenAI is investing in strengthening our models for defensive cybersecurity tasks and creating tools that enable defenders to more easily perform workflows such as auditing code and patching vulnerabilities. Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced.


Like other dual-use domains, defensive and offensive cyber workflows often rely on the same underlying knowledge and techniques. We are investing in safeguards to help ensure these powerful capabilities primarily benefit defensive uses and limit uplift for malicious purposes. Cybersecurity touches almost every field, which means we cannot rely on any single category of safeguards—such as restricting knowledge or using vetted access alone—but instead need a defense-in-depth approach that balances risk and empowers users. In practice, this means shaping how capabilities are accessed, guided, and applied so that advanced models strengthen security rather than lower barriers to misuse.


We see this work not as a one-time effort, but as a sustained, long-term investment in giving defenders an advantage and continually strengthening the security posture of critical infrastructure across the broader ecosystem.


Mitigating malicious uses




Our models are designed and trained to operate safely, supported by proactive systems that detect and respond to cyber abuse. We continuously refine these protections as our capabilities and the threat landscape change. While no system can guarantee complete prevention of misuse in cybersecurity without severely impacting defensive uses, our strategy is to mitigate risk through a layered safety stack.


At the foundation, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring. We complement these measures with detection and response systems and dedicated threat intelligence and insider-risk programs, so that emerging threats are identified and blocked quickly. These safeguards are designed to evolve with the threat landscape: we assume change, and we build so we can adjust quickly and appropriately.
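
To make the layering concrete, here is a minimal sketch of how independent layers can each veto a request before it reaches a powerful capability. Every name and policy value here (`Request`, `ALLOWED_USERS`, `ALLOWED_EGRESS_HOSTS`) is a hypothetical illustration, not our actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Request:
    user_id: str
    prompt: str
    destination_host: str | None = None  # network destination for any tool call

# Hypothetical policy values, for illustration only.
ALLOWED_USERS = {"defender-team-1"}
ALLOWED_EGRESS_HOSTS = {"api.internal.example"}

def access_control(req: Request) -> bool:
    # Layer 1: only vetted identities reach the capability at all.
    return req.user_id in ALLOWED_USERS

def egress_control(req: Request) -> bool:
    # Layer 2: tool traffic may only leave toward allow-listed hosts.
    return req.destination_host is None or req.destination_host in ALLOWED_EGRESS_HOSTS

def monitor(req: Request, verdicts: dict[str, bool]) -> None:
    # Layer 3: every decision is recorded so detection and response can audit it.
    print(f"audit user={req.user_id} verdicts={verdicts}")

def admit(req: Request) -> bool:
    # A request must pass every layer; any single layer can veto it.
    verdicts = {"access": access_control(req), "egress": egress_control(req)}
    monitor(req, verdicts)
    return all(verdicts.values())

if __name__ == "__main__":
    print(admit(Request("defender-team-1", "audit this repo", "api.internal.example")))  # True
    print(admit(Request("unknown-user", "audit this repo")))  # False: fails the access layer
```

The point of the structure is that no single check is load-bearing: a failure in one layer still leaves the others standing.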


Building on this foundation:


  • Safety training: We train our frontier models to refuse or safely respond to requests that would enable clear cyber abuse, while remaining maximally helpful for legitimate defensive and educational use cases.
  • Detection systems: We maintain and continually refine system-wide monitoring across products that use frontier models to detect potentially malicious cyber activity. When activity appears unsafe, we may block output, route prompts to safer or less capable models, or escalate for enforcement (a simplified sketch of this triage follows this list). Our enforcement combines automated and human review, informed by factors like legal requirements, severity, and repeat behavior. We also work closely with developers and enterprise customers to align on safety standards and enable responsible use with clear escalation paths.
  • End-to-end red teaming: We are working with expert red teaming organizations to evaluate and improve our safety mitigations. Their job is to try to bypass all of our defenses by working end-to-end, just like a determined and well-resourced adversary might. This helps us identify gaps early and strengthen the full system.
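
As a rough illustration of the triage described in the detection-systems item above, the sketch below scores a request and maps it to an action. The classifier stand-in, thresholds, and marker strings are all invented for illustration; a production system would use learned classifiers and far richer signals:

```python
from enum import Enum, auto

class Action(Enum):
    ALLOW = auto()
    ROUTE_TO_SAFER_MODEL = auto()
    BLOCK = auto()
    ESCALATE = auto()

def risk_score(prompt: str) -> float:
    # Stand-in for a learned abuse classifier; returns a score in [0, 1].
    markers = ("zero-day exploit", "bypass edr", "exfiltrate")
    return 0.9 if any(m in prompt.lower() for m in markers) else 0.1

def decide(prompt: str, repeat_offender: bool) -> Action:
    risk = risk_score(prompt)
    if risk < 0.3:
        return Action.ALLOW
    if repeat_offender:
        # Severity and account history inform enforcement, e.g. human review.
        return Action.ESCALATE
    if risk < 0.7:
        # Degrade capability rather than refuse outright.
        return Action.ROUTE_TO_SAFER_MODEL
    return Action.BLOCK

print(decide("help me audit this codebase for bugs", repeat_offender=False))    # Action.ALLOW
print(decide("write a zero-day exploit for this server", repeat_offender=False))  # Action.BLOCK
```

Routing to a less capable model is a useful middle ground: it preserves help for ambiguous requests while withholding the capabilities that matter most to an attacker.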

Ecosystem initiatives to strengthen cyber resilience 




OpenAI invested early in applying AI to defensive cybersecurity use cases, and our team coordinates closely with global experts to mature both our models and their application. We value the global community of cybersecurity practitioners working to make our digital world safer, and we are committed to delivering powerful tools that support defensive security. As we roll out new safeguards, we will continue to work with the cybersecurity community to understand where AI can meaningfully strengthen resilience, and where thoughtful safeguards matter most.


Alongside these collaborations, we are establishing a set of efforts designed to help defenders move faster, ground our safeguards in real-world needs, and accelerate responsible remediation at scale.


Trusted access programs for cyberdefense



We will soon introduce a trusted access program that explores providing qualifying users and customers working on cyberdefense with tiered access to enhanced capabilities in our latest models for defensive use cases. We are still determining which capabilities we can provide broad access to and which require tiered restrictions, and those decisions may shape the program's future design. We aim for this trusted access program to be a building block toward a more resilient ecosystem.
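
In practice, tiered access tends to reduce to a capability-to-minimum-tier mapping. The sketch below is hypothetical (the tier names and the `CAPABILITY_FLOOR` table are invented; the real boundary is exactly what the program is still exploring):

```python
from enum import IntEnum

class Tier(IntEnum):
    PUBLIC = 0    # capabilities available broadly
    VERIFIED = 1  # identity-verified defenders
    TRUSTED = 2   # vetted cyberdefense customers

# Hypothetical capability -> minimum-tier table.
CAPABILITY_FLOOR = {
    "explain_vulnerability_class": Tier.PUBLIC,
    "generate_detection_rules": Tier.VERIFIED,
    "deep_exploit_analysis": Tier.TRUSTED,
}

def may_use(user_tier: Tier, capability: str) -> bool:
    # A user may invoke a capability only at or above its floor tier.
    return user_tier >= CAPABILITY_FLOOR[capability]

print(may_use(Tier.VERIFIED, "generate_detection_rules"))  # True
print(may_use(Tier.PUBLIC, "deep_exploit_analysis"))       # False
```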


Expanding defensive capacity with Aardvark



Aardvark, our agentic security researcher that helps developers and security teams find and fix vulnerabilities at scale, is now in private beta. It scans codebases for vulnerabilities and proposes patches that maintainers can adopt quickly. It has already identified novel CVEs in open-source software by reasoning over entire codebases. We plan to offer free coverage to select non-commercial open-source repositories to contribute to the security of the open-source software ecosystem and supply chain. Apply to participate here.
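
Aardvark's internals are not described here, but the overall shape of such a workflow (scan a repository, flag findings, hand maintainers something actionable) can be sketched as below, with trivial pattern rules standing in for model reasoning over entire codebases:

```python
from pathlib import Path

def suspect_findings(text: str) -> list[str]:
    # Trivial pattern rules standing in for model-driven reasoning over code.
    findings = []
    if "shell=True" in text:
        findings.append("possible shell injection via shell=True")
    if "yaml.load(" in text and "Loader=" not in text:
        findings.append("yaml.load without an explicit Loader")
    return findings

def scan(repo: Path) -> dict[Path, list[str]]:
    # Walk the repository and collect a report that a maintainer
    # (or a patch-proposing agent) could act on.
    report: dict[Path, list[str]] = {}
    for path in repo.rglob("*.py"):
        findings = suspect_findings(path.read_text(errors="ignore"))
        if findings:
            report[path] = findings
    return report

if __name__ == "__main__":
    for path, findings in scan(Path(".")).items():
        for finding in findings:
            print(f"{path}: {finding}")
```

A real agentic researcher replaces the pattern rules with model reasoning and adds a patch-drafting step, but the scan-and-report loop keeps the same shape.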


Frontier Risk Council



We will be establishing the Frontier Risk Council, an advisory group that will bring experienced cyber defenders and security practitioners into close collaboration with our teams. This council will start with a focus on cybersecurity, and expand into other frontier capability domains in the future. Members will advise on the boundary between useful, responsible capability and potential misuse, and these learnings will directly inform our evaluations and safeguards. We will share more on the council soon. 


Developing a shared understanding on threat models with the industry



Finally, we anticipate that cyber misuse may be possible with any frontier model in the industry. To address this, we work with other frontier labs through the Frontier Model Forum, a nonprofit backed by leading AI labs and industry partners, to develop a shared understanding of threat models and best practices. In this context, threat modeling helps mitigate risk by identifying how AI capabilities could be weaponized, where critical bottlenecks exist for different threat actors, and how frontier models might provide meaningful uplift. This collaboration aims to build a consistent, ecosystem-wide understanding of threat actors and attack pathways, enabling labs, maintainers, and defenders to improve their mitigations and ensure critical security insights propagate quickly across the ecosystem. We are also engaging with external teams to develop cybersecurity evaluations, and we hope an ecosystem of independent evaluations will further help build a shared understanding of model capabilities.
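
One way to picture what a shared threat model must capture is as a set of structured records linking actors, bottlenecks, and potential uplift to mitigations. The schema and example below are entirely hypothetical, meant only to show the kind of fields such a shared understanding would standardize:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThreatModelEntry:
    """One record in a hypothetical shared, cross-lab threat model."""
    actor: str       # e.g. a ransomware crew or a low-skill intruder
    capability: str  # the AI capability that could be weaponized
    bottleneck: str  # where that actor is constrained today
    uplift: str      # how a frontier model might relieve the bottleneck
    mitigation: str  # the safeguard the shared understanding points to

example = ThreatModelEntry(
    actor="low-skill intruder",
    capability="automated exploit development",
    bottleneck="turning a public advisory into a working exploit",
    uplift="model drafts exploit scaffolding from the advisory",
    mitigation="refusal training plus detection of exploit-development workflows",
)
print(example)
```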


Together, these efforts reflect our long-term commitment to strengthening the defensive side of the ecosystem. As models become more capable, our goal is to help ensure those capabilities translate into real leverage for defenders: grounded in real-world needs, shaped by expert input, and deployed with care. Alongside this work, we plan to explore other initiatives and cybersecurity grants to help surface breakthrough ideas that may not emerge from traditional pipelines, and to crowdsource bold, creative defenses from across academia, industry, and the open-source community. This is ongoing work, and we expect to keep evolving these programs as we learn what most effectively advances real-world security.


