Helping developers build safer AI experiences for teens

OpenAI News

Today, we’re releasing prompt-based safety policies to help developers create age-appropriate protections for teens. Built to work with our open-weight safety model, gpt-oss-safeguard, these policies simplify how developers turn safety requirements into usable classifiers for real-world systems.


We released open-weight models to democratize access to powerful AI and support broad innovation. At the same time, we believe safety and innovation go hand in hand, and that developers should have access to capable models as well as the tools and policies to deploy them safely and responsibly. We developed these policies to support developers in their safety efforts to protect young users, with input from trusted external organizations including Common Sense Media and everyone.ai.


We recognize that teens and adults have different needs, and that teens need additional protections. These policies are designed to help developers account for those differences and build experiences that are both empowering and appropriate for younger users.


Building on our broader work to protect young people



We have long been committed to building AI that expands opportunities for young people while keeping them safe. As part of this work, we updated our Model Spec—the guidelines that define the intended behavior of OpenAI’s models—to include Under-18 (U18) principles, and introduced product-level safeguards such as parental controls and age prediction to better protect younger users. We have also called for industry-wide protections through our Teen Safety Blueprint.


Today’s release builds on that foundation. We’re making these safety policies available to developers to support them in deploying safety protections for teens and helping democratize access across the open-weight ecosystem.


Translating teen safety into clear, usable policies



While safety classifiers like gpt-oss-safeguard can detect harmful content, they depend on clear definitions of what that content is. In practice, one of the biggest challenges developers face is defining policies that accurately capture teen-specific risks and can be consistently applied in real systems.

Even experienced teams often struggle to translate high-level safety goals into precise, operational rules, especially since it requires both subject matter expertise and deep AI knowledge. This can lead to gaps in protection, inconsistent enforcement, or overly broad filtering. Clear, well-scoped policies are a critical foundation for effective safety systems.


Helping developers operationalize teen safety



To address this challenge, we are releasing a set of safety policies, tailored to common risks faced by teens and informed by careful review of existing research about teens’ unique developmental differences. These policies are structured as prompts that can be directly used with gpt-oss-safeguard and other reasoning models, enabling developers to more easily apply consistent safety standards across their systems.
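Because the policies are plain prompts, wiring one into a classifier is mostly a matter of message construction. The sketch below shows the general pattern, assuming the common convention for gpt-oss-safeguard-style models of placing the policy in the system message and the content to classify in the user message; the policy text, labels, and helper names are illustrative, not the published policies.

```python
# Minimal sketch of using a policy-as-prompt with a safety classifier.
# The policy text and label names below are illustrative placeholders,
# not the published policies.

TEEN_VIOLENCE_POLICY = """\
Classify the USER CONTENT against this policy.
Label VIOLATING if it depicts graphic violence in detail;
label SAFE otherwise. Reply with exactly one label."""

def build_classifier_messages(policy: str, content: str) -> list[dict]:
    """Pair a safety policy (system role) with content to classify (user role)."""
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ]

messages = build_classifier_messages(TEEN_VIOLENCE_POLICY, "A user message to screen")
# `messages` can now be passed to any chat-style inference stack,
# e.g. a chat template in transformers or an OpenAI-compatible server.
```

Keeping the policy as a separate string also makes it easy to swap in an adapted or translated version without touching the surrounding inference code.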


The initial release includes policies covering:


  • Graphic violent content
  • Graphic sexual content
  • Harmful body ideals and behaviors
  • Dangerous activities and challenges
  • Romantic or violent roleplay
  • Age-restricted goods and services

These policies can be used for real-time content filtering, as well as offline analysis of user-generated content.


By structuring policies as prompts, developers can more easily integrate them into existing workflows, adapt them to their use cases, and iterate over time.
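The two usage modes above differ mainly in where the same classification call sits. The sketch below illustrates this with the model call stubbed out: `classify` is a stand-in for an actual gpt-oss-safeguard inference call, and all function names here are hypothetical.

```python
# Illustrative sketch of the two usage modes: real-time filtering and
# offline analysis. `classify` is a stub standing in for a real
# gpt-oss-safeguard call that runs the policy prompt and parses a label.

def classify(policy: str, content: str) -> str:
    """Stub classifier; a real system would call the safety model here."""
    return "VIOLATING" if "graphic" in content.lower() else "SAFE"

def filter_realtime(policy: str, content: str) -> bool:
    """Real-time gate: decide on a single message before it is shown."""
    return classify(policy, content) == "SAFE"

def analyze_offline(policy: str, items: list[str]) -> list[str]:
    """Offline pass: return stored content that violates the policy."""
    return [item for item in items if classify(policy, item) == "VIOLATING"]

policy = "Label graphic violent content VIOLATING, otherwise SAFE."
flagged = analyze_offline(policy, ["hello", "a graphic depiction"])
```

The same policy string drives both paths, which is what makes iterating on the prompt cheaper than retraining a bespoke classifier.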

Developed with input from external experts



We worked with external organizations including Common Sense Media and everyone.ai to inform the development of these policies. Their expertise helped shape the scope of content to cover, strengthen the structure of the prompts, and refine the edge cases to consider when evaluating them. 


This work reflects an ongoing effort to collaborate with experts and the broader ecosystem to improve how AI systems support young people.


“One of the biggest gaps in AI safety for teens has been the lack of clear, operational policies that developers can build from. Many times, developers are starting from scratch. These prompt-based policies help set a meaningful safety floor across the ecosystem, and because they're released as open source, they can be adapted and improved over time. We're encouraged to see this kind of infrastructure being made available broadly, and we hope it catalyzes more shared youth-safety starting points across the industry.”


Robbie Torney, Head of AI & Digital Assessments, Common Sense Media


“Efforts like this that make youth safety policies more operational are valuable because they help translate expert knowledge into guidance that can be used in real systems. Content policies are an important first step, and they also open the door to broader work on how model behavior can shape youth-relevant risks over time. Inspired by this work and our own research, everyone.ai has also created an initial behavioral policy focused on risks like exclusivity and overreliance.”


Dr. Mathilde Cerioli, Chief Scientist at everyone.AI


A starting point, not a complete solution



The policies are intended as a starting point, not a comprehensive or final definition of teen safety, nor a guarantee of it. Each application has unique risks, audiences, and contexts, and developers are best positioned to understand the risks that their products and AI integrations may present. We strongly encourage developers to adapt and extend these policies based on their specific needs and combine them with other safeguards such as product design decisions, user controls, teen-friendly transparency, monitoring systems, and thoughtful, age-appropriate responses.


We believe a layered defense-in-depth approach is essential to building safer AI systems. These policies draw from our internal experience, but they do not reflect the full extent of OpenAI’s internal policies or safeguards.
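In code, defense in depth often amounts to treating the policy classifier as one check among several independent layers. The sketch below is a hypothetical illustration, not OpenAI's architecture: each layer allows or blocks content on its own, and content passes only if every layer agrees.

```python
# Illustrative defense-in-depth sketch: a policy classifier is one layer
# among several independent checks. All names and layers here are
# hypothetical; each check returns True if its layer allows the content.

from typing import Callable

Check = Callable[[str], bool]

def allowed(content: str, layers: list[Check]) -> bool:
    """Content passes only if every safety layer allows it."""
    return all(layer(content) for layer in layers)

# Example layers: a cheap keyword pre-filter, a stubbed policy
# classifier, and a product-level control such as an age gate.
layers: list[Check] = [
    lambda c: "forbidden-term" not in c,   # cheap keyword pre-filter
    lambda c: "graphic" not in c.lower(),  # stand-in for gpt-oss-safeguard
    lambda c: True,                        # product-level control, e.g. age gate
]
```

The point of layering is that a miss in any single check (including the model itself) does not become a miss for the whole system.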


The road forward



We are releasing these policies as open source through the ROOST Model Community to encourage collaboration and iteration. To contribute, provide feedback, or share additional teen safety policies, visit the RMC GitHub repository.


Developers and organizations can adapt these policies to their specific applications, translate them into different languages, and extend them to cover additional risk areas. Over time, we hope this contributes to a more robust and shared foundation for implementing safety policies in AI systems.


To get started with gpt-oss-safeguard, download it from Hugging Face.


