The Need for Transparency in Frontier AI

Frontier AI development needs greater transparency to ensure public safety and accountability for the companies developing this powerful technology. AI is advancing rapidly. While industry, governments, academia, and others work to develop agreed-upon safety standards and comprehensive evaluation methods—a process that could take months to years—we need interim steps to ensure that very powerful AI is developed securely, responsibly, and transparently.
We are therefore proposing a targeted transparency framework (https://www-cdn.anthropic.com/2fd0a7c7e1c46cbffd844a411cb89d0730320f91.pdf), one that could be applied at the federal, state, or international level, and which applies only to the largest AI systems and developers while establishing clear disclosure requirements for safety practices.
Our approach deliberately avoids being heavily prescriptive. We recognize that as the science of AI continues to evolve, any regulatory effort must remain lightweight and flexible. It should not impede AI innovation, nor should it slow our ability to realize AI's benefits, including lifesaving drug discovery, swift delivery of public benefits, and critical national security functions. Rigid government-imposed standards would be especially counterproductive given that evaluation methods become outdated within months due to the pace of technological change (see https://hai.stanford.edu/news/ai-benchmarks-hit-saturation).
Minimum Standards for AI Transparency
Below are the core tenets we believe should guide AI transparency policy:
- Limit Application to the Largest Model Developers: AI transparency should apply only to the largest frontier model developers building the most capable models, where frontier models are distinguished by a combination of thresholds for computing power, computing cost, evaluation performance, and annual revenue and R&D spending. To avoid burdening the startup ecosystem and small developers whose models pose low risk to national security or of causing catastrophic harm, the framework should include appropriate exemptions for smaller developers. We welcome input from the start-up community on what those thresholds should be. Internally, we have discussed examples of what the threshold could look like: an annual revenue cutoff on the order of $100 million, or R&D or capital expenditures on the order of $1 billion annually (see the scoping sketch after this list). These scoping thresholds should be periodically reviewed as the technology and industry landscape evolves.
- Create a Secure Development Framework: Require covered frontier model developers to have a Secure Development Framework that lays out how they will assess and mitigate unreasonable risks in a model. Those risks must include the creation of chemical, biological, radiological, and nuclear harms, as well as harms caused by misaligned model autonomy. Secure Development Frameworks are still an evolving safety tool, so any proposal should strive for flexibility.
- Make the Secure Development Framework Public: The Secure Development Framework should be disclosed to the public, subject to reasonable redaction protections for sensitive information, on a public-facing website registered to and maintained by the AI company. This will enable researchers, governments, and the public to stay informed about the AI models deployed today. The disclosure should come with a self-certification that the lab is complying with the terms of its published Secure Development Framework (a sketch of such a disclosure record follows this list).
- Publish a System Card: System cards or other documentation should summarize the testing and evaluation procedures, results, and mitigations required (subject to appropriate redaction for information that could compromise public safety or the safety and security of the model). The system card should be publicly disclosed at deployment and updated when the model is substantially revised (see the system-card sketch after this list).
- Protect Whistleblowers by Prohibiting False Statements: Explicitly make it a violation of law for a lab to lie about its compliance with its framework. This clarification creates a clear legal violation that enables existing whistleblower protections to apply and ensures that enforcement resources are squarely focused on labs that have engaged in purposeful misconduct.
- Transparency Standards: A workable AI transparency framework should have a minimum set of standards so that it can enhance security and public safety while accommodating the evolving nature of AI development. Given that AI safety and security practices remain in their early stages, with frontier developers like Anthropic actively researching best practices, any framework must be designed for evolution. Standards should begin as flexible, lightweight requirements that can adapt as consensus best practices emerge among industry, government, and other stakeholders.
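To make the scoping tenet concrete, below is a minimal sketch assuming the illustrative revenue and R&D figures discussed above. The `Developer` fields, the exact constants, and the either-or combination of the two tests are assumptions for illustration, not statutory text; an actual rule would also weigh computing power, cost, and evaluation performance.

```python
# Minimal sketch of the scoping test. Thresholds mirror the illustrative
# figures in the proposal; field names and the either-or combination of
# tests are assumptions, not statutory definitions.
from dataclasses import dataclass

@dataclass
class Developer:
    annual_revenue_usd: float    # trailing annual revenue
    annual_rd_capex_usd: float   # annual R&D or capital expenditures

# Illustrative cutoffs on the order discussed above; a real rule would be
# set, and periodically reviewed, by the implementing body.
REVENUE_THRESHOLD_USD = 100_000_000      # ~$100 million annual revenue
RD_CAPEX_THRESHOLD_USD = 1_000_000_000   # ~$1 billion annual R&D/capex

def is_covered_frontier_developer(dev: Developer) -> bool:
    """True if the developer would fall within the framework's scope;
    smaller developers below both cutoffs remain exempt."""
    return (dev.annual_revenue_usd >= REVENUE_THRESHOLD_USD
            or dev.annual_rd_capex_usd >= RD_CAPEX_THRESHOLD_USD)

# A small lab well under both cutoffs stays out of scope:
assert not is_covered_frontier_developer(
    Developer(annual_revenue_usd=5_000_000, annual_rd_capex_usd=20_000_000))
```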
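The public-disclosure tenet can likewise be pictured as a simple record. This is a hypothetical sketch only: the proposal specifies public posting on the developer's own site, reasonable redactions, and a self-certification, and every field name below is an assumption.

```python
# Hypothetical shape of a published Secure Development Framework
# disclosure. All field names are illustrative assumptions.
sdf_disclosure = {
    "developer": "ExampleLab",                   # hypothetical lab
    "framework_url": "https://example.com/sdf",  # developer's own public site
    "risk_domains": [
        "chemical", "biological", "radiological", "nuclear",
        "misaligned-model-autonomy",
    ],
    "redactions": [
        # Sensitive material may be withheld with a stated reason.
        {"section": "evaluation-details", "reason": "public-safety"},
    ],
    "self_certification": {
        "statement": "We are complying with the terms of our published framework.",
        "attested_by": "ExampleLab leadership",  # hypothetical signatory
    },
}
```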
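And a system card, in the minimal form the proposal describes, might carry fields like these. Again, the structure and field names are assumptions for illustration, not a prescribed schema.

```python
# Hypothetical system-card record. The proposal requires summarizing
# testing/evaluation procedures, results, and mitigations, published at
# deployment and updated on substantial revision; field names here are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SystemCard:
    model_name: str
    deployed_on: str                  # date the model was deployed
    evaluation_procedures: list[str]  # tests run (redacted where needed)
    results_summary: str              # high-level outcomes of those tests
    mitigations: list[str]            # safeguards applied before release
    last_updated: str = ""            # refreshed on substantial revision

card = SystemCard(
    model_name="example-model-v1",    # hypothetical model
    deployed_on="2025-07-01",         # placeholder date
    evaluation_procedures=["cbrn-uplift-evals", "autonomy-evals"],
    results_summary="No unreasonable risk identified under current evaluations.",
    mitigations=["refusal training", "deployment monitoring"],
)
```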
This transparency approach sheds light on industry best practices for safety and can help set a baseline for how responsible labs train their models, ensuring developers meet basic accountability standards while enabling the public and policymakers to distinguish between responsible and irresponsible practices. For example, the Secure Development Framework we describe here is akin to Anthropic’s own Responsible Scaling Policy and to frameworks from other leading labs (Google DeepMind, OpenAI, Microsoft), all of which have already implemented similar approaches while releasing frontier models. Putting a Secure Development Framework transparency requirement into law would not only standardize industry best practices without setting them in stone; it would also ensure that these disclosures, which are currently voluntary, could not be withdrawn in the future as models become more powerful.
Views differ on whether and when AI models could pose catastrophic risks. Transparency requirements for Secure Development Frameworks and system cards could help give policymakers the evidence they need to determine if further regulation is warranted, as well as provide the public with important information about this powerful new technology.
As models advance, we have an unprecedented opportunity to accelerate scientific discovery, healthcare, and economic growth. Without safe and responsible development, a single catastrophic failure could halt progress for decades. Our proposed transparency framework (https://www-cdn.anthropic.com/2fd0a7c7e1c46cbffd844a411cb89d0730320f91.pdf) offers a practical first step: public visibility into safety practices while preserving private sector agility to deliver AI's transformative potential.