Anthropic News

Nuclear technology is inherently dual-use: the same physics principles that power nuclear reactors can be misused for weapons development. As AI models become more capable, we need to keep a close eye on whether they can provide users with dangerous technical knowledge in ways that could threaten national security.

Information relating to nuclear weapons is particularly sensitive, which makes evaluating these risks challenging for a private company acting alone. That’s why last April we partnered with the National Nuclear Security Administration (NNSA) of the U.S. Department of Energy (DOE) to assess our models for nuclear proliferation risks, and we continue to work with them on these evaluations.

Now, we’re going beyond assessing risk to build the tools needed to monitor for it. Together with the NNSA and DOE national laboratories, we have co-developed a classifier—an AI system that automatically categorizes content—that distinguishes between concerning and benign nuclear-related conversations with 96% accuracy in preliminary testing.
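The announcement does not describe how the classifier was built, so the following is a minimal, purely illustrative sketch of one generic way to train a binary text classifier over conversation transcripts. The library choice (scikit-learn), the placeholder transcripts, and the labels are all assumptions for illustration, not details of the system co-developed with NNSA.

```python
# Illustrative sketch only: Anthropic has not published the classifier's
# implementation. This shows one generic way to train a binary text
# classifier over conversation transcripts using scikit-learn.
# All transcripts and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled data: 0 = benign nuclear-related talk, 1 = concerning.
# A real system would rely on large, expert-curated training sets.
transcripts = [
    "How do coolant loops work in a pressurized water reactor?",
    "Summarize the history of the Nuclear Non-Proliferation Treaty.",
    "[redacted placeholder for a concerning weapons-related request]",
]
labels = [0, 0, 1]

classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
classifier.fit(transcripts, labels)

# Score a new conversation: estimated probability of the "concerning" class.
score = classifier.predict_proba(["Explain uranium enrichment basics."])[0][1]
print(f"concern score: {score:.2f}")
```

The 96% figure refers to accuracy in Anthropic’s own preliminary testing; nothing in this sketch reproduces that result.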

We have already deployed this classifier on Claude traffic as part of our broader system for identifying misuse of our models. Early deployment data suggests the classifier works well with real Claude conversations.
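Anthropic has not published how the classifier is wired into its monitoring stack. Purely as a sketch of the general pattern, traffic screening can be modeled as a thresholded filter that escalates high-scoring transcripts for human review; the threshold value, the screen function, and the stand-in scorer below are all hypothetical.

```python
# Purely illustrative: a generic pattern for screening a stream of
# conversations with any scoring function and flagging items for review.
# The threshold and the stand-in scorer are hypothetical.
from typing import Callable, Iterable, Iterator, Tuple

FLAG_THRESHOLD = 0.9  # hypothetical; a real threshold would be tuned to
                      # trade false positives against missed misuse

def screen(
    conversations: Iterable[str],
    score_fn: Callable[[str], float],
) -> Iterator[Tuple[str, float]]:
    """Yield (transcript, score) for each transcript scored above threshold."""
    for transcript in conversations:
        score = score_fn(transcript)
        if score >= FLAG_THRESHOLD:
            yield transcript, score

def dummy_score(text: str) -> float:
    """Stand-in scorer for this example; a real deployment would use the
    trained classifier's probability output instead."""
    return 0.95 if "enrichment" in text.lower() else 0.05

for transcript, score in screen(
    ["Tell me about enrichment cascades.",
     "What is a pressurized water reactor?"],
    dummy_score,
):
    # In production, a flagged transcript would enter a review queue
    # rather than being printed.
    print(f"flagged ({score:.2f}): {transcript}")
```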

We will share our approach with the Frontier Model Forum, the industry body for frontier AI companies, in the hope that this collaboration can serve as a blueprint for any AI developer implementing similar safeguards in partnership with the NNSA.

Along with the concrete importance of securing frontier AI models against nuclear misuse, this first-of-its-kind effort shows the power of public-private partnerships. These partnerships combine the complementary strengths of industry and government to address risks head-on, making AI models more reliable and trustworthy for all their users.

Full details about our NNSA partnership and the development of these safeguards can be found on our red.anthropic.com blog, the home for research from Anthropic’s Frontier Red Team (and occasionally other teams at Anthropic) on what frontier AI models mean for national security.


