Helping people when they need it most
As ChatGPT adoption has grown worldwide, we’ve seen people turn to it not just for search, coding, and writing, but also for deeply personal decisions that include life advice[https://www.reddit.com/r/ChatGPT/comments/1k1dxpp/chatgpt_has_helped_me_more_than_15_years_of/], coaching[https://www.reddit.com/r/ChatGPT/comments/1h5y9nq/how_i_turned_chatgpt_into_my_personal/], and support[https://www.reddit.com/r/ChatGPT/comments/1kqwte8/chatgpt_is_actually_amazing_for_mental_health/].
At this scale, we sometimes encounter people in serious mental and emotional distress. We wrote about this a few weeks ago[https://openai.com/index/how-we're-optimizing-chatgpt/] and had planned to share more after our next major update. However, recent heartbreaking cases of people using ChatGPT in the midst of acute crises weigh heavily on us, and we believe it’s important to share more now.
Our goal is for our tools to be as helpful as possible to people—and as a part of this, we’re continuing to improve how our models recognize and respond to signs of mental and emotional distress and connect people with care, guided by expert input.
As the world adapts to this new technology, we feel a deep responsibility to help those who need it most. We want to explain what ChatGPT is designed to do, where our systems can improve, and the future work we’re planning.
What ChatGPT is designed to do
Our goal isn’t to hold people’s attention. Instead of measuring success by time spent or clicks, we care more about being genuinely helpful. And for conversations that suggest someone is vulnerable and may be at risk, we’ve built a stack of layered safeguards into ChatGPT.
Recognize and respond with empathy.
Since early 2023, our models have been trained not to provide self-harm instructions and to shift into supportive, empathic language. For example, if someone writes that they want to hurt themselves, ChatGPT is trained not to comply, and instead to acknowledge their feelings and steer them toward help.
Additionally, in line with our defense-in-depth approach[https://openai.com/safety/how-we-think-about-safety-alignment/], responses that go against our models’ safety training, as identified by our classifiers, are automatically blocked, with stronger protections for minors and logged-out use. Image outputs with self-harm are also blocked for everyone, with stronger protections for minors.
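To make the layering concrete, here is a minimal sketch of how an output-side gate with stronger protections for minors and logged-out use could look. This is purely illustrative: the classifier, thresholds, and names below (severity_score, should_block, SessionContext) are hypothetical assumptions, not OpenAI’s implementation.

    from dataclasses import dataclass

    @dataclass
    class SessionContext:
        is_minor: bool    # user known to be under 18
        logged_in: bool   # logged-out use gets stronger protections

    def severity_score(text: str) -> float:
        # Placeholder stand-in for a trained moderation classifier;
        # returns 0.0 (benign) through 1.0 (clearly against safety training).
        flags = ("hurt myself", "end my life")
        return 1.0 if any(f in text.lower() for f in flags) else 0.0

    def should_block(response: str, ctx: SessionContext) -> bool:
        # Baseline blocking cutoff for logged-in adult users.
        threshold = 0.8
        # Stronger protections: lower the cutoff for minors and logged-out use,
        # so borderline content is blocked earlier for those sessions.
        if ctx.is_minor:
            threshold = min(threshold, 0.5)
        if not ctx.logged_in:
            threshold = min(threshold, 0.6)
        return severity_score(response) >= threshold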
During very long sessions, ChatGPT nudges people to take a break.
Refer people to real-world resources.
If someone expresses suicidal intent, ChatGPT is trained to direct people to seek professional help. In the US, ChatGPT refers people to 988 (suicide and crisis hotline), in the UK to Samaritans, and elsewhere to findahelpline.com. This logic is built into model behavior.
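Since the logic lives in model behavior rather than in code, the mapping can only be pictured, not quoted; a sketch of the routing described above might look like the following, with helpline_for as a hypothetical name.

    HELPLINES = {
        "US": "988 (suicide and crisis hotline)",
        "UK": "Samaritans",
    }

    def helpline_for(region: str) -> str:
        # Fall back to the international directory when no
        # region-specific resource is listed.
        return HELPLINES.get(region.upper(), "findahelpline.com")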
We’re working closely with 90+ physicians across 30+ countries—psychiatrists, pediatricians, and general practitioners—and we’re convening an advisory group of experts in mental health, youth development, and human-computer interaction to ensure our approach reflects the latest research and best practices.
Escalate risk of physical harm to others for human review.
When we detect users who are planning to harm others, we route their conversations to specialized pipelines, where they are reviewed by a small team trained on our usage policies and authorized to take action, including banning accounts. If human reviewers determine that a case involves an imminent threat of serious physical harm to others, we may refer it to law enforcement. Given the uniquely private nature of ChatGPT interactions, we are currently not referring self-harm cases to law enforcement, out of respect for people’s privacy.
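As a schematic of that routing policy (the names are hypothetical, and the real pipeline centers on trained human reviewers rather than code), the decision logic could be sketched like this:

    from enum import Enum, auto

    class Risk(Enum):
        NONE = auto()
        SELF_HARM = auto()
        HARM_TO_OTHERS = auto()

    def route(risk: Risk, imminent_threat_confirmed: bool = False) -> str:
        if risk is Risk.HARM_TO_OTHERS:
            # Reviewed by a small team trained on the usage policies; a
            # confirmed imminent threat of serious physical harm to others
            # may be referred to law enforcement.
            return ("refer_to_law_enforcement" if imminent_threat_confirmed
                    else "human_review")
        if risk is Risk.SELF_HARM:
            # Self-harm cases are not referred to law enforcement, given
            # the uniquely private nature of these conversations.
            return "empathic_response_with_resources"
        return "normal_handling"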
We are continuously improving how our models respond in sensitive interactions, and are currently working on targeted safety improvements across several areas, including emotional reliance, mental health emergencies, and sycophancy.
In August, we launched GPT‑5 as the default model powering ChatGPT. Overall, GPT‑5 has shown meaningful improvements in areas like avoiding unhealthy levels of emotional reliance, reducing sycophancy, and reducing the prevalence of non-ideal model responses in mental health emergencies by more than 25% compared to 4o. GPT‑5 also builds on a new safety training method called safe completions[https://openai.com/index/gpt-5-safe-completions/], which teaches the model to be as helpful as possible while staying within safety limits. That may mean giving a partial or high-level answer instead of detail that could be unsafe.
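Safe completions is a training method, so the behavior is learned rather than rule-based; still, a toy rule can illustrate the output pattern it aims for: full answers at low risk, partial or high-level answers at moderate risk, and a decline plus a referral at high risk. Everything below, including the risk numbers, is an illustrative assumption, not the actual method.

    def safe_completion(risk: float, full_answer: str, high_level_answer: str) -> str:
        # Low risk: answer fully. Moderate risk: stay helpful but give only
        # a partial or high-level answer. High risk: decline and point to help.
        if risk < 0.3:
            return full_answer
        if risk < 0.7:
            return high_level_answer
        return ("I can't help with that. If you're struggling, "
                "findahelpline.com lists support resources.")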
Where our systems can fall short, why, and how we’re addressing it
Even with these safeguards, there have been moments when our systems did not behave as intended in sensitive situations. Here are some of the things we are working to improve.
Strengthening safeguards in long conversations.
Our safeguards work more reliably in common, short exchanges. We have learned over time that these safeguards can sometimes be less reliable in long interactions: as the back-and-forth grows, parts of the model’s safety training may degrade. For example, ChatGPT may correctly point to a suicide hotline when someone first mentions intent, but after many messages over a long period of time, it might eventually offer an answer that goes against our safeguards. This is exactly the kind of breakdown we are working to prevent. We’re strengthening these mitigations so they remain reliable in long conversations, and we’re researching ways to ensure robust behavior across multiple conversations. That way, if someone expresses suicidal intent in one chat and later starts another, the model can still respond appropriately.
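One way to picture the mitigation is to re-evaluate the entire conversation on every turn, rather than relying on the check that ran when risk first appeared. This is a minimal sketch under assumed names (conversation_risk, handle_turn); the real work involves model training and evaluation, not a wrapper like this.

    RISK_THRESHOLD = 0.7

    def conversation_risk(messages: list[str]) -> float:
        # Placeholder for a conversation-level classifier; a real system
        # would score the full history with a trained model.
        flags = ("end my life", "hurt myself")
        return 1.0 if any(f in m.lower() for m in messages for f in flags) else 0.0

    def handle_turn(history: list[str], user_message: str, reply_fn) -> str:
        # Re-check the whole history on every turn, so the safeguard does
        # not decay as the back-and-forth grows longer.
        messages = history + [user_message]
        if conversation_risk(messages) >= RISK_THRESHOLD:
            return ("I'm really sorry you're feeling this way. "
                    "In the US you can call or text 988.")
        return reply_fn(messages)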
Refining how we block content.
We’ve seen some cases where content that should have been blocked wasn’t. These gaps usually happen because the classifier underestimates the severity of what it’s seeing. We’re tuning those thresholds so protections trigger when they should.
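Conceptually, this tuning means choosing a new operating point on the classifier’s score distribution. The sketch below (hypothetical data and names) picks the lowest cutoff that still catches a target fraction of examples known to be severe; lowering the cutoff trades more false positives for fewer missed severe cases.

    def pick_threshold(severe_scores: list[float], target_recall: float = 0.99) -> float:
        # Given classifier scores on examples known to be severe, choose the
        # lowest cutoff that still flags at least target_recall of them.
        ranked = sorted(severe_scores)
        idx = int((1.0 - target_recall) * len(ranked))
        return ranked[idx]

    # Example: with scores [0.2, 0.55, 0.6, 0.8, 0.9] and target_recall=0.8,
    # the cutoff is 0.55, which flags four of the five severe examples.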
Our top priority is making sure ChatGPT doesn’t make a hard moment worse.
What we are planning for the future
The work does not stop with fixing the above. We’re also planning to:
Expand interventions to more people in crisis.
While our initial mitigations prioritized acute self-harm, some people experience other forms of mental distress. For example, someone might enthusiastically tell the model they believe they can drive 24/7 because they realized they’re invincible after not sleeping for two nights. Today, ChatGPT may fail to recognize this as dangerous, or may read it as playful, and, by exploring the idea with curiosity, could subtly reinforce it.
We are working on an update to GPT‑5 that will cause ChatGPT to de-escalate by grounding the person in reality. In this example, it would explain that sleep deprivation is dangerous and recommend rest before any action.
Make it even easier to reach emergency services and get help from experts.
Today, when people express intent to harm themselves, we encourage them to seek help and refer them to real-world resources. We’ve begun localizing resources in the US and Europe, and we plan to expand to other global markets. We’ll also make these protections more accessible with one-click access to emergency services.
We are exploring how to intervene earlier and connect people to certified therapists before they are in an acute crisis. That means going beyond crisis hotlines and considering how we might build a network of licensed professionals people could reach directly through ChatGPT. This will take time and careful work to get right.
Enable connections to trusted contacts.
In addition to emergency services, we’re exploring ways to make it easier for people to reach out to those closest to them. This could include one-click messages or calls to saved emergency contacts, friends, or family members with suggested language to make starting the conversation less daunting.
We’re also considering features that would allow people to opt-in for ChatGPT to reach out to a designated contact on their behalf in severe cases.
Strengthen protections for teens.
Historically, we specified a single ideal model behavior for all of our users; as ChatGPT grew, we began adding protections for users we know to be under the age of 18. We are continuing to develop and roll out safeguards that recognize teens’ unique developmental needs, with stronger guardrails around sensitive content and risky behaviors.
We will also soon introduce parental controls that give parents options to gain more insight into, and shape, how their teens use ChatGPT. We’re also exploring making it possible for teens (with parental oversight) to designate a trusted emergency contact. That way, in moments of acute distress, ChatGPT can do more than point to resources: it can help connect teens directly to someone who can step in.
We are deeply aware that safeguards are strongest when every element works as intended. We will keep improving, guided by experts and grounded in responsibility to the people who use our tools—and we hope others will join us in helping make sure this technology protects people at their most vulnerable.