Consensus accelerates research with GPT-5 and Responses API

OpenAI News




Every year, millions of new scientific papers are published—far more than any one person can read. 


For scientists, the challenge isn’t access to knowledge but the overwhelming task of finding, interpreting, and connecting it. Breakthroughs happen at the edge of what’s known, yet researchers spend most of their time just finding the edges instead of pushing past them.


Consensus⁠, a research assistant used by more than 8 million people, was built to change that. Founded by Christian Salem and Eric Olson, the platform searches, reads, and synthesizes peer-reviewed literature across more than 220 million papers. Its newest capability, Scholar Agent, is a multi-agent system built on GPT‑5 and the Responses API. It mirrors how researchers actually work, helping them get from question to conclusion in minutes instead of weeks.


But the goal isn’t just faster research—it’s a faster path to discovery. “Science advances when it’s more accessible,” Salem says. “Our job is to give researchers everywhere the ability to find, trust, and act on evidence.”


From search engine to agentic assistant




The first version of Consensus worked like a vertical search engine for science: it indexed academic papers, retrieved relevant results, and generated summaries grounded in citations. But search alone wasn’t enough. 


“Research isn’t just finding papers,” Salem says. “It’s interpreting results, comparing findings, and connecting ideas. The more time scientists spend searching, reading, and interpreting past knowledge, the less time they have to do real research.”


So the team began re-architecting Consensus around a new concept: a multi-agent system called “Scholar Agent” that works the way a human researcher does.


Built on GPT‑5 and the Responses API, the system now runs a coordinated workflow of agents:


  • Planning Agent breaks down the user’s question and decides which actions to take next
  • Search Agent combs Consensus’s paper index, a user’s private library, and the citation graph
  • Reading Agent interprets papers individually or in batches
  • Analysis Agent synthesizes results, determines structure and visuals, and composes the final output

Each agent has a narrow scope, which keeps reasoning precise and minimizes hallucinations. The architecture also allows Consensus to decide when not to answer; if no relevant studies meet its quality threshold, the assistant simply says so.
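The division of labor above can be sketched as a small pipeline. This is a toy illustration, not Consensus's implementation: every class, function, threshold, and field name here is hypothetical, and each agent is stubbed. The point is the structure, narrowly scoped stages plus an explicit refusal path when no evidence clears the quality bar.

```python
from dataclasses import dataclass

# Hypothetical paper record; the fields are illustrative, not Consensus's schema.
@dataclass
class Paper:
    title: str
    quality: float  # assumed relevance/quality score in [0, 1]
    finding: str

QUALITY_THRESHOLD = 0.7  # assumed cutoff for "relevant studies"

def planning_agent(question: str) -> list[str]:
    """Break the user's question into narrow search queries (stubbed)."""
    return [question]

def search_agent(queries: list[str], index: list[Paper]) -> list[Paper]:
    """Return only papers that clear the quality bar."""
    return [p for p in index if p.quality >= QUALITY_THRESHOLD]

def reading_agent(papers: list[Paper]) -> list[str]:
    """Extract one key finding per paper (stubbed)."""
    return [p.finding for p in papers]

def analysis_agent(question: str, findings: list[str]) -> str:
    """Synthesize findings; decline rather than invent when evidence is missing."""
    if not findings:
        return "No studies meeting the quality threshold were found."
    return f"Synthesis for {question!r}: " + "; ".join(findings)

def scholar_pipeline(question: str, index: list[Paper]) -> str:
    queries = planning_agent(question)
    papers = search_agent(queries, index)
    findings = reading_agent(papers)
    return analysis_agent(question, findings)
```

Because each stage does one thing, a failure is local: an empty result from the search stage surfaces as an explicit "no answer" rather than a fabricated synthesis.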


“By dividing the workflow across agents, we reduce error and make the system far more disciplined,” Salem says. “No one agent has too much responsibility, which turns out to be key for reliability.”



This approach is what the team calls context engineering: assembling the right evidence before generation begins. Every answer comes with a “research context pack”—a structured bundle of papers, metadata, and key findings that trace back to original studies.


“We don’t want researchers wasting time double-checking every claim,” Salem says. “If the system can’t ground an answer in real evidence, it won’t make one up.”
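A "research context pack" as described above might look something like the following. The real schema is not public; these dataclasses and the grounding check are an assumption-laden sketch of the idea that every claim must trace back to an identifiable study before an answer is generated.

```python
from dataclasses import dataclass, field

# All names and fields below are illustrative, not Consensus's actual schema.
@dataclass
class Citation:
    doi: str      # identifier tracing the claim back to the original study
    title: str
    year: int

@dataclass
class Finding:
    claim: str
    citation: Citation

@dataclass
class ContextPack:
    question: str
    findings: list[Finding] = field(default_factory=list)

    def is_grounded(self) -> bool:
        """Only generate an answer if every finding carries a traceable citation."""
        return bool(self.findings) and all(f.citation.doi for f in self.findings)
```

Under this shape, "it won't make one up" is enforceable: an empty or citation-less pack simply never reaches the generation step.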


Building with the Responses API




Consensus migrated from Chat Completions to the Responses API to support its multi-agent routing. The switch improved both reliability and cost efficiency, giving the team finer control over sub-agent calls. With GPT‑5’s long-context reasoning and reliable tool calling, the choice was clear.
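For orientation, a sub-agent call routed through the Responses API might be assembled like this. The tool schema follows OpenAI's documented flattened function-tool shape for the Responses API; the `search_papers` tool itself is hypothetical, and the network call is shown commented out rather than executed.

```python
# Hypothetical tool a planning agent could hand to a search sub-agent.
tools = [
    {
        "type": "function",
        "name": "search_papers",  # illustrative tool name, not a real Consensus API
        "description": "Search the paper index for studies matching a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "max_results": {"type": "integer"},
            },
            "required": ["query"],
        },
    }
]

# Request payload in the Responses API style (flat `input`, flattened tools),
# as opposed to Chat Completions' `messages` and nested `function` objects.
request = {
    "model": "gpt-5",
    "input": "Find peer-reviewed evidence on intermittent fasting and HbA1c.",
    "tools": tools,
}

# Executing the call requires an API key; sketched here for shape only:
# from openai import OpenAI
# client = OpenAI()
# response = client.responses.create(**request)
# print(response.output_text)
```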


Early evaluations confirmed the bet: GPT‑5 outperformed GPT‑4.1, Sonnet 4, and Gemini 2.5 Pro on tool-calling accuracy and planning stability. That allowed the Consensus team to focus less on prompt gymnastics and more on building agent behaviors that map directly to research workflows.


A consumer bet in a world of institutions




From the beginning, Consensus approached the market differently than expected. Rather than selling through institutions, the team focused on the people doing the research itself: students, faculty, and clinicians who needed answers today. That direct-to-researcher focus shaped both the product’s design and its rapid growth.


“Everyone said you can’t go direct-to-consumer in academia, but AI has changed that,” Salem says. “People don’t wait for approval anymore—they use what works.”


As a result, Consensus feels more like a modern consumer app than a traditional academic tool: fast onboarding, intuitive design, a conversational interface. Adoption spread through word of mouth across campuses and labs.


Graduate students and PhD candidates became the first power users, followed by faculty and private researchers. Then came clinicians, who began using Consensus to surface the latest evidence in their fields. 


“We didn’t set out to build for doctors,” Salem says. “But they need the same thing researchers do: fast access to reliable evidence.”


The company recently signed the Mayo Clinic’s medical library and just launched ‘Medical Mode,’ a new feature designed for practitioners searching for clinical evidence.


Scaling with the science




In the past year, Consensus has expanded rapidly, growing to more than 8 million researchers worldwide and increasing revenue by 8x.


That growth hasn’t changed the product’s priorities. Every feature still revolves around verifiable, low-hallucination answers. The team has invested heavily in evaluation pipelines that test accuracy, citation traceability, and stylistic consistency across agents.
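One of those checks, citation traceability, reduces to a simple set comparison: every identifier cited in an answer must appear in the evidence the agents actually retrieved. The function below is a toy version under that assumption; the field names and return shape are invented for illustration.

```python
def eval_answer(cited_dois: set[str], retrieved_dois: set[str]) -> dict:
    """Flag any citation in an answer that cannot be traced to retrieved evidence.

    A minimal sketch of a citation-traceability check, not Consensus's pipeline.
    """
    untraceable = cited_dois - retrieved_dois
    return {
        "traceable": not untraceable,
        "untraceable_dois": sorted(untraceable),
    }
```

Run over a corpus of question/answer pairs, a check like this turns "low-hallucination" from a slogan into a measurable regression metric.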


Consensus’s architecture is intentionally modular, designed so new agents can slot in as models expand and improve—agents that replicate experiments, generate figures, or run statistical analyses.


“We’re building the assistant researchers actually need in a rapidly changing world,” Salem says. “The models keep getting better, the system grows with them, and science moves faster.”



OpenAI <3 startups. Come build with us.
