Early experiments in accelerating science with GPT-5
Science shapes everything from human health to energy production, from national security to our understanding of the universe. If AI can accelerate science—shortening the time it takes to generate new ideas, or to move from an idea to a tested result—the benefits compound across society.
But the pace of innovation remains a constraint. Even when the right idea exists, turning it into a product or treatment can take years. In a recent survey, 60 percent of people in the U.S. said scientific and medical breakthroughs reach them too slowly; 73 percent said we need better ways to accelerate discovery; and 69 percent identified scientific leadership as a top national priority.
Today, we’re releasing “Early science acceleration experiments with GPT‑5,” a paper co-authored with collaborators at universities and national laboratories including Vanderbilt, UC Berkeley, Columbia, Oxford, Cambridge, Lawrence Livermore National Laboratory, and The Jackson Laboratory. It compiles early case studies across math, physics, biology, computer science, astronomy, and materials science in which GPT‑5 helped researchers synthesize known results in a novel way, conduct powerful literature review, accelerate tough computations, and even generate novel proofs of unsolved propositions. The paper also documents limitations. Our goal is to give the community a clear view of what these systems can and cannot do today in research settings.
These case studies show how, in the hands of experts, GPT‑5 is accelerating scientific discovery, and why that acceleration matters:
- Biology: In a study led by Derya Unutmaz, M.D., scientists spent months trying to explain a puzzling change in human immune cells. GPT‑5 identified the likely mechanism within minutes from an unpublished chart and suggested an experiment that proved it. This kind of speed could help researchers understand diseases faster and develop better treatments.
- Mathematics: In another case, researchers Mehtaab Sawhney and Mark Sellke were tackling a decades-old open problem originally proposed by Paul Erdős. They were stuck on the final step, and GPT‑5 contributed a new idea about how one odd number breaks the pattern, which helped them complete the proof. Advances like this strengthen the mathematical foundations that many algorithms and security techniques ultimately rely on.
- Algorithms & optimization: Researchers Sébastien Bubeck and Christian Coester were testing whether a common decision-making method used in robotics and routing was as reliable as people assumed. GPT‑5 found a new, clear example showing the method can fail and also improved a classic result in optimization, the math used to figure out the best way to solve a problem. This type of advance helps engineers better understand the decision-making systems used in robotics, routing, and other real-world applications.
What is OpenAI for Science?
The mission of OpenAI for Science is to accelerate scientific discovery: to help researchers explore more ideas, test hypotheses faster, and uncover insights that would otherwise take significant time. We do this by pairing frontier models with the right tools, workflows, and collaborations.
We work closely with researchers across academia, industry, and national labs. These collaborations help us understand where the models are useful, where they fail, and how to integrate them into the scientific process—from literature review and proof generation to modeling, simulation, and experimental design.
Our approach combines two complementary beliefs. Specialized scientific tools, such as simulation engines, protein databases, and computer algebra systems, are essential for efficiency and precision. At the same time, scaling foundation models continues to unlock new reasoning abilities: connecting ideas across fields, sketching proofs, proposing mechanisms, and navigating large literatures conceptually rather than by keyword. Where specialized tools exist, we want to use them; where general reasoning is required, we build models designed to handle it. Both paths reinforce each other.
How scientists are working with GPT‑5 today
The most meaningful progress comes from human–AI teams. Scientists set the agenda: they define questions, choose methods, critique ideas, and validate results. GPT‑5 contributes breadth, speed, and the ability to explore many directions in parallel.
Using GPT‑5 effectively is a skill. Researchers learn how to pose questions, when to push back, how to break problems into steps, and what to validate independently. Productive work often looks like dialogue—researcher and model iterating until a promising direction emerges or the idea is discarded.
The current state of GPT‑5 in scientific work
Across these early studies, GPT‑5 appears able to shorten parts of the research workflow when used by experts. It does not run projects or solve scientific problems autonomously, but it can expand the surface area of exploration and help researchers move faster toward correct results.
- One emerging capability is conceptual literature search. GPT‑5 can often identify deeper relationships between ideas and retrieve relevant material across languages and less accessible sources. Researchers report finding references, connections, and arguments they had not previously encountered.
- In mathematics and theoretical computer science, where structure is explicit and feedback loops are fast, GPT‑5 is especially helpful. Mathematicians have used GPT‑5 to generate viable proof outlines in minutes, transforming work that otherwise might have taken days or weeks. In physics and computational domains, the model can propose simplifying transformations or point to analogous structures in other fields.
- In biology and other empirical sciences, the model can propose mechanisms and design experiments to validate these hypotheses in the wet lab.
We are beyond the point where models only summarize existing knowledge. Now, early contributions from GPT‑5 can meaningfully assist researchers under expert oversight. The pace of improvement suggests the potential for deeper acceleration as capabilities and tools advance.
What this looks like in practice: a few case studies
Independent rediscovery of known results at the scientific frontier
Tightening a theorem in convex optimization
Optimization is the math of finding the “best” option—like the lowest training loss or the shortest route in a network. Gradient descent is a basic optimization method that takes repeated small steps downhill on a function. A recent theorem by Guy Barzilai, Ohad Shamir, and Moslem Zamani asked when the sequence of values visited by gradient descent forms a convex curve over time (a curve with no dips), which makes the algorithm’s behavior easier to analyze and control. The first version of the paper showed this only for very small, conservative step sizes.
Sébastien Bubeck gave GPT‑5 the weaker version of the result and asked if the condition could be improved, and the model proposed a sharper step-size bound and a cleaner, more standard proof that he then checked carefully by hand; with more thinking time, an internal run of the model even derived the optimal bound from scratch.
GPT‑5’s contribution: GPT‑5 helped Sébastien Bubeck explore a sharper step-size condition and suggest a cleaner proof for a recent convex optimization theorem, which he verified independently.
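The property in question can be seen on a toy problem. A minimal sketch, assuming a simple quadratic objective and an illustrative step size (neither is taken from the paper):

```python
# Gradient descent on f(x) = x^2, tracking the sequence of function values
# and checking that it is convex over time: a sequence (v_k) is convex when
# every second difference v_{k-1} - 2*v_k + v_{k+1} is non-negative.

def gradient_descent(grad, x0, eta, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * grad(xs[-1]))
    return xs

f = lambda x: x * x       # toy smooth convex objective
grad = lambda x: 2 * x    # its gradient

values = [f(x) for x in gradient_descent(grad, x0=1.0, eta=0.1, steps=20)]

second_diffs = [values[k - 1] - 2 * values[k] + values[k + 1]
                for k in range(1, len(values) - 1)]
assert all(d >= 0 for d in second_diffs)     # no dips: convex in time
assert values == sorted(values, reverse=True)  # and monotonically decreasing
```

The theorem concerns how large the step size `eta` can be while this no-dips property is guaranteed for all smooth convex functions, not just this toy one.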
Recovering hidden symmetries around black holes
In general relativity, rotating black holes are described by the Kerr solution, and waves moving around them satisfy a complicated differential equation. Physicists look for symmetries of such equations—transformations that leave them unchanged—because symmetries lead to conserved quantities and simple structure. Recent work by Alex Lupsasca showed that the Kerr wave equation has a hidden symmetry structure forming an SL(2,ℝ) algebra, which helps explain why certain tidal responses vanish.
When we asked GPT‑5 Pro directly about the full Kerr problem, it initially failed and reported no interesting symmetries. After Lupsasca gave it a simpler “warm-up” version of the same structure in flat space, we returned to the Kerr case; this time, after about 18 minutes of internal reasoning, the model produced the full set of symmetry generators that close into SL(2,ℝ), matching the human result.
GPT‑5’s contribution: GPT‑5 Pro reconstructed the hidden SL(2,ℝ) symmetry algebra of the Kerr black hole wave equation once given an appropriate warm-up problem, and Lupsasca confirmed the result.
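For readers unfamiliar with the terminology: an sl(2,ℝ) algebra consists of three generators H, E, F whose commutators close as

```latex
[H, E] = 2E, \qquad [H, F] = -2F, \qquad [E, F] = H.
```

Verifying a claimed hidden symmetry amounts to checking that the candidate generators, in some basis, satisfy exactly these relations.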
Mechanistic insight in immunology
A key question in modern immunotherapy, especially in CAR-T cancer treatments that rely on engineered T cells, is how to keep beneficial T cells active and durable without pushing them into an exhausted, dysfunctional state. Established literature has shown that transiently limiting glucose metabolism can durably reprogram T cells to be more proinflammatory. In an earlier study, Derya Unutmaz and colleagues briefly treated human CD4+ T cells (a key class of immune cells) with 2-deoxyglucose (2DG), a compound that interferes with glucose metabolism. After removing 2DG and then priming the CD4+ T cells with IL-2 (a signaling molecule that tells T cells to proliferate), they saw a lasting shift toward a proinflammatory Th17-like state—a subtype of T cells involved in both protection and autoimmune disease—and spent months of experiments and reading to arrive at a plausible mechanism explaining this effect.
Years later, he gave GPT‑5 Pro an unpublished figure of flow cytometry scatterplots showing different T cell subsets after treatment with varying glucose and 2DG levels—and asked what might explain the data and what experiments to run next. In about a dozen minutes of back-and-forth, the model suggested that disrupted N-linked glycosylation (how cells attach sugar chains to proteins) during priming was the driver, and predicted that memory (rather than naïve) T cells were responsible. GPT‑5 then proposed specific follow-up experiments, including an elegant mannose rescue experiment that restored N-glycosylation without restoring glycolysis. The lab had previously conducted the mannose rescue experiment, and the results matched the model predictions exactly.
GPT‑5 Pro was then able to analyze unpublished data of CD8+ T cells pulsed with 2DG, and predicted that transient 2DG exposure during CAR-T generation would lead to enhanced killing efficiency against target cancer cell lines. GPT‑5 Pro’s predictions matched the lab’s unpublished experimental data.
GPT‑5’s contribution: GPT‑5 analyzed unpublished data to derive non-obvious and valuable mechanistic hypotheses, identified the acting T-cell subpopulation, and suggested follow-up experiments, which Unutmaz’s lab later tested and confirmed.
Deep literature search
Linking a new geometric result to other fields
Nikita Zhivotovskiy and his collaborators proved a new theorem in convex geometry—the study of “well-behaved” shapes where any line between two points stays inside the shape. Convex geometry underlies many models in machine learning and statistics. Once the theorem was done, the natural next question was: where else could this result be useful?
Instead of guessing search terms and scanning the literature by hand, Zhivotovskiy gave GPT‑5 the formal statement of the theorem and asked which areas it might connect to. The model pointed to work in density estimation, learning theory, and multi-objective optimization, and surfaced specific references, including several he had not seen and some in other languages.
GPT‑5’s contribution: GPT‑5 helped Nikita Zhivotovskiy identify concrete connections and references across several fields, including materials he had not encountered.
Cleaning up—and contributing to—the Erdős problem database
Paul Erdős posed more than a thousand problems, many of which are tracked on a public website. Some problems are still listed as “open” even though solutions exist in obscure journals or non-English papers. Mehtaab Sawhney and Mark Sellke used GPT‑5 as a literature-search assistant over this database: for each supposedly open problem, they asked it to search for solutions or major partial progress.
GPT‑5 found full solutions for several problems still marked open, identified substantial partial results for others, and flagged a misprint in one problem statement. For Erdős Problem #848, human comments on the site had already outlined much of the structure; GPT‑5 proposed a key density estimate, and Sawhney and Sellke corrected and tightened it into a complete proof that closed the problem.
GPT‑5’s contribution: GPT‑5 assisted in locating missed solutions and proposed a density estimate that Sawhney and Sellke refined into a complete proof of Erdős Problem #848.
Clique-avoiding codes: a cautionary tale
Error-correcting codes add redundancy to data so you can recover information even when bits are corrupted. This project examined a special kind of binary code where each position corresponds to an edge in a graph, and the goal is to rule out any codeword that looks like a “clique” (a fully connected set of nodes). The challenge was to determine how many parity checks are fundamentally required to prevent these structured errors. GPT‑5 reframed the question using quadratic equations over a finite field and highlighted a classical result, the Chevalley–Warning theorem, which immediately pointed to the correct lower bound—showing that only about half as many constraints were needed as previously thought.
An unexpected twist emerged afterward: the exact same bound, and essentially the same proof, had appeared years earlier in a short research paper. GPT‑5 had reproduced the argument without citing its source, only identifying the prior work when asked again in a fresh session. This underscored an important lesson for AI-assisted mathematics: models can generate correct and elegant reasoning, but they may not reliably attribute where those ideas originally came from. Careful verification and attention to attribution remain essential.
GPT‑5’s contribution: GPT‑5 provided the key reformulation and the classical theorem that led to the optimal lower bound. However, the model did not identify the prior publication until explicitly asked, underscoring the need for careful human checks on attribution.
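For reference, the Chevalley–Warning theorem states that over a finite field \(\mathbb{F}_q\) of characteristic \(p\),

```latex
f_1,\dots,f_r \in \mathbb{F}_q[x_1,\dots,x_n], \quad \sum_{i=1}^{r} \deg f_i < n
\;\;\Longrightarrow\;\;
\#\{x \in \mathbb{F}_q^n : f_1(x)=\cdots=f_r(x)=0\} \equiv 0 \pmod{p}.
```

In particular, a system of low-degree polynomials in many variables that has one common zero (for instance the trivial zero, when the polynomials have no constant terms) must have another, which is the kind of counting leverage a lower-bound argument can exploit.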
Working in tandem with AI
Using GPT‑5 as a research partner in combinatorics
Tim Gowers, a Fields Medal–winning combinatorialist, ran a series of experiments treating GPT‑5 as a “research partner” rather than a tool for homework-style problems. He gave the model hard combinatorics questions he was actively thinking about and asked it to suggest constructions, find counterexamples, or critique partial arguments.
In multiple cases, GPT‑5 quickly spotted flaws or missing cases in candidate constructions and proposed simpler alternatives or counterexamples; in others, it stalled or failed to make progress. Gowers’ overall conclusion was that the model is already useful as a very fast, very knowledgeable critic that can stress-test ideas and save time, even though it does not yet meet his bar for full co-authorship.
GPT‑5’s contribution: GPT‑5 acted as a fast critic for Tim Gowers, spotting flaws, missing cases, and simpler alternatives during exploratory combinatorics work.
Interpreting cosmology models
Cosmology uses simplified models to describe the large-scale behavior of the universe, including dark energy and the expansion history. Those models often exist in several mathematically equivalent forms, and small algebraic slips can derail a calculation. Robert Scherrer used GPT‑5 to sanity-check derivations, explore toy versions of cosmological models, and translate between different parameterizations of dark energy.
GPT‑5 was particularly useful in catching algebraic mistakes, suggesting equivalent formulations of the same physical idea, and pointing Scherrer to existing results in the literature that matched the models he was independently deriving. This reduced the friction between having an idea on paper and getting it into a form that could be compared with data.
GPT‑5’s contribution: GPT‑5 assisted Robert Scherrer by checking derivations, suggesting equivalent formulations, and pointing to matching results in the literature.
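As an illustration of this kind of bookkeeping (an illustrative example, not taken from Scherrer's work): one common dark-energy parameterization is the Chevallier–Polarski–Linder form w(a) = w0 + wa(1 − a), and a quick numerical check confirms that its closed-form energy density solves the continuity equation:

```python
import math

# CPL equation of state: w(a) = w0 + wa * (1 - a).
# Solving d(ln rho)/d(ln a) = -3 * (1 + w(a)) gives the standard closed form
# rho(a) = rho0 * a**(-3*(1 + w0 + wa)) * exp(-3 * wa * (1 - a)).
# Below we sanity-check that formula against a finite-difference derivative.

w0, wa, rho0 = -0.9, 0.2, 1.0

def w(a):
    return w0 + wa * (1.0 - a)

def rho(a):
    return rho0 * a ** (-3.0 * (1.0 + w0 + wa)) * math.exp(-3.0 * wa * (1.0 - a))

for a in (0.3, 0.5, 0.8, 1.0):
    h = 1e-6
    dlnrho_dlna = a * (math.log(rho(a + h)) - math.log(rho(a - h))) / (2 * h)
    assert abs(dlnrho_dlna + 3.0 * (1.0 + w(a))) < 1e-5
```

Checks like this one (algebraic identities verified numerically at a few points) are cheap guards against exactly the small slips the paragraph above describes.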
Supporting fusion and plasma physics
Fusion and plasma physics involve modeling hot, dense plasmas where small changes in initial conditions can lead to very different behavior. Running and interpreting these simulations is expensive and time-consuming. This work used GPT‑5 to help build and analyze a simplified reaction–diffusion model of thermonuclear burn propagation, interpret complex outputs, and explore how different density profiles affect burn performance. The model helped run parameter sweeps and identify a ridge of optimal profiles where burn fronts travel fastest.
GPT‑5 also assisted in proposing a theoretical explanation for these numerical patterns, using power-balance arguments to explain why certain profiles perform better and suggesting simple engineering rules to guide future designs. While the model occasionally produced unstable simulations or overly confident conclusions, expert oversight allowed rapid correction, making it much faster to move from “something strange is happening in this regime” to “here is a plausible explanation and a concrete test.”
GPT‑5’s contribution: GPT‑5 helped construct the reduced-physics model, explore parameter space, and propose physical explanations.
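A minimal sketch of the kind of sweep described, using a generic Fisher–KPP reaction–diffusion equation rather than the paper's burn model (the grid sizes, rates, and front-speed estimate are illustrative assumptions):

```python
# Generic 1-D reaction-diffusion toy: u_t = D * u_xx + r * u * (1 - u).
# Sweep the reaction rate r and estimate the front speed; Fisher-KPP theory
# predicts the front travels at roughly 2 * sqrt(D * r).

def front_speed(r, D=1.0, dx=0.5, dt=0.05, nx=400, steps=2000):
    u = [0.0] * nx
    for i in range(10):            # ignite the left edge
        u[i] = 1.0
    t_half = {}                    # first time the front passes each cell
    for n in range(steps):
        lap = [0.0] * nx
        for i in range(1, nx - 1):
            lap[i] = (u[i - 1] - 2 * u[i] + u[i + 1]) / dx**2
        for i in range(1, nx - 1):
            u[i] += dt * (D * lap[i] + r * u[i] * (1 - u[i]))
        for i in range(nx):
            if i not in t_half and u[i] > 0.5:
                t_half[i] = n * dt
    i1, i2 = 100, 200              # fit speed between two well-separated cells
    return (i2 - i1) * dx / (t_half[i2] - t_half[i1])

speeds = [front_speed(r) for r in (0.5, 1.0, 2.0)]
assert speeds[0] < speeds[1] < speeds[2]   # faster reaction -> faster front
```

Even this toy version shows the workflow the paragraph describes: sweep a parameter, extract a summary statistic (front speed), and compare the trend against a power-balance-style scaling argument.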
New scientific results obtained with AI
Solving an Erdős number-theory problem
Paul Erdős posed a problem about finding the largest set of positive integers with a surprising rule: for any two numbers in the set, their product plus one must be divisible by the square of some prime. Erdős guessed what the largest such set should look like, but the problem remained open for decades.
Sawhney and Sellke explored the structure of the problem and then asked GPT‑5 to help analyze how a single “out-of-place” number would affect the entire set. GPT‑5 suggested a clearer way to show that if even one number does not fit a specific pattern, it forces contradictions across almost all other numbers. That idea turned out to be the missing step. With it, the researchers completed a full proof showing that Erdős’s original guess was correct.
GPT‑5’s contribution: GPT‑5 surfaced the key insight about how one number constrains all others, enabling the authors to finish the proof of Erdős Problem 848.
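The divisibility rule itself is easy to experiment with. The toy set below (all elements congruent to 7 mod 25, so every pairwise product plus one is divisible by 5²) is a hypothetical illustration of the condition, not the extremal construction from the proof:

```python
# Illustrating the rule: a * b + 1 must have a squared prime factor
# for every pair (a, b) in the set.

def has_squared_prime_factor(n):
    """True if p*p divides n for some prime p."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return True
        while n % d == 0:
            n //= d
        d += 1
    return False

# Every element is 7 mod 25, and 7 * 7 + 1 = 50 is divisible by 5^2,
# so a * b + 1 is divisible by 5^2 for every pair drawn from S.
S = [7, 32, 57, 82, 107]
for i, a in enumerate(S):
    for b in S[i + 1:]:
        assert has_squared_prime_factor(a * b + 1)
```

The hard part of the problem is not checking the rule but proving how large such a set can possibly be, which is where the contradiction argument described above comes in.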
New lower bounds for online algorithms
Online algorithms make decisions step by step without knowing the future—for example, deciding how to move a system when constraints are revealed over time. In the convex body chasing problem, the algorithm must stay inside a moving convex region while keeping total movement small. A central question is the best possible competitive ratio: how much worse an online algorithm can be, in the worst case, than an ideal offline algorithm that sees the whole sequence in advance.
Christian Coester used GPT‑5 to brainstorm hard instances and constructions that might force any online algorithm to perform badly. The model highlighted a particular geometric construction that, after refinement and checking by Coester, led to a cleaner and stronger lower bound on the competitive ratio than was previously known.
GPT‑5’s contribution: GPT‑5 suggested a geometric construction that Christian Coester refined into a stronger lower bound for an online algorithm problem.
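In the standard formalism, an online algorithm ALG is c-competitive if, for every request sequence \(\sigma\),

```latex
\mathrm{cost}_{\mathrm{ALG}}(\sigma) \;\le\; c \cdot \mathrm{cost}_{\mathrm{OPT}}(\sigma) + \alpha
```

for some constant \(\alpha\) independent of \(\sigma\). A lower bound is established by exhibiting request sequences on which no online algorithm can achieve a small \(c\), which is exactly the role the geometric construction played here.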
Proving new inequalities in graph theory
The team studied a graph theory problem about counting small patterns—paths, stars, and “wyes”—inside trees (graphs with no cycles). Previous work had proved one inequality relating these counts and conjectured a second one, which remained unproved. Using a custom math scaffolding around GPT‑5, the authors first asked the model to re-prove the known inequality, then to attack the conjectured one.
GPT‑5 produced short, self-contained proofs of both inequalities, relying on a different and more elegant argument than the original human proof; Bubeck, Sellke, and Yin then checked and adopted the model’s argument in their write-up.
GPT‑5’s contribution: GPT‑5 generated short proofs of two inequalities in trees, including a conjectured one, and the authors independently checked and adopted the argument.
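Counting such small patterns in a tree reduces to degree statistics. A brief illustration using two standard identities (these are textbook facts, not the paper's inequalities, whose exact statements are not reproduced here):

```python
# In any graph, the number of 2-edge paths equals sum over vertices of C(d, 2),
# and the number of 3-leaf stars (K_{1,3}) equals sum of C(d, 3), where d is
# the vertex degree. ("Wyes" in the paper are a third small pattern.)
from math import comb

def pattern_counts(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    paths2 = sum(comb(d, 2) for d in deg.values())  # paths on 3 vertices
    stars3 = sum(comb(d, 3) for d in deg.values())  # stars with 3 leaves
    return paths2, stars3

# A star with 4 leaves: center 0 joined to nodes 1..4.
edges = [(0, i) for i in range(1, 5)]
assert pattern_counts(edges) == (comb(4, 2), comb(4, 3))  # (6, 4)
```

Inequalities of the kind studied in the paper relate such counts to each other across all trees of a given size.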
Identifying hidden parameters in evolving networks
Researchers studied a simple model of a growing network where each new node attaches to earlier nodes with a probability influenced by a hidden parameter w. The challenge is that once the network is grown, you only see the final unlabeled tree—not the hidden labels or the attachment rules that produced it. The open question was whether w could be recovered at all from just this final structure.
The team asked GPT‑5 to reason about what global patterns in the final tree might reliably reflect the value of w. The model proposed focusing on a surprisingly accessible statistic: the long-run fraction of nodes that end up as leaves. GPT‑5 outlined how this leaf fraction converges to a simple, strictly increasing function of w, meaning that w can be read off directly from the shape of the tree. With this guidance, the authors produced a complete proof showing that the parameter is indeed identifiable.
GPT‑5’s contribution: GPT‑5 highlighted the key observable—leaf fraction—that unlocks a clean, provable method for recovering the hidden parameter w from a single snapshot of the final network.
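A toy simulation makes the idea concrete. The attachment rule below (uniform with probability w, degree-proportional otherwise) is an illustrative assumption, not necessarily the paper's model; under this rule the leaf fraction is monotone in w (here decreasing), which is what makes w readable from the final tree:

```python
# Toy growing tree: with probability w the new node attaches to a uniformly
# random existing node; otherwise it attaches preferentially, proportional to
# degree. The long-run leaf fraction then varies monotonically with w.
import random

def leaf_fraction(w, n=20000, seed=0):
    rng = random.Random(seed)
    deg = [1, 1]        # start from a single edge 0-1
    targets = [0, 1]    # degree-weighted multiset for preferential picks
    for new in range(2, n):
        if rng.random() < w:
            t = rng.randrange(new)     # uniform attachment
        else:
            t = rng.choice(targets)    # preferential attachment
        deg.append(1)
        deg[t] += 1
        targets.extend([t, new])
    return sum(1 for d in deg if d == 1) / n

fracs = [leaf_fraction(w) for w in (0.0, 0.5, 1.0)]
assert fracs[0] > fracs[1] > fracs[2]  # leaf fraction is monotone in w here
```

For this rule the limiting fractions are the classical values: about 2/3 for a pure preferential-attachment tree (w = 0) and about 1/2 for a uniform random recursive tree (w = 1), so an observed leaf fraction pins down w.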
Limitations
These case studies are curated illustrations of where GPT‑5 has been useful; they are not a systematic sample, and they do not capture the full range of failure modes. Expert oversight remains essential. GPT‑5 can sometimes hallucinate citations, mechanisms, or proofs that appear plausible; it can be sensitive to scaffolding and warm-up problems; it sometimes misses domain-specific subtleties; and it can follow unproductive lines of reasoning if not corrected. These are active areas of research, and we are working with collaborators to measure and mitigate these failures as we refine future systems.
What’s next
Taken together, these early studies show that GPT‑5 is beginning to help with new types of scientific work. The model is not autonomous, but in expert hands it can help prove theorems, rediscover and extend structures, surface cross-field connections, and generate mechanisms and experiments for scientists to validate.
We also see a trajectory in which these systems improve with more time and compute. If GPT‑5 can meaningfully assist with some research questions in 20 minutes, we expect deeper results when models can spend hours or days reasoning about a problem. Combined with world-class scientists, this points toward the possibility of a step-change in scientific productivity over time.