Measuring AI’s capability to accelerate biological research

OpenAI News

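Accelerating scientific progress is one of the most important benefits AI can bring to humanity. With the arrival of GPT‑5, we are seeing early signs of this: it not only helps researchers work through the literature faster, but also supports new forms of scientific reasoning, such as surfacing unexpected connections, proposing proof strategies, or suggesting plausible mechanisms that experts can evaluate and test.

So far, the clearest progress has come in fields such as mathematics, theoretical physics, and theoretical computer science, where ideas can be rigorously checked without physical experiments. Biology is different: most breakthroughs depend on experimental work, repeated iteration, and empirical validation in the laboratory.

To assess how frontier models perform in such wet-lab settings, we worked with the biosecurity start-up Red Queen Bio to build an evaluation framework that tests how a model proposes, analyzes, and iterates on experimental ideas in the wet lab. We set up a simple molecular biology experimental system and had GPT‑5 optimize a molecular cloning protocol for efficiency.

Over multiple rounds of experiments, GPT‑5 proposed and refined protocols and ultimately introduced a novel mechanism that improved cloning efficiency 79-fold. Cloning is a foundational tool of molecular biology, and its efficiency is critical for building the complex libraries used in protein engineering, large-scale genetic screens, and strain engineering. This project shows the potential for AI to work alongside biologists to accelerate research: better experimental methods let human researchers move faster, lower costs, and turn discoveries into real-world impact sooner.

Because advances in biological reasoning may carry biosecurity risks, this work was carried out under tightly controlled conditions: we used a benign experimental system, limited the scope of the task, and evaluated model behavior to inform our biosecurity risk assessments and the development of model- and system-level safeguards (see the Preparedness Framework).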

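Summary of experimental results

In this system, GPT‑5 autonomously reasoned about the cloning protocol, proposed modifications, and incorporated new experimental data into its subsequent suggestions. The only human intervention was that laboratory scientists carried out the modified procedures as the model suggested and uploaded the resulting data.

Over multiple rounds of iteration, GPT‑5 optimized the cloning protocol so that, for the same amount of input DNA, 79-fold more sequence-verified positive clones were recovered than with the baseline protocol. Most importantly, the model introduced two enzymes that form a novel mechanism: the recombinase RecA from E. coli and the single-stranded DNA-binding protein gp32 from phage T4. The two work together: gp32 smooths and untangles the free DNA ends, and RecA guides each strand to its correct partner.

Initial screening and follow-up experiments identified the top enzymatic assembly protocol, named RecA-Assisted Pair-and-Finish HiFi Assembly (RAPF-HiFi), and the best transformation protocol, Transformation 7 (T7). Taken individually, RAPF-HiFi improved on the baseline HiFi reaction by about 2.6-fold and the T7 transformation by about 36-fold; together they gave a combined improvement of about 79-fold. All clones were confirmed by sequencing (error bars: SD of n=3 independent validation experiments).

Although the results are early, they are encouraging. The improvements are specific to the particular cloning system we used, and human scientists are still needed to prepare and run the experiments. Even so, these experiments show that AI systems can provide meaningful assistance in real laboratory work and may accelerate human research in the future.

Notably, the AI-lab loop was run with fixed prompting and no human intervention. This constraint helps reveal the model’s capacity to propose genuinely novel approaches without external human guidance, but it also limits how fully the system can exploit newly discovered ideas. A better dynamic balance between exploration and exploitation would likely yield larger gains; both the enzymatic route and the transformation step have substantial room for optimization. We expect advances in planning and task-horizon reasoning to improve the ability of simple fixed prompts to support both discovery and subsequent optimization.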

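An evolutionary framework for optimizing real-world protocols

Since its invention in 2009, Gibson assembly has become one of the primary cloning methods. It works by briefly “melting” the ends of DNA fragments so that matching sequences can pair and be joined, with every step taking place in a single tube at a constant temperature; this simplicity is its main advantage. That single-step, single-temperature paradigm is exactly what leaves room for improvement. In addition, the method has several properties that make it well suited to evaluating an AI’s ability to optimize wet-lab work:

  • Well-defined with controlled components, unlike a complex cell-based system;
  • A clear optimization target: transformable circular DNA obtained from a fixed amount of linear DNA input;
  • Relatively short experimental cycles (1–2 days);
  • A high-dimensional design space that requires mechanistic reasoning to optimize: the best buffers, reagents, and temperatures are interdependent.

We started from HiFi assembly, a proprietary system from New England Biolabs based on Gibson assembly, and explored whether an AI could innovate and learn from experimental feedback once the single-step and isothermal constraints were lifted, and thereby identify improved protocols.

The experiments used a two-fragment cloning reaction: a green fluorescent protein (GFP) gene and the commonly used vector pUC19. The goal was to increase the number of successful colonies on the plate. We introduced an “evolutionary” framework that let the model iterate on its proposals and learn “online”: in each round GPT‑5 proposed 8–10 different reactions, and those requiring custom reagents were deferred to later rounds. Scientists carried out the reactions as proposed and, in an initial screen, measured colony counts against a baseline HiFi Gibson assembly control; the best-performing data from the previous round were included in the next round’s prompt. The prompts were standardized with no other human input, which lets us attribute novel mechanistic insights directly to the AI rather than to human guidance.

We retested the top eight reactions from the full optimization series across a wider range of DNA dilutions and found that many showed weaker effects than in the initial screen. In the end, the most robust validated candidate was a reaction from round 5, which reproduced its original performance. Many high performers belonged to the “ligase-polish” family, which is highly sensitive to competent-cell state and to post-reaction DNA handling. Because of the short HiFi step, products may enter E. coli with only one junction sealed and the other held together by annealing alone, relying on cellular repair pathways afterward. This produces high variance and a “jackpot” effect: although these variants usually do not outperform the baseline, a single strong outlier can carry the family into the next round.

While we optimized the enzymatic cloning reaction over multiple rounds, the transformation step was screened in a single “one-shot” round: the model proposed many independent changes, and we selected the best-performing protocol.

In the initial screens of the two-step cloning workflow, enzymatic assembly was optimized over five rounds of iteration (44 reactions in total), while 13 different transformation protocols were tested in the one-shot screen. All initial screening data are single measurements (n=1); top candidates were subsequently validated separately with replicates.

Using standardized prompts with no human intervention, GPT‑5 improved end-to-end cloning efficiency 79-fold, confirmed in experimental replicates.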

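A novel improvement to homology-based cloning

The new enzymatic procedure proposed by the model was named RecA-Assisted Pair-and-Finish HiFi Assembly (RAPF-HiFi). It introduces two proteins after the conventional HiFi reaction: the recombinase RecA from E. coli and the single-stranded DNA-binding protein gp32 from phage T4. The model also deliberately adjusted the incubation temperature, duration, and timing of enzyme addition: a HiFi reaction at 50°C first, then a drop to 37°C with RecA and gp32 added so the two proteins can act, then a return to 50°C to complete assembly. These changes alone raised efficiency about 2.5-fold (the initial performance, before any further iterative optimization).

Our hypothesis for the mechanism of action is as follows:

  • T5 exonuclease generates single-stranded 3′ overhangs, and gp32 stabilizes the ssDNA by suppressing secondary structure;
  • RecA invades from the 3′ end, displaces the gp32 filament, and promotes homology search and annealing;
  • On returning to 50°C, both RecA and gp32 dissociate from the DNA, after which polymerase fills the gaps and ligase seals the joins between strands.

To rule out the possibility that temperature or buffer changes alone explain the performance gain, we ran control experiments with RecA removed, or with both RecA and gp32 removed. Both control reactions performed worse than the complete RAPF-HiFi, indicating that both proteins are necessary for the protocol to work. Adding RecA alone or gp32 alone was less effective than using the two together; together they improved on baseline HiFi by about 2.6-fold (error bars: SD of n=3 independent experiments).

In terms of prior work, the interaction between RecA and gp32 has been studied in mechanistic in vitro reconstitution experiments, and gp32 can enhance RecA activity in some systems; gp32 has also been used as an additive in amplification reactions such as PCR to reduce interference from secondary structure. To our knowledge, however, RecA and gp32 have not been systematically adopted together as a generalizable molecular biology method. The relevant biochemical activities had already been studied, but it is noteworthy that GPT‑5 integrated them in this unusual combination, together with the reaction dynamics, into a mechanistically distinct assembly pathway.

On the transformation side, the most effective modification was unexpectedly simple: pelleting the competent cells by centrifugation, removing half of the supernatant volume, and resuspending the cells before adding DNA, all at 4°C. High-efficiency chemically competent cells are usually considered fragile, so such handling is generally avoided, yet the cells tolerated the concentration step; more DNA exposure per cell, less inhibitory buffer, and a sharper heat shock raised transformation efficiency substantially (>30-fold in final validation).

This transformation method is conceptually similar to some previously reported approaches that concentrate the cells at an earlier stage, but the variant proposed here by GPT‑5 is directly compatible with commercial off-the-shelf chemically competent cells, requires no in-house cell preparation, and exceeds the efficiency gains reported for similar methods on comparable strains.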

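Robotic system

To increase the throughput of this experimental system, Robot on Rails and Red Queen Bio collaborated to build a robotic system that translates a natural-language cloning protocol into wet-lab actions and executes them. The system has three parts: 1) a human-to-robot LLM that converts natural-language instructions into robot actions; 2) a vision system that identifies and localizes labware in real time; and 3) a robotic path planner that plans safe, accurate execution of each action. On this basis, we optimized a general-purpose lab robot for variants of Gibson cloning.

We tested whether the robot could independently complete a full cloning experiment while running the baseline HiFi method and R8, the best-performing protocol from the first optimization round, in parallel. In the transformation step the robot successfully handled a range of physical operations: transferring and mixing liquids, moving sample tubes, applying controlled heat treatment, and plating. Compared directly with human-performed transformations, the robot delivered similar relative improvements, showing the early potential of automation for optimizing biological experiments.

Although the fold changes from robot and human execution were similar, the robot’s absolute colony counts were roughly one-tenth of those from manual work, pointing to needed improvements in liquid-handling precision, temperature-control calibration, and reproducing careful manual technique (especially cell handling). Normalized to their respective HiFi baselines, human-executed R8 improved 2.39-fold and robot-executed R8 improved 2.13-fold (about 89% of the human result), showing that protocol ranking and effect were consistent despite the lower absolute yield.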

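Outlook

We believe these experiments offer a glimpse of future AI-accelerated science: models that continually learn and interact with the real world. Although this study deliberately excluded human intervention in order to measure model capabilities cleanly, we are especially excited about the prospect of AI helping human scientists design experiments and enabling research breakthroughs.

At the same time, as we work to accelerate scientific progress safely and responsibly, we are also evaluating and working to reduce the associated risks, particularly those related to biosecurity. The results of this evaluation show that models can indeed reason in the wet lab and propose improved protocols, which has potential implications for biosecurity (see the Preparedness Framework). We are committed to building necessary and nuanced safeguards at the model and system levels to reduce these risks, and to continuing to develop evaluations that track current capability levels.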



Accelerating scientific progress is one of the most valuable ways AI can benefit humanity. With GPT‑5, we’re beginning to see early signs of this—not only in helping researchers move faster through the scientific literature, but also in supporting new forms of scientific reasoning, such as surfacing unexpected connections, proposing proof strategies, or suggesting plausible mechanisms that experts can evaluate and test.


Progress to date has been most visible in fields like mathematics, theoretical physics, and theoretical computer science, where ideas can be rigorously checked without physical experiments. Biology is different: most advances depend on experimental execution, iteration, and empirical validation in the laboratory.


To help understand how frontier models behave in these settings, we worked with Red Queen Bio, a biosecurity start-up, to build an evaluation framework that tests how a model proposes, analyzes, and iterates on ideas in the wet lab. We set up a simple molecular biology experimental system and had GPT‑5 optimize a molecular cloning protocol for efficiency.


Over multiple rounds of experimentation, GPT‑5 introduced a novel mechanism that improved cloning efficiency by 79x. Cloning is a fundamental molecular biology tool. The efficiency of cloning methods is critical for creating large, complex libraries central to protein engineering, genetic screens, and organismal strain engineering. This project offers a glimpse of how AI could work side-by-side with biologists to speed up research. Improving experimental methods will help human researchers move faster, reduce costs, and translate discoveries into real-world impact.


Because advances in biological reasoning carry biosecurity implications, we conducted this work in a tightly controlled setting—using a benign experimental system, limiting the scope of the task, and evaluating model behavior to inform our biosecurity risk assessments and the development of model- and system-level safeguards, as outlined in our Preparedness Framework.


Experimental results




In this set-up, GPT‑5 autonomously reasoned about the cloning protocol, proposed modifications, and incorporated data from new experiments to suggest more improvements. The only human intervention was having scientists carry out the modified protocol and upload experimental data.


Over the course of multiple rounds, GPT‑5 optimized the cloning procedure to improve the efficiency by over 79x—meaning that for a fixed amount of input DNA, we recovered 79x more sequence-verified clones than the baseline protocol. Most notably, it introduced two enzymes that constitute a novel mechanism: the recombinase RecA from E. coli, and phage T4 gene 32 single-stranded DNA–binding protein (gp32). Working in tandem, gp32 smooths and detangles the loose DNA ends, and RecA then guides each strand to its correct match.




Initial screening and secondary experiments identified RecA-Assisted Pair-and-Finish HiFi Assembly (RAPF-HiFi) and Transformation 7 (T7) as the top enzymatic and transformation protocols, respectively. RAPF-HiFi assembly and T7 transformation each independently improved cloning efficiency relative to the base HiFi cloning protocol, by 2.6-fold and 36-fold respectively, and together they provided a 79-fold improvement in performance. All clones were confirmed by sequencing. (Error bars: SD of n=3 independent validation experiments.)
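As a rough sanity check on how the two gains combine, the sketch below compares the reported 79-fold figure with what a purely multiplicative or a purely additive combination of the reported individual gains would give. Both combination models are illustrative assumptions on our part, not an analysis taken from the experiments.

```python
# Reported fold improvements over the baseline HiFi protocol (from the text).
rapf_fold = 2.6       # RAPF-HiFi assembly alone
t7_fold = 36.0        # T7 transformation alone
combined_fold = 79.0  # both changes together

# Assumption: if the two steps acted independently, their gains would multiply.
multiplicative = rapf_fold * t7_fold
# Assumption: a strictly additive model sums the gains above the baseline of 1.
additive = 1.0 + (rapf_fold - 1.0) + (t7_fold - 1.0)

print(f"multiplicative expectation: {multiplicative:.1f}x")
print(f"additive expectation:       {additive:.1f}x")
print(f"observed / multiplicative:  {combined_fold / multiplicative:.2f}")
```

Under these assumptions the observed combined gain falls between the two simple models and sits much closer to the multiplicative one.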



While early, these results are encouraging. The improvements are specific to the particular cloning setup used in our model system, and human scientists are still required to set up and run the protocols. Even so, these experiments show that AI systems can meaningfully assist real laboratory work and may accelerate human scientists in the future.


Notably, the AI-lab loop was run with fixed prompting and no human intervention. This scaffolding helped reveal the model’s capacity to propose genuinely novel protocol changes independent of human guidance, but it also locked the system into exploration and limited its ability to maximize the performance of newly discovered ideas. A better dynamic balance between exploration and exploitation would likely yield larger gains, as both the enzymatic and transformation improvements have substantial room for refinement. We expect advances in planning and task-horizon reasoning to improve the ability of simple fixed prompts to support both discovery and subsequent optimization.


An evolutionary framework for optimizing real-world protocols




The Gibson assembly reaction has been a primary cloning method since its invention in 2009, with widespread adoption across molecular biology. Gibson assembly lets molecular biologists “glue” pieces of DNA together by briefly melting their ends so matching sequences can be sealed into a single molecule. One major appeal of Gibson assembly is its simplicity: everything happens in a single tube at one temperature. Those constraints naturally leave room for improvement. In addition, the following properties make it well-suited to evaluating AI models’ abilities to improve wet lab techniques:


  • Well-defined with controlled components, unlike a cell-based system
  • Has a clear optimization function: transformable circularized DNA made from a fixed amount of linear DNA inputs
  • Relatively fast experimental cycles (1-2 days)
  • High-dimensional design space that requires mechanistic reasoning to improve: optimal buffers, reagents, and temperatures are all interdependent

We used HiFi assembly, a proprietary enzyme system developed by New England Biolabs and based on Gibson assembly, as an optimization starting point. We explored whether an AI could innovate and learn from experimental feedback once the single-step and isothermal constraints were removed, and thereby identify protocol improvements in this scenario.

Specifically, we performed a two-piece cloning reaction using a gene for green fluorescent protein (GFP) and the widely used pUC19 plasmid, a standard DNA “vehicle” used to carry genes into bacteria so they can be copied. The goal was to increase the number of successful colonies.


We optimized the cloning reaction by introducing an evolutionary framework for iterating on proposals, enabling the model to learn “online” from its past experiments. In each round, GPT‑5 proposed a batch of 8-10 different reactions, with reactions pushed to later rounds if they required custom reagents the laboratory did not have readily on hand. Human scientists then carried out the reactions and measured the colony counts relative to the baseline HiFi Gibson assembly in an initial screen. The best performing data from the previous round were then fed into the next round. Importantly, the prompting was standardized with no human input beyond clarifying questions, allowing us to attribute novel mechanistic insights directly to the AI rather than human guidance. 
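The round structure described above can be sketched as a simple propose, execute, and feed-back loop. This is a minimal illustration, not the actual pipeline: `propose` and `execute` are hypothetical stand-ins for the model call and the wet-lab measurement, `keep_top` is an assumed parameter (the text does not say how many results were carried forward), and the deferral of reactions that need custom reagents is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    protocol: str       # free-text protocol proposed by the model
    fold_change: float  # colony count relative to the baseline HiFi control


def run_rounds(
    propose: Callable[[list[Result]], list[str]],  # hypothetical model call
    execute: Callable[[str], float],               # hypothetical wet-lab step
    n_rounds: int = 5,
    keep_top: int = 3,                             # assumed, not from the text
) -> list[Result]:
    """Propose a batch, measure it, and feed the best results into the next prompt."""
    history: list[Result] = []
    context: list[Result] = []  # top results from the previous round
    for _ in range(n_rounds):
        batch = propose(context)
        results = [Result(p, execute(p)) for p in batch]
        history.extend(results)
        # Only the best performers of this round are carried into the next prompt.
        ranked = sorted(results, key=lambda r: r.fold_change, reverse=True)
        context = ranked[:keep_top]
    return history
```

Passing stub callables for `propose` and `execute` is enough to exercise the loop without a model or a laboratory.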


We retested the top eight reactions from the full optimization series using a wider range of DNA dilutions, and found that many showed smaller effects than in the initial screen; ultimately, the strongest validated candidate was a reaction from round 5 that reproduced its original performance. Many high performers fell into the ligase-polish family, which appears particularly sensitive to small variations in competent-cell state and/or post-reaction DNA handling. Because these reactions used a short HiFi step, we hypothesize that many products enter E. coli with only one junction sealed and the other held by annealing, leaving downstream rescue to cellular repair pathways. This creates high variance and a “jackpot” dynamic: even if variants of this reaction usually do not outperform the baseline, a single strong outlier can carry the family into subsequent rounds.
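The “jackpot” dynamic is an instance of a general selection effect: choosing the maximum of several noisy single measurements tends to overstate the winner, so a retest usually looks weaker. The simulation below is a minimal sketch of that effect only. The assumptions are ours: every variant has the same true fold change of 1.0, and measurement noise is log-normal with an arbitrary spread; neither is fitted to the experimental data.

```python
import random


def screen_then_retest(n_variants=8, noise_sd=1.0, n_retests=3, rng=None):
    """Run one n=1 screen of equally good variants, then retest the apparent winner.

    Assumption: all variants share a true fold change of 1.0 (log-scale mean 0),
    so any apparent winner in the screen is produced by noise alone.
    """
    rng = rng or random.Random()
    screen = [rng.lognormvariate(0.0, noise_sd) for _ in range(n_variants)]
    best_in_screen = max(screen)
    # The retest draws fresh measurements from the same distribution.
    retests = [rng.lognormvariate(0.0, noise_sd) for _ in range(n_retests)]
    return best_in_screen, sum(retests) / n_retests


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [screen_then_retest(rng=rng) for _ in range(2000)]
mean_screen = sum(t[0] for t in trials) / len(trials)
mean_retest = sum(t[1] for t in trials) / len(trials)
print(f"mean top value in screen: {mean_screen:.2f}")
print(f"mean value on retest:     {mean_retest:.2f}")
```

In this toy model the screen winner looks better than its retest on average, which is why replicated validation of top candidates matters when the initial screen is n=1.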


While we focused on optimizing the cloning reaction over multiple rounds because of its mechanistic complexity, we optimized the transformation procedure in parallel using a single “one-shot” round in which the model proposed many independent changes and we took the top-performing reaction.





Initial optimization screens of the two-step cloning workflow: enzymatic assembly and transformation. (Left) Iterative optimization of enzymatic assembly over five rounds (44 reactions total). Starting from the HiFi assembly baseline, GPT‑5 proposed 8-10 assembly protocol variants per round; data of top-performing results were incorporated into subsequent prompts. At each round, we plot the top-performing reaction so far (including previous rounds). (Right) One-shot optimization of transformation conditions testing 13 different protocols. For both optimization screens, data represents single measurements (n=1) per condition; replicated validation was performed separately for top candidates.
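The “top-performing reaction so far” trace in the left panel is a running maximum over rounds. A minimal sketch follows; the input values are arbitrary placeholders chosen for illustration and are not measurements from the screen.

```python
from itertools import accumulate


def best_so_far(round_maxima):
    """Running maximum: the best fold change seen up to and including each round."""
    return list(accumulate(round_maxima, max))


# Arbitrary placeholder values, not data from the experiments.
print(best_so_far([1.5, 1.2, 2.0, 1.8, 2.4]))  # [1.5, 1.5, 2.0, 2.0, 2.4]
```

Plotting the running maximum rather than each round's own best means the curve can never decrease, even when a later round performs worse than an earlier one.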



Using standardized prompts with no human input, GPT‑5 improved end-to-end cloning efficiency 79-fold, confirmed across experimental replicates.


Notably, the model proposed a new enzymatic procedure, which the model called RecA-Assisted Pair-and-Finish HiFi Assembly (RAPF-HiFi), that adds two new proteins to the reaction: the recombinase RecA from E. coli, and the phage T4 gene 32 single-stranded DNA–binding protein (gp32). Further, the model made deliberate modifications to the incubation temperature and time, and the timing of enzymatic additions: it proposed adding RecA and gp32 after an initial 50°C HiFi reaction, letting these proteins work at 37°C, and then going back to 50°C to complete the assembly. Together, these new modifications boosted efficiency over 2.5-fold. It should be noted that this represents the initial performance without iterative optimization of reaction conditions and timing.



Protocol viewer (tabs: Baseline HiFi, Round 1, Round 2, Round 3, Round 4, Round 5, Baseline transformation protocol, HiFi + T7 transformation, Combined final protocol)

Baseline HiFi protocol:

  • 20 μL reaction volume
  • 100 ng pUC19 vector, HindIII/KpnI-digested
  • 64.3 ng Monster GFP insert, DpnI-digested PCR amplicon
  • 10 μL NEBuilder 2x HiFi DNA Assembly Master Mix
  • 50°C for 30 min; 4°C hold




















On the transformation side, the most effective modification proved unexpectedly simple: pelleting the cells (spinning them down in a centrifuge so they collect at the bottom of the tube), removing half of the supplied volume, and resuspending the cells before adding DNA, all at 4°C. While high-efficiency chemically competent cells are typically considered fragile, the cells tolerated concentration well and the increased molecular collisions boosted transformation efficiency substantially (>30-fold on final validation). 


A novel improvement to homology-based cloning







T5 exonuclease creates 3′ overhangs that gp32 stabilizes by suppressing secondary structure. RecA then invades from the 3′ ends, displacing gp32 and promoting homology search and annealing. Heating to 50 °C removes both proteins, enabling polymerase gap fill and ligation.











Gibson assembly works by giving pieces of DNA matching “sticky” ends so that they can find each other and join. The reaction uses two different enzymes (a polymerase and a ligase) to seal the joined pieces. In RAPF-HiFi, two proteins were introduced to make the matching step work better. The first, gp32, acts like a comb that smooths and untangles the loose DNA ends. The second, RecA, acts like a guide that searches for the correct partner for each strand and pulls the matching pieces together. Higher temperature causes both helpers to fall off the DNA, allowing the normal Gibson enzymes to complete the reaction.


In summary, we hypothesize that the improved performance is mediated via the following mechanism:


  • Gp32 coats non-annealed single-stranded DNA (ssDNA) tails, removing secondary structure
  • RecA, normally inhibited by structure, invades from the 3′ end and displaces the gp32 filament
  • RecA mediates a ssDNA:ssDNA homology search, driving annealing
  • A return to 50°C displaces both the RecA and the gp32 filaments, allowing polymerase and ligase to complete the reaction.

To test whether the novel enzymes were functional, and to rule out that the performance improvement is driven solely by changes in thermal steps or buffers, we tested the performance of RAPF-HiFi without RecA, and without both RecA and gp32. The performance of both reactions was reduced relative to RAPF-HiFi, suggesting that both proteins are necessary for the mechanism of action of RAPF-HiFi.




To test the underlying mechanism, we separate out the two new enzymes in the reaction: RecA and gp32. We show that either of these alone reduces the efficiency relative to the HiFi baseline. Together, they outperform the baseline with a 2.6x efficiency gain. (Error bars: SD of n=3 independent experiments)
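The error bars quoted here are standard deviations over three independent experiments. The sketch below shows one way such a summary could be computed. Using the sample standard deviation (n-1 denominator) is our assumption, since the text does not say which form was used, and the example inputs are arbitrary numbers rather than measured fold changes.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Return (mean, sample SD) for a list of replicate fold changes."""
    # stdev uses the n-1 denominator; this choice is an assumption.
    return mean(replicates), stdev(replicates)


# Arbitrary numbers for illustration only, not experimental results.
print(summarize([2.0, 3.0, 4.0]))  # (3.0, 1.0)
```

With only three replicates the standard deviation is itself a noisy estimate, so differences between conditions are best read alongside the individual data points.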



The development of RAPF-HiFi suggests that GPT‑5 is capable of complex, multi-dimensional reasoning.



To our knowledge, RecA and gp32 have not been functionally co-used in molecular biology methods. As with many novel molecular biology techniques, the underlying biochemical activities were already studied, but their use as a practical, generalizable method constitutes the advance.


For example, the interaction of RecA and gp32 has been studied in mechanistic in vitro reconstitution assays: in studies of D-loop formation, gp32 was shown to be capable of enhancing RecA activity. Gp32 has been used in conjunction with its natural T4 recombinase partner UvsX and recombinase loading factor UvsY in recombinase polymerase amplification (RPA). Although an RPA patent specification states that effective RPA reactions have been demonstrated using E. coli RecA in a heterologous system with a compromised (i.e., engineered, non–wild-type) gp32 protein, this assertion appears only as a tangent in some patent disclosures and, to our knowledge, has not been supported by published data or adopted as a robust RecA-based RPA system. One cloning method called SLiCE uses a whole-cell extract from E. coli containing the λ Red recombination system, where Red beta may perform dual roles as both a DNA-binding protein and a recombinase (though we explicitly prohibited the use of cell extracts in our prompt). In a different application, Ferrin & Camerini-Otero used RecA alone to selectively capture DNA molecules based on matching sequences. Separately, gp32 has been used as an additive in PCR, a DNA amplification process, to reduce secondary structure. NABSA amplification was shown to be enhanced by both RecA and gp32, though each could enhance the reaction separately and no synergy was identified. More broadly, reported improvements to the basic Gibson-style DNA assembly reactions have been scarce, with the most notable example being a heat-stable DNA-binding protein (ET SSB) that improves assembly efficiency by approximately 2.5-fold.


For most applications, we do not expect RAPF-HiFi to compete with the simplicity and robustness of HiFi/Gibson cloning. However, the emergence of a mechanistically distinct assembly pathway is noteworthy: GPT‑5 arrived at a solution that incorporates an unfamiliar combination of recombination proteins and reaction dynamics. The underlying mechanism may prove modular, providing components that can be repurposed or recombined in other molecular workflows. We are also continuing to explore improvements to RAPF-HiFi. Reaction temperatures and step durations can be tuned to balance RecA and gp32 activity against exonuclease over-digestion, and the amounts of both proteins remain to be optimized. GPT‑5 has also proposed a hyperactive RecA variant, which we are currently purifying.


With respect to the transformation protocol, the successful optimization conditions spanned a range of additives and thermal perturbations intended to enhance the heat-shock efficiency of commercial 10-beta competent cells. Of the 13 AI-generated one-shot transformations tested, the most effective modification, Transformation 7 (T7), involved pelleting the cells, removing half of the supplied volume, and resuspending the cells before adding DNA, all at 4°C. High-efficiency chemically competent cells are typically considered fragile, and such handling steps are generally avoided. Nonetheless, the cells tolerated concentration well. The combined effects of increased DNA exposure per cell and less inhibitory buffer, which leads to a sharper heat shock, yielded a substantial increase in transformation efficiency (>30-fold).


This transformation protocol is novel, although a conceptually similar approach where the cells are concentrated at an earlier step has been reported. Notably, the method developed here by GPT‑5 is compatible with off-the-shelf chemically competent cells, eliminating the need for in-house cell preparation, while exceeding the similar approach’s reported efficiency gains on comparable cell strains.


Robotic system




To increase the throughput of this model experimental system, Robot on Rails and Red Queen Bio collaborated to build a robotic system that takes in a natural language cloning protocol and executes it in the wet lab.


The system combines three components: 1) a human-to-robot LLM that converts plain English into the robot’s actions; 2) a vision system that identifies and localizes labware in real time; and 3) a robotic path planner that determines how to carry out each action safely and accurately. The result is a flexible, generalized lab robot that was further optimized for variants of the Gibson cloning protocol.
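The three components can be pictured as stages of one pipeline. The sketch below is a hypothetical illustration of how such stages might be wired together; it is not the actual Robot on Rails design, and every name in it (`Action`, `LabRobot`, `parse`, `locate`, `plan_and_move`) is invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

Position = tuple[float, float, float]


@dataclass
class Action:
    verb: str    # e.g. "transfer", "mix", "heat" (illustrative values)
    target: str  # name of the labware the action refers to


@dataclass
class LabRobot:
    # All three callables are hypothetical stand-ins for the components above.
    parse: Callable[[str], list[Action]]              # 1) language -> actions
    locate: Callable[[str], Position]                 # 2) vision: labware -> position
    plan_and_move: Callable[[Action, Position], None] # 3) path planning and motion

    def run(self, protocol_text: str) -> int:
        """Execute each parsed action at the position the vision stage reports."""
        actions = self.parse(protocol_text)
        for action in actions:
            # Look up the labware position just before acting on it.
            position = self.locate(action.target)
            self.plan_and_move(action, position)
        return len(actions)
```

Keeping the stages as separate callables means each one can be replaced or tested on its own, for example by substituting a stub vision function during a dry run.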










We tested whether the autonomous robot could execute a complete cloning experiment by running two protocols simultaneously: the standard HiFi method and R8, the top-performing AI-modified protocol from the first optimization round.

We compared the robot’s work to human-performed experiments at each step. The robot successfully handled the transformation process, which required diverse physical operations: transferring and mixing liquids, moving sample tubes, applying controlled heat to cells, and spreading cells onto growth plates. When compared directly with human-performed transformations, the robot generated similar quality data with equivalent improvements over baseline, showing early potential for automating and accelerating biological experiment optimization.


While the fold-changes between the robot and human experiments were similar, absolute colony counts from the robot were approximately ten-fold lower than manual execution, indicating areas for improvement such as liquid handling precision, temperature control calibration, and replicating the nuances of manual cell handling techniques.




Both the standard HiFi method (baseline) and the improved R8 method were executed by human researchers and the autonomous robot, with transformation efficiencies normalized to respective HiFi baseline controls (set to 1.0). Human-executed R8 showed 2.39-fold improvement; robot-executed R8 achieved 2.13-fold improvement (89% of human performance), demonstrating comparable protocol ranking despite lower absolute yields.
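The 89% figure follows directly from the two normalized fold changes. The sketch below reproduces that arithmetic and shows the normalization step; `normalize` is an illustrative helper written for this example, not a function from the study.

```python
def normalize(colonies: float, baseline_colonies: float) -> float:
    """Fold change relative to the matching HiFi baseline control (baseline = 1.0)."""
    return colonies / baseline_colonies


human_fold = 2.39  # R8 vs. HiFi baseline, human-executed (from the text)
robot_fold = 2.13  # R8 vs. HiFi baseline, robot-executed (from the text)

relative = robot_fold / human_fold
print(f"robot / human: {relative:.0%}")  # robot / human: 89%
```

Because each executor is normalized to its own baseline, the roughly ten-fold gap in absolute colony counts cancels out of this comparison.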



The future




We believe that these experiments offer a snapshot of what future AI-accelerated science will look like: models continually learning and interacting with the real world. Although our experiments excluded human intervention to purely measure model capabilities, we’re particularly excited about AI helping human scientists design experiments and contribute to research breakthroughs.



As we work to accelerate scientific progress safely and responsibly, we also seek to evaluate and reduce risks, particularly those related to biosecurity. These evaluation results show that models can reason in the wet lab to improve protocols, and may have implications for biosecurity as described in our Preparedness Framework. We are committed to building necessary and nuanced safeguards at the model and system levels to reduce these risks, and to developing evaluations that track current capability levels.


