Accelerating life sciences research

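At OpenAI, we believe that artificial intelligence can significantly accelerate innovation in the life sciences. To test this belief, we worked with the Applied AI team at Retro Bio, a longevity biotech startup, to develop and study GPT-4b micro, a miniature version of GPT-4o specialized for protein engineering.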

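We are pleased to share that, using GPT-4b micro, we designed novel and significantly enhanced variants of the Yamanaka factors. The Yamanaka factors are a set of proteins whose key role in generating induced pluripotent stem cells (iPSCs) and rejuvenating cells led to a Nobel Prize. They have also been used to develop therapies that treat blindness, reverse diabetes, treat infertility, and address organ shortages.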

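In vitro, these redesigned proteins showed more than 50-fold higher expression of stem cell reprogramming markers than wild-type controls. They also showed enhanced DNA damage repair, indicating higher rejuvenation potential than baseline. The finding, made in early 2025, has since been replicated across multiple donors, cell types, and delivery methods, confirming full pluripotency and genomic stability in the derived iPSC lines. So that the life sciences industry can discover and replicate these results, we are now sharing insights into the research and development of GPT-4b micro.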

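Figure 1 shows changes in cell morphology under the microscope: fibroblasts in their initial state (left), cells 10 days after reprogramming with the standard Yamanaka factors SOX2, KLF4, OCT4, and MYC (center), and cells 10 days after reprogramming with our newly designed protein variants (right). The increase in round, white cells in the right panel indicates higher reprogramming efficiency, and the colonies show the compact, round morphology typical of iPSC-like cells.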

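We designed and trained a custom model, GPT-4b micro, to have broad biological knowledge and skills, with particular emphasis on steerability and flexibility to support advanced applications such as protein engineering. The model was initialized from a scaled-down version of GPT-4o and then further trained on a dataset composed mainly of protein sequences, together with biological text and 3D structure data, elements that protein language models usually omit.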

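Much of the data was enriched with additional context, such as textual descriptions of the proteins, co-evolutionary homologous sequences, and groups of proteins known to interact. This lets GPT-4b micro generate sequences with targeted properties on request. Because most of the data contains no structural information, the model handles proteins with intrinsically disordered regions equally well. This matters especially for the Yamanaka factors, whose activity depends on many transient interactions with a variety of binding partners rather than on a single stable structure (see Figure 2).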

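During development we observed scaling behavior similar to that of language models: larger models and more data gave better performance. This let us iterate quickly at small scale before training the final GPT-4b micro. However, in silico evaluations of protein AI models are often of limited value, because it is hard to tell whether their improvements translate into practical utility. To show that the model can accelerate therapeutic development, we worked with Retro's scientists, who used the model to redesign proteins involved in cell reprogramming.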

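The Yamanaka factors (OCT4, SOX2, KLF4, and MYC) are among the most important proteins in regenerative biology. They are known for reprogramming adult cells into pluripotent stem cells, a discovery for which Shinya Yamanaka received the 2012 Nobel Prize in Physiology or Medicine. But the factors are inefficient: typically fewer than 0.1% of cells convert during treatment, and the process takes three weeks or more. Efficiency is lower still in cells from aged or diseased donors, so finding more efficient variants is an important current research direction.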

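Directly optimizing protein sequences is extremely difficult. SOX2 has 317 amino acids and KLF4 has 513, so the number of possible variants is on the order of 10^1000, and traditional “directed-evolution” methods can explore only a tiny fraction of the design space. A leading academic study tested several thousand SOX2 mutants and found only a few triple mutants with modest gains, and 15 years of research on chimeric SOX proteins has yielded only variants that differ from natural SOX by 5 residues.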

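The Retro team built a wet-lab screening platform based on human fibroblasts and first validated it with baseline OSKM and SOX2 variants designed by hand by Retro's scientists. They then asked GPT-4b micro to propose a diverse set of “RetroSOX” sequences. In the screen, more than 30% of the model's suggestions outperformed wild-type SOX2 at expressing key pluripotency markers, while differing from it by more than 100 amino acids on average. By comparison, hit rates in traditional screens are typically below 10%.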

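The team then redesigned KLF4, the largest Yamanaka factor. KLF4 can be replaced by other KLF family members, but doing so does not improve reprogramming efficiency. In an earlier attempt using expert-guided single amino acid substitutions, only 1 of 19 variants was a hit. We likewise had the model generate enhanced RetroKLF variants; 14 of them outperformed the best cocktails from the RetroSOX screen, a hit rate of nearly 50% (see Figure 4).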

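Combining the top RetroSOX and RetroKLF variants gave the largest gains. Across three independent experiments, fibroblasts showed a marked increase in early (SSEA-4) and late (TRA-1-60, NANOG) markers, and the late markers appeared several days earlier than with wild-type OSKM (see Figure 5). In addition, alkaline phosphatase (AP) staining at day 10 confirmed that these colonies not only express late pluripotency markers but also show strong AP activity, which is indicative of pluripotency (see Figure 6).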

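To further confirm the improved reprogramming efficiency and its clinical potential, we used mRNA delivery to test mesenchymal stromal cells (MSCs) from three middle-aged donors over 50 years old. Within just 7 days, more than 30% of the cells began expressing key pluripotency markers (SSEA-4 and TRA-1-60), and by day 12 many colonies with morphology resembling typical iPSCs had appeared (see Figure 7, left and center). More than 85% of the cells activated endogenous expression of key stem cell markers, including OCT4, NANOG, SOX2, and TRA-1-60.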

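We also verified that these RetroFactor-derived iPSCs can differentiate into all three germ layers (endoderm, ectoderm, and mesoderm). We expanded multiple monoclonal iPSC lines over several passages and confirmed healthy karyotypes (see Figure 7, right) and genomic stability suitable for cell therapy. These results all exceeded those of conventional iPSC lines generated by contract research organizations using standard factors, which further supports the robustness of our designed variants and shows enhanced iPSC generation across delivery methods and cell types.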

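Taken together, the high hit rates, deep sequence edits, accelerated marker onset, and formation of AP-positive colonies provide early evidence that AI-guided protein design can significantly accelerate progress in stem cell reprogramming research.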

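Encouraged by these results, we went on to study the rejuvenation potential of the redesigned variants, in particular their ability to restore youthful characteristics to aged cells. We focused on DNA damage, a canonical hallmark of impaired cellular function and aging. Earlier research showed that the Yamanaka factors can erase DNA damage-related aging markers in mouse cells without fully reverting cell identity. We tested the rejuvenation capability of our variants relative to baseline OSKM.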

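In the DNA damage assay, cells treated with the RetroSOX/KLF cocktail showed a clearly lower γ-H2AX signal (a marker of DNA double-strand breaks) than the standard OSKM and fluorescent control groups (see Figure 8), indicating stronger repair under the same genotoxic challenge.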

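These results suggest that the RetroSOX/KLF cocktail reduces DNA damage more effectively than the original Yamanaka factors. By alleviating one of the core hallmarks of cellular aging, the designed variants offer a potential path toward future cell rejuvenation and therapies.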

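Looking ahead, OpenAI sees this work as a demonstration of how quickly a domain-specific model can achieve breakthroughs on a focused scientific problem. Boris Power, who leads research partnerships at OpenAI, said: “When researchers combine deep domain insight with our language model tools, problems that once took years can be solved in days. We look forward to seeing more teams combine their expertise with the models we build to drive further progress.”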

At OpenAI, we believe that AI can meaningfully accelerate life science innovation. To test this belief, we collaborated with the Applied AI team at Retro Bio⁠, a longevity biotech startup, to create and research the impact of GPT‑4b micro, a miniature version of GPT‑4o specialized for protein engineering.

We are excited to share that we’ve successfully leveraged GPT‑4b micro to design novel and significantly enhanced variants of the Yamanaka factors, a set of proteins which led to a Nobel Prize for their role in generating induced pluripotent stem cells (iPSCs) and rejuvenating cells. They have also been used to develop therapeutics to combat blindness⁠, reverse diabetes⁠, treat infertility⁠, and address organ shortages⁠.

In vitro, these redesigned proteins achieved greater than a 50-fold higher expression of stem cell reprogramming markers than wild-type controls. They also demonstrated enhanced DNA damage repair capabilities, indicating higher rejuvenation potential compared to baseline. This finding, made in early 2025, has now been validated by replication across multiple donors, cell types, and delivery methods, with confirmation of full pluripotency and genomic stability in derived iPSC lines. To ensure the findings are discoverable and replicable to benefit the life sciences industry, we are now sharing insights into the research and development of GPT‑4b micro.

Figure 1: Left, initial conditions; center, standard proteins; right, our novel proteins. The increase in round, white cells indicates an increase in reprogramming. More specifically: phase-contrast images of human fibroblasts before induction (left), 10 days after transduction with wild-type Yamanaka factors (middle), and 10 days after transduction with our re-engineered variants (right). The engineered cocktail promotes the emergence of colonies with the compact, round morphology typically seen during progression toward an iPSC-like state.

An experimental GPT model for protein engineering

To test our belief that AI can be used to accelerate life sciences research, we designed and trained a custom model—GPT‑4b micro—to possess a broad base of knowledge and skills across biology, with a particular focus on steerability and flexibility to enable advanced use cases such as protein engineering. We initialized it from a scaled-down version of GPT‑4o to take advantage of GPT models’ existing knowledge, then further trained it on a dataset composed mostly of protein sequences, along with biological text and tokenized 3D structure data, elements most protein language models omit.

A large portion of the data was enriched to contain additional contextual information about the proteins in the form of textual descriptions, co-evolutionary homologous sequences, and groups of proteins that are known to interact. This context allows GPT‑4b micro to be prompted to generate sequences with specific desired properties and, since most of the data is structure-free, the model handles proteins with intrinsically disordered regions just as well as structured proteins. This is particularly useful for targets like the Yamanaka factors, whose activity depends on forming numerous transient interactions with a diverse array of binding partners, rather than adopting a single stable structure (Figure 2).

Figure 2: Visualization of the 3D structure of the Yamanaka factors KLF4 (left) and SOX2 (right). Notice that the majority of these proteins are unstructured, with flexible arms that attach to other proteins.
Source: AlphaFold Protein Structure Database (left and right panels)

By training on proteins enriched with additional evolutionary and functional context, we substantially increased the effective context length of our training examples beyond that of standalone sequences. Consequently, we found that we could run prompts as large as 64,000 tokens during inference and continue to observe gains in controllability and output quality. While common in text LLMs, this context size is unprecedented in protein sequence models.

During development we observed the emergence of scaling laws similar to those seen in language models—larger models trained on larger datasets yielded predictable gains in perplexity and downstream protein benchmarks, allowing us to iterate at small scale before training the final GPT‑4b micro model. However, in silico evals for protein AI models are often of limited value, as it is unclear if such improvements translate to increased utility in the real world. To demonstrate that the model is capable of accelerating therapeutic development, we worked with Retro’s scientists, who used the model to re-engineer proteins relevant to their cell-reprogramming research program.

AI-assisted reengineering of SOX2 and KLF4 to increase stem cell reprogramming efficiency

The Yamanaka factors—OCT4, SOX2, KLF4, and MYC (OSKM)—are some of the most important proteins in regenerative biology today, and are named after Shinya Yamanaka, who discovered their ability to reprogram adult cells into pluripotent stem cells, a breakthrough that earned him the Nobel Prize in Physiology or Medicine in 2012. Unfortunately, they suffer from poor efficiency: typically less than 0.1% of cells convert during treatment, and the process can take three weeks or more. Efficiency drops further in cells from aged or diseased donors⁠, so finding more efficient variants remains an active and important research focus.

Directly optimizing the protein sequences is hard. SOX2 contains 317 amino acids and KLF4 contains 513; the number of possible variants is on the order of 10^1000, so traditional “directed-evolution” screens that mutate a handful of residues at a time can explore only a minuscule fraction of the design space. A leading academic effort tested a few thousand SOX2 mutants and found a handful of triple mutants with a modest gain, while 15 years of work on chimeric SOX proteins has yielded variants that differ from natural SOX by only five residues.
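As a rough check on the scale described above, the size of the sequence space can be estimated by counting every string over the 20 standard amino acids. The sketch below is illustrative only: the source does not say how its 10^1000 figure was derived, so the 20-letter alphabet, the choice to treat SOX2 and KLF4 as one joint design space, and the helper name `log10_sequence_space` are all assumptions made here.

```python
import math

# Assumption: only the 20 standard amino acids are allowed at each position.
ALPHABET_SIZE = 20

# Protein lengths quoted in the text.
SOX2_LEN, KLF4_LEN = 317, 513


def log10_sequence_space(length: int, alphabet: int = ALPHABET_SIZE) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)


sox2_exp = log10_sequence_space(SOX2_LEN)
klf4_exp = log10_sequence_space(KLF4_LEN)
# Assumption: redesigning both proteins together multiplies the two spaces,
# which adds their exponents.
joint_exp = sox2_exp + klf4_exp

# Triple mutants of SOX2: choose 3 positions, then one of the
# 19 alternative residues at each chosen position.
sox2_triple_mutants = math.comb(SOX2_LEN, 3) * 19 ** 3

print(f"SOX2 alone: ~10^{sox2_exp:.0f}")
print(f"KLF4 alone: ~10^{klf4_exp:.0f}")
print(f"SOX2 and KLF4 jointly: ~10^{joint_exp:.0f}")
print(f"SOX2 triple mutants: {sox2_triple_mutants:.2e}")
```

Under these assumptions the joint space comes out near 10^1080, which is consistent with the order of magnitude quoted. Even the set of SOX2 triple mutants alone, roughly 3.6 × 10^10, is far larger than a screen of a few thousand mutants.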

The team at Retro built a wet lab screening platform using human fibroblast (skin and connective tissue) cells, initially validating it with baseline OSKM and SOX2 variants manually designed by Retro’s scientists as part of their pilot screen (Figure 3). Then, they asked GPT‑4b micro to propose a diverse set of “RetroSOX” sequences. Over 30% of the model’s suggestions in the screen outperformed wild‑type SOX2 at expressing key pluripotency markers, even though they differed by more than 100 amino acids on average. For comparison, in traditional screens⁠, hit rates below 10% are typical.

The team next tackled reengineering KLF4, the largest of the Yamanaka factors. KLF4 is known to be replaceable with other KLF-family factors⁠ but without an increase in reprogramming efficacy. Prior attempts to improve KLF4 by expert-guided single amino acid substitutions⁠ produced a single hit out of 19. As with RetroSOX, we prompted the model to generate a set of enhanced RetroKLF variants. Overall, 14 model-generated variants were superior to the best cocktails from the RetroSOX screen—a hit rate of nearly 50% (Figure 4).

Figure 3: Percentage of cells in our wet-lab experiments that express the early pluripotency marker SSEA-4 (left), and fraction of those cells that express the late marker TRA-1-60 (right), when reprogrammed with either standard OSKM or a pool of engineered variants. The pilot screen (green) uses standard methods, which have poor efficacy rates of around 0.1%, explaining the 0% expression rate of TRA-1-60.

Figure 4: Screen hit rate, i.e. fraction of protein candidates that elicited reprogramming‑associated markers at higher rates than the baseline (left), percent of protein sequence that was changed relative to the wild-type variant found in humans (right). “Conventional” methods refer to few-amino-acid mutation screens.
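The two quantities plotted in Figure 4 are simple ratios, and a minimal sketch may make them concrete. The function names and the toy peptides below are hypothetical and are not taken from the study; only the counts 1 of 19, 100 residues, and 317 residues come from the text. The sketch also assumes the two sequences are already aligned to equal length, whereas real variants with insertions or deletions would need a sequence alignment first.

```python
def hit_rate(hits: int, screened: int) -> float:
    """Fraction of screened candidates that beat the baseline."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return hits / screened


def percent_changed(wild_type: str, variant: str) -> float:
    """Percent of positions that differ between two pre-aligned sequences."""
    if not wild_type or len(wild_type) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(wild_type, variant))
    return 100.0 * differences / len(wild_type)


# Reported in the text: expert-guided KLF4 substitutions gave 1 hit out of 19.
prior_klf4_rate = hit_rate(1, 19)

# Rough lower bound implied by the text: more than 100 changed residues,
# measured against the 317 residues of wild-type SOX2.
retrosox_min_change = 100.0 * 100 / 317

# Hypothetical toy peptides (not real SOX2 or KLF4 fragments).
toy_change = percent_changed("MKTAYIAKQR", "MKSAYLAKQR")

print(f"Prior KLF4 hit rate: {prior_klf4_rate:.1%}")
print(f"RetroSOX change vs wild type: at least {retrosox_min_change:.1f}%")
print(f"Toy example change: {toy_change:.1f}%")
```

Positional mismatch counting is the simplest possible measure and is used here only for illustration; the source does not state how sequence difference was computed.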

Combining the top RetroSOX and RetroKLF variants produced the largest gains. Across three independent experiments, fibroblasts showed a dramatic rise in both early (SSEA-4) and late (TRA-1-60, NANOG) markers, with late markers appearing several days sooner than under the wild-type OSKM cocktail (Figure 5).

Figure 5: Two late-stage markers (TRA-1-60 & NANOG) appear strongly enriched at day 10 whereas at this timepoint, no expression can be detected in cells reprogrammed with wild-type OSKM. RK1 through RK4 correspond to different RetroSOX and RetroKLF variants.

In addition, the RetroSOX and RetroKLF variants were validated by alkaline phosphatase (AP) staining at day 10, confirming that the colonies not only express late-stage pluripotency markers but also exhibit robust AP activity indicative of pluripotency (Figure 6).

Figure 6: Purple colonies indicate successful stem cell reprogramming; more intense and numerous colonies indicate greater efficiency. Alkaline phosphatase (AP) staining of fibroblast-derived colonies at day 10 after transduction with RetroSOX and RetroKLF variants demonstrates robust AP activity, suggesting that the engineered variants have induced pluripotency in the colonies.

To further confirm the improved reprogramming efficiency and explore clinical potential, we tested a different delivery method (mRNA instead of viral vectors) and another cell type, mesenchymal stromal cells (MSCs), derived from three middle-aged human donors (over 50 years old). Within just 7 days, more than 30% of the cells began expressing key pluripotency markers (SSEA-4 and TRA-1-60), and by day 12, numerous colonies appeared with morphology similar to typical iPSCs (Figure 7, left and center). Over 85% of these cells activated endogenous expression of critical stem cell markers, including OCT4, NANOG, SOX2, and TRA-1-60.

We then verified that these RetroFactor-derived iPSCs could successfully differentiate into all three primary germ layers (endoderm, ectoderm, and mesoderm). Additionally, we expanded multiple monoclonal iPSC lines over several passages, confirming healthy karyotypes (Figure 7, right) and genomic stability suitable for cell therapies. These results consistently surpassed benchmarks obtained from conventional iPSC lines generated by contract research organizations using standard factors, further supporting the robustness of our engineered variants. Moreover, they provide evidence of enhanced iPSC generation across different delivery modalities and cell types.

Figure 7: The three panels show key markers of successful stem cell reprogramming: colonies with the characteristic rounded, tightly packed appearance of stem cells (left), positive staining for the TRA-1-60 stem cell marker shown in green (middle), and normal chromosome structure (right). Together these confirm healthy, fully reprogrammed stem cells.

Taken together, the high hit rates, deep sequence edits, accelerated marker onset, and AP+ colony formation provide early evidence that AI-guided protein design can substantially accelerate progress in stem cell reprogramming research.

Reengineered variants enhance DNA damage repair

Motivated by these results, we next investigated the rejuvenation potential of our re-engineered variants, specifically examining their ability to restore youthful characteristics to aged cells. We focused on DNA damage, which causes impaired cellular function and is a canonical hallmark of aging. Earlier work has demonstrated that Yamanaka factors can erase DNA damage-related aging markers in cells derived from mice without fully reverting cell identity. We sought to find out whether our variants showed enhanced rejuvenation capabilities relative to baseline OSKM.

Figure 8: DNA‑damage assay following doxorubicin stress, lower is better. Human fibroblasts were treated with doxorubicin to induce double‑strand breaks and then reprogrammed with either GFP (negative control), wild‑type OSKM, or the engineered RetroSOX + RetroKLF variants along with the remaining reprogramming factors (O, K/S, M). γ‑H2AX immunostaining was used to quantify DNA damage: lower fluorescence intensity means fewer breaks and better repair. Cells expressing the Retro variants show a marked drop in γ‑H2AX signal relative to both controls (GFP vs RS4: p=0.03; GFP vs RS5: p=0.01; OSKM vs RS4: p=0.04), demonstrating more effective repair after the same genotoxic challenge.

In our DNA‑damage assay, cells treated with the RetroSOX/KLF cocktail showed visibly less γ‑H2AX intensity—a marker of double‑strand breaks—than cells reprogrammed with standard OSKM or a fluorescent control (Figure 8).

These results suggest that the RetroSOX/KLF cocktail reduces DNA damage more effectively than the original Yamanaka factors. By ameliorating one of the core hallmarks of cellular aging, the engineered variants offer a potential path toward improved cell rejuvenation and use in future therapies.

Where we go from here

To OpenAI, this work is an illustration of how quickly a domain-specific model can deliver breakthrough results on a focused scientific problem. “When researchers bring deep domain insight to our language-model tooling, problems that once took years can shift in days,” says Boris Power, who leads research partnerships at OpenAI. “We look forward to seeing what other advances emerge as more teams pair their expertise with the models we’re building.”
