How Tolan builds voice-first AI with GPT-5.1
Tolan is a voice-first AI companion where people talk with a personalized, animated character that learns from conversations over time.
Built by Portola, a veteran team with a prior exit, the app is designed for ongoing, open-ended dialogue rather than quick prompts and replies. “We saw the rise of ChatGPT and knew voice was the next frontier,” says Quinten Farmer, co-founder and CEO of Portola. “But voice is harder. You’re not just responding to typed prompts; you’re holding a live, meandering conversation.”
Voice AI raises the bar on latency and context management, but it also enables more open-ended, exploratory interactions than text.
With foundation models becoming faster, cheaper, and more capable, the team focused their efforts on two key levers: memory and character design. Portola built a character-driven universe, shaped by award-winning animators and a science fiction writer, using a real-time context management system to keep personality and memory consistent as conversations unfold.
The release of the GPT‑5.1 models marked a turning point, delivering major gains in steerability and latency that brought those pieces together, unlocking a more responsive and engaging voice experience.
“GPT-5.1 gave us the steerability to finally express the characters we had in mind. It wasn’t just smarter—it was more faithful to the tone and personality we wanted to create.”—Quinten Farmer, CEO, Portola
Designing for natural voice interactions
Tolan’s architecture is shaped by the demands of voice. Voice users expect instant, natural responses, even when conversations shift midstream. Tolan had to respond quickly, track changing topics, and maintain a consistent personality without lag or tone drift.
To feel natural, conversations required near-instant latency. Introducing OpenAI GPT‑5.1 and the Responses API cut speech initiation time by over 0.7 seconds—enough to noticeably improve conversational flow.
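Speech initiation time is essentially time to first token. The sketch below shows one way to measure it; the token stream here is a simulated stand-in, whereas in production the chunks would come from a streaming model call (for example, a streaming Responses API request):

```python
import time

def time_to_first_token(stream):
    """Measure seconds from request start until the first chunk arrives."""
    start = time.perf_counter()
    first_chunk = next(iter(stream))      # blocks until the model emits something
    return first_chunk, time.perf_counter() - start

def simulated_stream(first_chunk_delay=0.05):
    """Stand-in for a streaming model response; yields text chunks."""
    time.sleep(first_chunk_delay)         # model latency before speech can start
    yield "Hello"
    yield ", there!"

chunk, latency = time_to_first_token(simulated_stream())
print(f"first chunk {chunk!r} after {latency * 1000:.0f} ms")
```

For voice, this number—not total generation time—is what determines whether the reply feels instant, which is why a 0.7-second reduction is so noticeable.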
Equally critical was how the system handled context. Unlike many agents that cache prompts across multiple turns, Tolan rebuilds its context window from scratch each turn. Each context reconstruction pulls in a summary of recent messages, a persona card, vector-retrieved memories, tone guidance, and real-time app signals. This architecture allows Tolan to adapt in real time to abrupt topic shifts, an essential requirement for natural voice-based interaction.
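Rebuilding the window each turn amounts to assembling a fresh prompt from independent components. A minimal sketch—the section names and component shapes are illustrative, not Tolan’s actual schema:

```python
def rebuild_context(recent_summary, persona_card, memories, tone_guidance, app_signals):
    """Assemble a fresh context window from scratch for this turn."""
    sections = [
        f"## Persona\n{persona_card}",
        f"## Tone\n{tone_guidance}",
        "## Relevant memories\n" + "\n".join(f"- {m}" for m in memories),
        f"## Conversation so far\n{recent_summary}",
        f"## Live app signals\n{app_signals}",
    ]
    return "\n\n".join(sections)

prompt = rebuild_context(
    recent_summary="User was planning a weekend trip, then switched to work stress.",
    persona_card="Tolan: a curious, warm alien companion.",
    memories=["User's friend is named Steven", "User enjoys camping"],
    tone_guidance="Grounded and supportive right now.",
    app_signals="time: evening; device: phone",
)
```

Because every component is recomputed per turn, a topic shift simply changes which memories and tone guidance get pulled in—nothing stale is carried forward.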
“We realized quickly that cached prompts just didn’t cut it,” says Quinten. “Users change subjects all the time. To feel seamless, the system had to adapt midstream.”
This real-time reconstruction approach is both technically intensive and foundational to Tolan’s success.
Building memory and personality that hold together over time
Context handling is important, but it wasn’t enough to keep conversations feeling coherent over time. To support long, nonlinear conversations, Tolan built a memory system that retains not just facts and preferences, but also emotional “vibe” signals—clues that help steer how a Tolan should respond.
Memories are embedded using the OpenAI text-embedding-3-large model and stored in Turbopuffer, a high-speed vector database that enables sub-50ms lookup times. This speed is essential for real-time voice interactions. Each turn, Tolan uses the user’s latest message and system-synthesized questions (e.g., “Who is the user married to?”) to trigger memory recall. To keep memory quality high, Tolan runs a nightly compression job that removes low-value or redundant entries (e.g. “the user drank coffee today”) and resolves contradictions.
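The recall step boils down to embedding the user’s message plus the synthesized questions and taking nearest neighbors. The sketch below uses tiny hand-made vectors and an in-memory cosine search as stand-ins; in production the embeddings would come from text-embedding-3-large and the search would run in Turbopuffer:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy memory store: text -> embedding (real vectors would be high-dimensional).
memory_store = {
    "User is married to Alex": [1.0, 0.0, 0.0],
    "User's anniversary is in June": [0.7, 0.7, 0.0],
    "User drank coffee today": [0.0, 0.0, 1.0],
}

def recall(query_vectors, top_k=2):
    """Score each memory against every query vector; keep the best matches."""
    scored = {
        text: max(cosine(q, vec) for q in query_vectors)
        for text, vec in memory_store.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# One vector for the raw message, one for a synthesized question
# such as "Who is the user married to?".
results = recall([[0.95, 0.1, 0.0], [0.9, 0.0, 0.1]])
```

Taking the max score across query vectors lets a memory surface if it matches either the literal message or one of the synthesized follow-up questions.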
Personality is just as carefully managed. Each Tolan is seeded with a distinct character scaffold, authored by the team’s in-house science fiction writer and refined by a behavioral researcher. These seeds give Tolans consistency, but also flexibility to adapt over time, evolving alongside the user.
A parallel system monitors the emotional tenor of the conversation and dynamically adjusts the Tolan’s delivery. This allows a Tolan to shift seamlessly from playful to grounded depending on user cues, without losing its core personality.
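One way to approximate such a tone monitor is a lightweight classifier whose output selects the tone guidance injected into the next context rebuild. The keyword heuristic below is a deliberately crude stand-in for whatever affect signal Tolan actually uses:

```python
TONE_GUIDANCE = {
    "upbeat": "Match the user's energy: playful, quick, lots of warmth.",
    "low": "Slow down: grounded, gentle, fewer jokes.",
    "neutral": "Default register: curious and warm.",
}

def detect_mood(message):
    """Crude keyword heuristic standing in for a real affect classifier."""
    lowered = message.lower()
    if any(w in lowered for w in ("excited", "awesome", "can't wait")):
        return "upbeat"
    if any(w in lowered for w in ("tired", "stressed", "sad")):
        return "low"
    return "neutral"

def tone_for(message):
    """Pick the tone guidance to inject into this turn's context rebuild."""
    return TONE_GUIDANCE[detect_mood(message)]

guidance = tone_for("I'm so stressed about work this week")
```

The key design point is that the mood signal only changes the delivery instructions, not the persona card, so the core character stays fixed while the register shifts.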
The transition to GPT‑5.1 was a turning point. Suddenly, layered prompt instructions—tone scaffolds, memory injections, character traits—were followed more faithfully. Prompts that once required workarounds began behaving as intended.
“For the first time, our internal experts felt like the model was really listening,” says Quinten. “Instructions stayed intact across long conversations, persona traits were respected, and we saw far less drift.”
Those changes added up to a more consistent and believable personality, which in turn created a more engaging user experience. The Tolan team saw clear, measurable gains: memory recall misses dropped by 30% (based on in-product frustration signals), and next-day user retention rose more than 20% after GPT‑5.1–powered personas went live.
A flow diagram shows how Tolan retrieves and refines memories during a conversation: a user message like “I’m so excited for the trip this weekend” triggers synthesized follow-up questions (for example, about specific travel dates and preferences), which are embedded and used to query the memory vector database. The retrieved results are merged by mean reciprocal rank, and the resulting context grounds Tolan’s reply (e.g., “camping in Yosemite with Steven”). When the user later mentions an upcoming trip to Iceland, it is stored as a new memory, clustered with related memories, and compressed through merging, editing, and refinement.
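The merge step in that flow—combining ranked result lists from the raw message and each synthesized query—can be sketched with reciprocal rank fusion, which I am assuming here as the concrete interpretation of the rank-based merge:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each item scores the sum of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for the raw message and two synthesized follow-up queries.
fused = reciprocal_rank_fusion([
    ["camping with Steven", "Yosemite trip", "likes hiking"],
    ["Yosemite trip", "camping with Steven"],
    ["camping with Steven", "owns a tent"],
])
```

Memories that rank well across several queries rise to the top without any score calibration between queries, which is why rank-based fusion is a common choice for multi-query retrieval.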
Tolan’s core principles for building natural voice agents
As Tolan evolved, a few principles emerged that now guide how the team builds and evolves its voice architecture:
- Design for conversational volatility: Voice conversations shift mid-sentence. Systems need to pivot just as quickly to feel natural.
- Treat latency as part of the product experience: Sub-second responsiveness shapes whether a voice agent feels conversational or mechanical.
- Build memory as a retrieval system, not a transcript: High-quality compression and fast vector search deliver more consistent personality than oversized context windows.
- Rebuild context every turn: Don’t fight drift with bigger prompts. Regenerating context each turn keeps agents grounded as conversations meander.
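The third principle—compression over transcripts—can be sketched as a nightly pass that drops low-value entries and collapses duplicates. The filters below are illustrative only; the real job also merges, edits, and resolves contradictions:

```python
LOW_VALUE_MARKERS = ("today", "right now")   # illustrative staleness cues

def compress(memories):
    """Drop low-value entries and normalized duplicates from a memory list."""
    kept, seen = [], set()
    for m in memories:
        normalized = " ".join(m.lower().split())
        if any(marker in normalized for marker in LOW_VALUE_MARKERS):
            continue                          # ephemeral detail, not worth keeping
        if normalized in seen:
            continue                          # redundant duplicate
        seen.add(normalized)
        kept.append(m)
    return kept

memories = [
    "The user drank coffee today",
    "The user is married to Alex",
    "The user is  married to Alex",
    "The user enjoys camping",
]
compact = compress(memories)
```

A store pruned this way stays small enough for fast vector search while keeping the durable facts that make a persona feel consistent.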
Together, these lessons form the foundation for Tolan’s next phase of innovation and set the direction for where voice AI is headed.
Expanding what’s possible with voice AI
Since launching in February 2025, Tolan has grown to more than 200,000 monthly active users. Its 4.8-star rating and more than 100,000 App Store reviews highlight how well the system maintains consistency across long, shifting conversations. One reviewer noted, “They remember things we talked about two days ago and they bring it back into the conversation that we’re having today.”
These signals map directly to the underlying architecture: low-latency model calls, turn-by-turn context reconstruction, and modular memory and persona systems. Together, they allow Tolan to track topic changes, preserve tone, and keep responses grounded without relying on large, fragile prompts.
Looking ahead, Tolan plans to deepen its investments in steerability and memory refinement, focusing its efforts on tighter compression, improved retrieval logic, and expanded persona tuning. The long-term goal is to expand what a voice interface can be: not just responsive, but context-aware and conversationally dynamic.
“The next frontier,” says Quinten, “is building voice agents that aren’t just responsive, but truly multimodal, able to integrate voice, vision, and context into a single, steerable system.”