Netomi’s lessons for scaling agentic systems into the enterprise
Enterprises expect AI agents to handle messy workflows reliably, honor policies by default, operate under heavy load, and show their work.
Netomi builds systems that meet that high bar, serving Fortune 500 customers like United Airlines and DraftKings. Their platform pairs GPT‑4.1 for low-latency, reliable tool use with GPT‑5.2 for deeper, multi-step planning, running both inside a governed execution layer designed to keep model-driven actions predictable under real production conditions.
Running agentic systems at this scale has given Netomi a blueprint for what makes these deployments work inside the enterprise.
Lesson 1: Build for real-world complexity, not idealized flows
A single enterprise request rarely maps to a single API. Real workflows span booking engines, loyalty databases, CRM systems, policy logic, payments, and knowledge sources. The data is often incomplete, conflicting, or time-sensitive. Systems that depend on brittle flows collapse under this variability.
Netomi designed its Agentic OS so OpenAI models sit at the center of a governed orchestration pipeline built for this level of ambiguity. The platform uses GPT‑4.1 for fast, reliable reasoning and tool-calling—critical for real-time workflows—and GPT‑5.2 when multi-step planning or deeper reasoning is required.
“Our goal was to orchestrate the many systems a human agent would normally juggle and do it safely at machine speed.”—Puneet Mehta, CEO, Netomi
To ensure consistent agent behavior across long, complex tasks, Netomi follows the agentic prompting patterns recommended by OpenAI:
- Persistence reminders to help GPT‑5.2 carry reasoning across long, multi-step workflows
- Explicit tool-use expectations, suppressing hallucinated answers by steering GPT‑4.1 to call tools for authoritative information during transactional operations
- Structured planning, which leverages GPT‑5.2’s deeper reasoning to outline and execute multi-step tasks
- Agent-driven rich media decisions, relying on GPT‑5.2 to detect and signal when a tool call should return images, videos, forms, or other rich, multimodal elements
Together, these patterns help the model reliably map unstructured requests to multi-step workflows and maintain state across discontinuous interactions.
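As an illustration, the patterns above can be expressed in the request payload itself. The sketch below assembles a chat request in the OpenAI-style tools format; the tool name, prompt wording, and model choice are hypothetical, not Netomi's actual configuration, and the API call itself is omitted:

```python
# Explicit tool-use expectation: steer the model to call an authoritative
# tool for transactional data instead of answering from memory.
REBOOKING_TOOL = {
    "type": "function",
    "function": {
        "name": "search_rebooking_options",  # hypothetical tool name
        "description": (
            "Return alternative flights for a canceled booking. "
            "Always call this instead of guessing availability."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "booking_ref": {"type": "string"},
                "cabin": {"type": "string", "enum": ["economy", "business"]},
            },
            "required": ["booking_ref"],
        },
    },
}

# Persistence reminder and structured planning, expressed as system text.
SYSTEM_PROMPT = (
    "You are an airline support agent. "
    "Persistence: keep working until the request is fully resolved; carry "
    "your plan and partial results across every step. "
    "Planning: before acting, outline the steps (fare rules, loyalty "
    "recalculation, ticket change), then execute them via tool calls. "
    "Never invent booking data; use tools for authoritative information."
)

def build_request(user_message: str) -> dict:
    """Assemble a chat request embodying the prompting patterns above."""
    return {
        "model": "gpt-4.1",  # low-latency tool calling; a deeper-reasoning
                             # model would be used for complex planning
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        "tools": [REBOOKING_TOOL],
    }

req = build_request("My flight UA123 was canceled, rebook me for tomorrow.")
```

The persistence and planning instructions live in the system message, while the tool schema carries the tool-use expectation, so each pattern is enforced at a different point in the payload.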
Few industries expose the need for multi-step reasoning as clearly as airlines, where one interaction routinely spans multiple systems and policy layers. A single question may require checking fare rules, recalculating loyalty benefits, initiating ticket changes, and coordinating with flight operations.
“In airlines, context changes by the minute. AI has to reason about the scene the customer is in—not just execute a siloed task,” said Mehta. “That’s why situational awareness matters way more than just workflows, and why a context-led ensemble architecture is essential.”
With GPT‑4.1 and GPT‑5.2, Netomi can keep extending these patterns into richer multi-step automations—using the models not just to answer questions, but to plan tasks, sequence actions, and coordinate the backend systems a major airline depends on.
Lesson 2: Parallelize everything to meet enterprise latency expectations
In high-pressure moments—rebooking during a storm, resolving a billing issue, or handling sudden spikes in demand—users will abandon any system that hesitates. Latency defines trust.
Most AI systems fail because they execute tasks sequentially: classify → retrieve → validate → call tools → generate output. Netomi instead designed for concurrency, taking advantage of GPT‑4.1’s low-latency streaming and tool-calling stability.
In a typical enterprise support workflow, a customer’s question about rebooking a canceled flight can arrive through social, chat, SMS, email, search, or voice. Once the system recognizes a rebooking scenario, it applies safety guardrails, orchestrates tool calls to retrieve alternatives and apply fare and loyalty rules, executes actions across booking and CRM systems, and finally assembles and validates a response that delivers personalized rebooking options and loyalty compensation to the customer.
GPT‑4.1 provides fast time-to-first-token and predictable tool-calling behavior, which make this architecture viable at scale, while GPT‑5.2 provides deeper multi-step reasoning paths when needed. Netomi’s concurrency framework ensures the total system, not just the model, stays under critical latency thresholds.
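The concurrent shape of such a pipeline can be sketched with `asyncio`. The step names and sleep times below are simulated stand-ins for real backend calls, not Netomi's implementation; the point is that independent lookups run in parallel once classification completes:

```python
import asyncio
import time

# Simulated backend calls; sleeps stand in for network latency.
async def classify(query: str) -> str:
    await asyncio.sleep(0.1)          # intent classification
    return "rebooking"

async def fetch_fare_rules(query: str) -> dict:
    await asyncio.sleep(0.1)          # fare-rules lookup
    return {"change_fee": 0}

async def fetch_loyalty(query: str) -> dict:
    await asyncio.sleep(0.1)          # loyalty-status lookup
    return {"tier": "gold"}

async def handle(query: str):
    # Classification must finish first, but the independent lookups
    # that depend only on its result run concurrently, not one by one.
    intent = await classify(query)
    fares, loyalty = await asyncio.gather(
        fetch_fare_rules(query),
        fetch_loyalty(query),
    )
    return intent, fares, loyalty

start = time.perf_counter()
result = asyncio.run(handle("rebook UA123"))
elapsed = time.perf_counter() - start
# Fully sequential execution would take ~0.3s of simulated latency;
# the concurrent version takes ~0.2s.
```

The same pattern extends to any number of independent retrieval, validation, and policy-check steps, which is what keeps total latency bounded by the slowest branch rather than the sum of all branches.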
These concurrency demands aren’t unique to airlines. Any system exposed to sudden, extreme traffic surges needs the same architectural discipline. DraftKings, for instance, regularly stress-tests this model, with traffic during major sporting events spiking above 40,000 concurrent customer requests per second.
During such events, Netomi has sustained sub-three-second responses with 98% intent classification accuracy, even as workflows touch accounts, payments, knowledge lookups, and regulatory checks.
“AI is central and critical to how we support customers in the moments that matter most,” said Paul Liberman, Co-Founder and President of Operations at DraftKings. “Netomi’s platform helps us handle massive spikes in activity with agility and precision.”
At scale, Netomi’s concurrency model depends on the fast, predictable tool-calling of GPT‑4.1, which keeps multi-step workflows responsive under extreme load.
Lesson 3: Make governance an intrinsic part of the runtime
Enterprise AI must be trustworthy by design, with governance woven directly into the runtime—not added as an external layer.
When intent confidence drops below threshold, or when a request cannot be classified with high certainty, Netomi’s governance mechanisms kick in to determine how the request is handled, ensuring the system backs off from free-form generation in favor of controlled execution paths.
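A minimal sketch of that routing decision, with a hypothetical threshold and path names (the real thresholds and fallback behaviors are Netomi's, not shown here):

```python
CONFIDENCE_THRESHOLD = 0.85  # hypothetical cutoff for agentic execution

def route(intent: str, confidence: float) -> str:
    """Pick an execution path based on intent-classification confidence."""
    if confidence >= CONFIDENCE_THRESHOLD:
        # High confidence: proceed down the full agentic path.
        return f"agentic:{intent}"
    # Low confidence: back off from free-form generation to a
    # deterministic, known-safe behavior (clarify or escalate).
    return "fallback:clarify_or_escalate"
```

The key property is that the fallback branch is deterministic: a low-confidence request never reaches the generative path at all.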
At a technical level, the governance layer handles:
- Schema validation, which validates every tool call against expected arguments and OpenAPI contracts before execution
- Policy enforcement that applies topic filters, brand restrictions, and compliance checks inline during reasoning and tool use
- PII protection to detect and mask sensitive data as part of pre-processing and response handling
- Deterministic fallback, routing back to known-safe behaviors when intent, data, or tool calls are ambiguous
- Runtime observability, exposing token traces, reasoning steps, and tool-chain logs for real-time inspection and debugging
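Two of these checks, schema validation and PII masking, can be sketched in a few lines. The contract format and field names below are simplified hypotheticals standing in for full OpenAPI contracts:

```python
import re

# Simplified stand-in for an OpenAPI contract for one tool.
CONTRACT = {
    "name": "search_rebooking_options",
    "required": ["booking_ref"],
    "properties": {"booking_ref": str, "cabin": str},
}

def validate_tool_call(name: str, args: dict, contract: dict = CONTRACT):
    """Reject a tool call before execution if it violates the contract."""
    if name != contract["name"]:
        return False, "unknown tool"
    for field in contract["required"]:
        if field not in args:
            return False, f"missing required argument: {field}"
    for field, value in args.items():
        expected = contract["properties"].get(field)
        if expected is None:
            return False, f"unexpected argument: {field}"
        if not isinstance(value, expected):
            return False, f"wrong type for argument: {field}"
    return True, "ok"

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_pii(text: str) -> str:
    """Mask email addresses before text reaches logs or downstream models."""
    return EMAIL.sub("[EMAIL]", text)
```

Because validation happens before execution, a malformed or hallucinated tool call is caught at the governance layer and can be routed to the deterministic fallback instead of hitting a production system.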
In highly regulated domains like dental insurance, this kind of governance is non-negotiable. A Netomi customer in the insurance industry processes close to two million provider requests each year across all 50 states, including eligibility checks, benefits lookups, and claim status inquiries where a single incorrect response can create downstream regulatory or service risk.
During open enrollment, when scrutiny and volume peak, the company needed AI that enforced policy as part of the runtime itself. Netomi’s architecture met that requirement.
“We built the system so that if the agent ever reaches uncertainty, it knows exactly how to back off safely,” said Mehta. “The governance is not bolted on—it’s part of the runtime.”
A blueprint for building agentic systems that work for the enterprise
Netomi’s path shows what it takes to earn enterprise trust: build for complexity, parallelize to meet latency demands, and bake governance into every workflow. OpenAI models form the reasoning backbone, while Netomi’s systems engineering ensures that intelligence is operationally safe, auditable, and ready for Fortune 500 environments.
These principles helped Netomi scale across some of the world’s most demanding industries—and offer a blueprint for any startup looking to turn agentic AI into production-grade infrastructure.