Splunk Wants to Outdo LLMs to Teach AI How to Read Machine Data

Analytics India Magazine (Supreeth Koundinya)

Most enterprises hold massive volumes of logs, metrics, and sensor data, but lack the necessary infrastructure to extract value from them. 

While LLMs process discrete tokens in sequential context windows, machine data operates as a continuous time series. It has temporal dependencies that extend beyond simple sequence prediction.
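To make the contrast concrete, here is a minimal sketch (not Splunk's implementation) of how time series models typically consume machine data: instead of discrete tokens, they take sliding windows of a continuous signal and learn to predict the next values, so the "context" is temporal rather than lexical. The signal and window sizes below are illustrative.

```python
import math

def sliding_windows(series, window, horizon=1):
    """Yield (history, future) pairs for supervised forecasting."""
    for i in range(len(series) - window - horizon + 1):
        yield series[i:i + window], series[i + window:i + window + horizon]

# A toy CPU-utilisation-like signal: a daily cycle plus a slow upward drift.
signal = [50 + 20 * math.sin(2 * math.pi * t / 24) + 0.1 * t for t in range(96)]

pairs = list(sliding_windows(signal, window=24))
history, future = pairs[0]
print(len(pairs), len(history), len(future))
```

Each pair couples 24 hours of history with the next reading, which is the kind of temporal dependency a plain next-token objective does not capture on its own.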

Several organisations, including Google, Amazon, Salesforce, IBM, and Splunk, have released foundation models for machine data in time series form. 

Just as large language models learn from natural language and produce relevant outputs, these models interpret time series data in a similar way. 

Among these efforts, Splunk’s approach offers a representative case study for how enterprises might operationalise temporal AI at scale. 

The Untapped Goldmine

At the Splunk .conf 2025 event held last month, the company outlined its plans to enable enterprises to leverage temporal AI without rebuilding machine learning infrastructure for each use case.

“Your data is your moat,” said Jeetu Patel, the president of Cisco, the parent company of Splunk, in a keynote. 

He indicated that each company owns data specific to it, which competitors can’t replicate. 

“The problem that we have in this industry right now is that while your data is your moat, most people’s impediment is, ‘Can I actually use my data effectively? Can I make sure that I can create a level of synergy with the data? Can I prepare that data so that I can actually get the most value from that data?’” added Patel. 

“That’s where most organisations struggle.”

Splunk is releasing a new open-source time series foundation model on HuggingFace next month. 

From Logs to Actionable Insights

Patrick Lin, Splunk’s senior vice president of observability, frames it in operational terms. 

“The promise of it is that you can have a more predictive understanding of how the signals are trending, which then means you have more time to avoid either the issue altogether, or get started before it spirals out of control,” said Lin in an interaction with AIM.

He explained that the model recognises emerging patterns in system signals and predicts whether they will breach critical thresholds, giving operators lead time to intervene before cascading failures occur.
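The idea Lin describes can be sketched with a deliberately simple stand-in: extrapolate a trending signal and estimate how many intervals remain before it crosses a critical threshold, which is the operator's lead time. A least-squares line fit substitutes here for the actual foundation model; the data and threshold are hypothetical.

```python
import math

def steps_until_breach(series, threshold):
    """Fit a line to the series; return the number of future steps until
    the trend crosses the threshold, or None if it never does."""
    n = len(series)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series)) / denom
    if slope <= 0:
        return None  # flat or falling trend: no predicted breach
    intercept = mean_y - slope * mean_x
    t_breach = (threshold - intercept) / slope   # time index of the crossing
    remaining = t_breach - (n - 1)               # steps beyond the last sample
    return max(0, math.ceil(remaining))

# Memory usage climbing ~2% per interval, currently at 70%, alert at 90%.
usage = [50 + 2 * t for t in range(11)]
print(steps_until_breach(usage, 90))  # roughly 10 intervals of warning
```

A real model would forecast nonlinear and seasonal behaviour, but the operational payoff is the same: a breach estimate arrives while there is still time to act.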

The company mentioned that this time series model can be customised to specific domains and fine-tuned with enterprise data. They also introduced a Machine Data Lake, a ready-to-use solution optimised for training these models and serving as a foundation for such processes. 

Moreover, what makes this approach powerful is the ability to correlate across domains. Traditional observability tools treat metrics, logs, and traces as separate data silos. Time series foundation models can analyse all three simultaneously, finding patterns that span multiple data types.

Patel explained that AI agents can perform time series analysis using the model and correlate that data with both structured and unstructured types of human-generated data.

“They’ll be able to speak the human language, which is natural language, and then they’ll be able to also speak machine language,” said Patel, adding that this will enable enterprises to predict things that would have never been possible before. 

Inside Splunk’s Training Pipeline

Building such a model required diverse data sources. Kamal Hathi, Splunk’s CTO, outlined the company’s approach in a conversation with AIM.

“We started with an open-source model that was more tuned towards sensors, truly machine data, not just application network kind of data. That was the basis, and we internally run massive services. We have data that we have at scale that’s required for this kind of training,” said Hathi. 

The training pipeline utilised four data sources: refined sensor and physical system data, application and network logs from Cisco’s production infrastructure, industry-specific datasets with consent, and publicly available time series data. 

This composition aims to generalise across domains while remaining relevant to enterprise operational data. Hathi describes the model as infrastructure rather than an endpoint: “Its goal really more than anything else is to become the springboard for proprietary data.” 

Splunk’s platform also includes what Hathi calls an “AI toolkit”, the same internal tooling Cisco used for model development, enabling customers to fine-tune domain-specific variants without rebuilding training infrastructure.

Aside from Splunk, numerous research efforts into time series foundation models underscore the technology’s push toward production use. 

Google’s Promise with TimesFM-ICF

For instance, Google’s TimesFM presents a particularly instructive case. Built on a decoder-only transformer trained on hundreds of billions of timepoints, the time series foundation model initially enabled zero-shot forecasting.

At ICML 2025, Google introduced TimesFM-ICF, an extension that supports in-context fine-tuning. 

Rather than relying on supervised fine-tuning, TimesFM-ICF undergoes continued pre-training with specially introduced separator tokens, which teach the model how to distinguish between forecasting history and auxiliary in-context examples. 

This enables the model to draw on relevant examples at inference time without conflating them with current data.
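The context layout described above can be sketched as follows. The separator value and flattening scheme are illustrative assumptions, not Google's actual token format: auxiliary example series are placed before the forecast history, with a separator marking each boundary so the model can use the examples as hints without conflating them with the series being forecast.

```python
SEP = "<sep>"  # hypothetical stand-in for the special separator token

def build_context(examples, history):
    """Flatten in-context example series and the forecast history into a
    single sequence, with separators marking series boundaries."""
    context = []
    for ex in examples:
        context.extend(ex)
        context.append(SEP)   # boundary: next values belong to a new series
    context.extend(history)   # the actual series to forecast comes last
    return context

ctx = build_context(examples=[[1.0, 2.0, 3.0], [10.0, 20.0]],
                    history=[5.0, 6.0, 7.0])
print(ctx.count(SEP), len(ctx))  # 2 separators, 10 elements total
```

Because boundaries are explicit, the model can attend to the examples for guidance while forecasting only from the trailing history.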

Empirical results show the payoff. Across 23 unseen datasets, TimesFM-ICF achieved 6.8% higher accuracy than the base TimesFM and matched the performance of supervised fine-tuned models, without requiring users to run task-specific training.

As Google researchers note, this “dramatically cuts costs, accelerates decision-making and innovation, and democratises access to high-end forecasting.”

This aligns with Cisco’s architectural aim, which is to deliver foundation model capabilities without requiring enterprises to rebuild ML infrastructure for each use case. 

AI is now expected to expand across industries beyond chatbots, coding, and the software sector. Time series models are crucial to achieving this future. 

By analysing continuous machine data, they facilitate predictive maintenance in various domains such as manufacturing, optimise energy and cooling in data centres, support early health risk detection in clinical environments, and more. 

As Patel frames it: “What does it take to enable your data to harness the power of AI fully?” The answer lies in teaching AI to speak machine language natively.
