The Race to Give AI Models Infinite Memory

Analytics India Magazine (Siddharth Jindal)

Having improved LLMs’ reasoning skills, AI researchers are now chasing the next frontier: giving models infinite memory.

Today, users around the world have gradually woven AI into their daily lives, both personal and professional. To truly make AI indispensable, a memory boost could be crucial, allowing systems to retain what matters without being prompted repeatedly.

OpenAI CEO Sam Altman, in a December podcast appearance, said that memory will have far more impact on language models than reasoning. He suggested that once AI can remember a person’s entire history and subtle preferences and not just explicit facts, it will become deeply personalised and far more powerful.

Today, the context window acts like short-term working memory, but LLMs still lack true long-term recall, and the transformer architectures behind them struggle to handle very long input sequences reliably.

In an exclusive conversation with AIM, IISc professor Jayant Haritsa said that LLMs with a large context window cannot replace database systems. “You cannot substitute database guarantees with LLMs,” he said. However, Haritsa added that LLMs can play a supporting role by helping optimise database systems.

Building something close to infinite memory will require fresh innovation from database companies. Haritsa noted that modern databases are increasingly being built to handle much larger amounts of memory.

“We already have almost infinite memory because main memory is so cheap now,” Haritsa said. That shift, he explained, has quietly driven one of the biggest changes in modern database architecture.

He shared that traditional databases were built as row stores, where one row of data was written after another, and each row contained all the fields of a record. This design worked well when storage access was slow and memory was limited.

Haritsa said modern systems, however, increasingly use columnar storage. Instead of storing full rows together, each column is stored separately, often in its own file.

In both designs the stored content is identical, he explained; what differs significantly is the architecture and the resulting performance.
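As a rough illustration of the difference, a toy sketch rather than how any particular engine lays out bytes on disk, the same records can be kept row-wise or column-wise:

```python
# Toy sketch: the same three records stored row-wise vs column-wise.
# Real engines add compression, encoding, and on-disk layout on top of this.

records = [
    {"id": 1, "name": "Asha", "balance": 120.0},
    {"id": 2, "name": "Ravi", "balance": 340.5},
    {"id": 3, "name": "Meena", "balance": 98.2},
]

# Row store: each record is kept whole, one after another.
row_store = records

# Column store: each field lives in its own array (or its own file).
column_store = {
    "id": [r["id"] for r in records],
    "name": [r["name"] for r in records],
    "balance": [r["balance"] for r in records],
}

# An analytical query ("average balance") touches only one column here,
# instead of scanning every full row.
avg_balance = sum(column_store["balance"]) / len(column_store["balance"])
print(avg_balance)
```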

When people say LLMs have infinite memory, they are not referring to the model itself remembering everything. The memory actually lives outside the model, in fast main memory that stores conversation history, documents, and vector embeddings. 

Because modern systems can keep enormous amounts of data in RAM and retrieve it in milliseconds, relevant context can be pulled in and fed back to the model on demand. “With large memories, you can now build large columnar databases,” Haritsa said. “You can build many more B-tree indexes on essentially the entire database. So search, and everything becomes much faster.”
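The pattern can be sketched in a few lines: the model’s “memory” is just an in-RAM store of embeddings searched on every turn. The `embed` function below is a hypothetical stand-in for a real embedding model, not any specific product’s API:

```python
import numpy as np

# Minimal sketch of external memory for an LLM: past turns are embedded
# and kept in RAM; on each query, the closest entries are retrieved and
# fed back into the prompt. embed() is a placeholder, not a real model.

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical stand-in: deterministic pseudo-embedding for the demo.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

memory_texts: list[str] = []
memory_vecs: list[np.ndarray] = []

def remember(text: str) -> None:
    memory_texts.append(text)
    memory_vecs.append(embed(text))

def recall(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = [float(q @ v) for v in memory_vecs]  # cosine similarity
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    return [memory_texts[i] for i in top]

remember("User prefers concise answers.")
remember("User is building a columnar database.")
print(recall("What is the user working on?"))
```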

In older systems, indexing every field was impractical because indexes consumed precious space and slowed down writes. Today, memory abundance flips that trade-off. The result is systems optimised for fast reads, complex filters, and real-time analytics at scale.
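The flipped trade-off is easy to demonstrate with an in-memory database. The sketch below uses SQLite purely as a stand-in (the point generalises to any engine) and indexes every queryable field, something that would once have been considered wasteful:

```python
import sqlite3

# With abundant RAM, indexing many fields becomes cheap. SQLite's
# in-memory mode is used here only to illustrate the trade-off.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE txns (id INTEGER, account TEXT, amount REAL, ts TEXT)")
cur.executemany(
    "INSERT INTO txns VALUES (?, ?, ?, ?)",
    [(i, f"acct{i % 10}", i * 1.5, f"2025-01-{(i % 28) + 1:02d}") for i in range(1000)],
)

# Index every field used in filters; in disk-bound systems, each extra
# index cost precious space and slowed writes, so this was done sparingly.
for col in ("account", "amount", "ts"):
    cur.execute(f"CREATE INDEX idx_{col} ON txns({col})")

# Complex filters can now use whichever index suits the predicate.
cur.execute("SELECT COUNT(*) FROM txns WHERE account = 'acct3' AND amount > 100")
print(cur.fetchone())
```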

Haritsa added that this same shift has coincided with the rise of vector databases, which store high-dimensional embeddings used by modern AI systems.

As embeddings and unstructured data become central to how AI models retrieve and reason over information, database architecture itself is emerging as a key differentiator.

That change is now shaping how database vendors position themselves for the AI era. Benjamin Cefalo, senior vice president and head of core products and Atlas Foundational Services at MongoDB, told AIM that the company’s edge lies in its design choice. 

He said MongoDB was built around JSON documents rather than rows and columns, a distinction that matters because modern AI systems, particularly retrieval-augmented generation (RAG) and agentic applications, primarily work with semi-structured and unstructured data such as text, logs, documents, user activity, and embeddings.

Instead of forcing this data into relational tables, MongoDB stores it natively as documents. According to Cefalo, this makes the database a more natural foundation for AI workloads, rather than a system retrofitted to accommodate them. 
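A minimal sketch of that document-first approach, using PyMongo against a local MongoDB instance; the connection string, collection, and field names are illustrative, and the embedding is a placeholder vector, not a product feature being demonstrated:

```python
from pymongo import MongoClient

# Sketch: semi-structured AI data stored natively as a document rather
# than normalised into relational tables. All names are illustrative.
client = MongoClient("mongodb://localhost:27017")
chats = client["demo"]["chat_memory"]

chats.insert_one({
    "user_id": "u42",
    "turns": [
        {"role": "user", "text": "Summarise my project notes."},
        {"role": "assistant", "text": "Here is a summary..."},
    ],
    "tags": ["project-x", "summaries"],
    "embedding": [0.12, -0.03, 0.44],  # placeholder vector
})

# The nested structure is queried directly, with no joins needed.
doc = chats.find_one({"user_id": "u42", "tags": "project-x"})
print(doc["turns"][0]["text"])
```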

Research Towards Infinite Memory 

Significant research is underway to improve LLMs’ memory, spanning major tech companies, research labs, and independent researchers. Google Research recently introduced two advances, Titans and the MIRAS framework, to address the bottleneck in long-term memory. 

Traditional Transformer models struggle with very long inputs because the cost of attention grows quadratically with sequence length. Recurrent models, in contrast, try to squeeze history into a fixed-size memory, often losing important details in the process.

To address this, Google’s approach combines the efficiency of recurrent models with the precision of Transformers. Titans is a new neural architecture that adds a dedicated long-term memory module capable of actively learning and updating information as data streams in, rather than relying solely on short-term attention or static memory vectors. 

Under the hood, Titans is built on the MIRAS framework, which treats sequence modelling as a process of continuously updating and using memory, rather than processing information in one fixed pass. Instead of compressing history into a static state, MIRAS treats memory, attention bias, retention, and update mechanisms as unified components that can adapt as the model processes input. 
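The core idea, a fixed-size memory written to in proportion to how “surprising” each new input is, rather than a context that only grows, can be caricatured in a few lines. This is an illustration of the concept only, not the Titans architecture or the MIRAS framework:

```python
import numpy as np

# Toy caricature of a learned long-term memory module: a fixed-size
# state updated online, with larger writes for more surprising inputs.
# Not the Titans architecture; just the idea in miniature.

dim = 8
memory = np.zeros(dim)   # fixed-size long-term state
W = np.eye(dim)          # stand-in "read" map from memory

def step(x: np.ndarray, lr: float = 0.1) -> None:
    global memory
    prediction = W @ memory        # what the memory expects to see
    surprise = x - prediction      # prediction error drives the update
    gate = np.tanh(np.linalg.norm(surprise))  # bigger surprise => bigger write
    memory = 0.99 * memory + lr * gate * surprise  # decay + write

rng = np.random.default_rng(0)
for _ in range(100):
    step(rng.normal(size=dim))
print(memory.round(3))
```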

Besides this, Google also introduced a new framework called Nested Learning (NL), which reframes how neural networks store information, learn from data and adapt over time. They claim this approach may explain why current AI systems hit limits and how future models could move beyond them.

Together, these advances go beyond simple context windows, allowing AI systems to learn and retain information as they are used, without needing separate retraining later.

Other Notable Research 

20-year-old Dhravya Shah’s startup, Supermemory, is also working on providing more context to AI tools.

Supermemory is designed as a universal memory layer for AI applications. The product takes unstructured data such as files, chats, emails, project updates, and PDFs, and converts it into a personalised knowledge graph for users across AI tools such as ChatGPT, Gemini, and Claude. These apps can then recall this memory to connect relevant information across time and platforms.

Meanwhile, researchers from China and Hong Kong have introduced General Agentic Memory (GAM). This dual-agent memory system keeps a complete record of history and recalls only the parts needed, helping models retain long conversations without over-compressing context. 

The Problem That Memory Cannot Solve

Despite these advances, Haritsa was careful to draw a firm boundary around what abundant memory can and cannot fix.

“The fundamental problems of giving guarantees will not change,” he said. Enterprise and mission-critical systems still demand strict correctness, durability, and consistency. While large memory may make some problems easier, it does not remove the need to prove correctness.

According to Haritsa, the hardest part is still proving that these systems can be trusted in mission-critical and enterprise environments. That problem, he said, needs to be addressed from first principles.

In other words, infinite memory does not mean infinite trust. Where things get interesting, Haritsa argued, is at the boundary between classical databases and AI-driven systems.

Because of machine learning and LLMs, a new category of applications is emerging that does not require exact answers. “In many applications, it may not be necessary to give precise answers. An approximate answer may be okay,” he said.

This tolerance opens the door to probabilistic techniques that trade precision for speed, something traditional databases were never designed to do.
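A simple illustration of that trade: estimating an aggregate from a random sample instead of a full scan, accepting a small error in exchange for speed. This is a generic sampling sketch, not any specific vendor’s technique:

```python
import random

# Approximate query processing in miniature: estimate an average from a
# random sample rather than a full scan, trading exactness for speed.
values = [random.gauss(100, 15) for _ in range(1_000_000)]

exact = sum(values) / len(values)        # full scan
sample = random.sample(values, 10_000)   # 1% sample
approx = sum(sample) / len(sample)

print(f"exact={exact:.2f} approx={approx:.2f} error={abs(exact - approx):.2f}")
```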

Where Precision Is Non-Negotiable

However, approximation has strict limits. “There are many applications in database systems like income tax filing, where it cannot be approximate,” Haritsa said. “They expect it to be correct.”

In these domains, traditional database techniques are still indispensable, as strong guarantees, precise answers, and deterministic behaviour are required by law and financial regulation.

“So when absolute precision is required,” Haritsa said, “all the old database technology is still required.”

In his view, databases and AI systems will coexist rather than replace each other. Databases will prioritise accuracy and trust, while AI systems will be more comfortable dealing with uncertainty.

Infinite memory will change how AI systems are built. But it does not remove the need for reliability. As Haritsa points out, databases and AI are evolving in parallel rather than replacing each other. Memory may expand what AI can do, but guarantees will continue to define where it can be trusted.
