Why TiDB Thinks S3 Will Power the AI-First Database Era

Analytics India Magazine (Ankush Das)

As AI is built into more and more technology, the tools around it keep evolving to accommodate its changing demands. Database technologies are no exception. With data volumes growing and the need to manage them growing too, companies like PingCAP, the maker of TiDB, are taking new approaches to the problem.

Ed Huang, CTO of PingCAP (parent company of TiDB), believes that Amazon S3 has quietly become the backbone of scalable databases. In a recent conversation with AIM, Huang explained that TiDB uses S3 for storage to improve elasticity, resilience, and cost efficiency, and emphasised the importance of developer experience.

For Huang, the journey towards S3-backed architectures began long before cloud computing became the default. 

He explained that a key challenge in building a distributed storage system is managing how data is distributed across numerous nodes. Huang noted that S3 addresses these complexities by hiding the details of scaling and data distribution behind an easy-to-use, transparent interface.

S3 as the Core of TiDB’s Architecture

TiDB has fully embraced S3 as its shared storage layer, enabling a clean separation of compute and storage. Huang said this is essential for multi-tenancy at scale, a model embraced by database-as-a-service providers like Neon and Supabase.

“If you’re building the system based on S3, that means you only need to pay for the code data storage cost, which is super cheap. In the cloud environment, compute is more expensive,” Huang said.

Latency remains S3’s weak spot, but TiDB has addressed this through a custom-built caching layer. 

Huang explained that TiDB keeps frequently accessed data in memory or on local SSDs for speed. Less critical data, which makes up the storage files, is kept in S3 because it is cheaper. He emphasised that S3 is kept off TiDB’s critical write path, so that the speed of database access is not affected.
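The article does not describe how TiDB's caching layer is actually built. As a rough illustration of the general idea only, the toy Python sketch below puts a small hot tier in front of a slower, cheaper object store and keeps that store off the write path. Every name in it (`TieredStore`, `hot_capacity`, `pending`, `flush`) is hypothetical, a plain dictionary stands in for S3, and the LRU eviction policy is an assumption rather than anything Huang described.

```python
from collections import OrderedDict


class TieredStore:
    """Toy hot/cold store: a small LRU hot tier in front of an object store."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()      # stands in for memory or local SSD
        self.pending = {}             # local writes not yet flushed
        self.object_store = {}        # stands in for S3 (just a dict here)
        self.object_store_reads = 0   # counts slow-tier reads

    def put(self, key, value):
        # Writes stay local; the object store is not touched on this path.
        self.pending[key] = value
        self._cache(key, value)

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        if key in self.pending:
            value = self.pending[key]
        else:
            self.object_store_reads += 1
            value = self.object_store[key]
        self._cache(key, value)
        return value

    def flush(self):
        # Background step: move accumulated writes to the object store.
        self.object_store.update(self.pending)
        self.pending.clear()

    def _cache(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict least recently used


store = TieredStore(hot_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(k, v)       # "a" falls out of the two-slot hot tier
store.flush()             # all three keys now live in the object store
store.get("c")            # served from the hot tier
store.get("a")            # hot-tier miss, read from the object store
```

In this sketch only a hot-tier miss pays the object-store cost, and a write returns before anything reaches the slow tier, which mirrors the trade-off described above in miniature.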

Learning from Competitors and Standing Out

Huang was candid when discussing competitors, noting that the majority of TiDB’s customers originally used traditional technologies such as Relational Database Service (RDS) or Aurora.

“The only competitors we can see now are the cloud vendors. But on the other hand, cloud vendors are also our partners,” he highlighted. 

Of newer competitors like Neon and Supabase, he acknowledged learning a great deal about user and developer experience, recognising their strong understanding of developers and the many user-friendly tools they have built.

“But from the technology or the architecture part of the database kernel, I think we are actually ahead of the game,” Huang added. “I’m quite confident that TiDB Serverless, built on S3, is the state-of-the-art distributed database.”

He pointed out that one might hear a lot about NeonDB or Supabase, but most of these new vendors are just putting a Postgres instance on top of S3. “They are still the Postgres. It’s a single node database,” Huang said, explaining that these competitors use S3 as a replacement for a local storage disk.

Huang also stressed TiDB’s scalability edge and said, “Even for the large customers using the TiDB Serverless, we can easily scale out to thousands of nodes immediately. So we don’t have the scalability bottleneck even for the serverless customer.”

On making TiDB easy for developers, Huang pointed to compatibility and openness, saying, “We are MySQL compatible, so if you have experience with MySQL or Postgres, you can very easily use TiDB to support your application.”

“Most of the time, you don’t even need to change a single line of code,” he added.

Designing for the AI Workload Era

Looking ahead, the company sees AI as the next major inflection point for databases, and expects that databases will increasingly be shaped not by applications or data scientists but by large language models and agents.

To support this, TiDB is working on providing an AI-friendly interface to AI agents and consolidating “different data models into a unified SQL interface.”

For multi-tenant AI workloads, Huang reiterated S3’s role and said, “If you are not using S3, you cannot provide a flexible and cost-efficient solution to the AI applications.”
