The Need For The 10× IT Admin
Analytics India Magazine (Supreeth Koundinya)

When a backup becomes unreadable, a storage array corrupts, or a hybrid-cloud restore freezes mid-sequence, it isn’t the feature teams who decide whether a business stays operational. It’s the IT administrators. Yet, the past decade of enterprise software has primarily fixated on developer productivity.
The industry spent years building tools to ship code faster, while the people responsible for keeping systems alive were buried under manual recovery work. That imbalance is now being questioned by a growing number of IT experts.
Tejas Pandit, co-founder of cyber-resilience startup MeshDefend, said in an interaction with AIM, “Everybody is talking about the 10x engineer today. Nobody is talking about the 10x admin yet.”
He has seen first-hand how fragile recovery workflows inside large enterprises can be. Rebuilding broken infrastructure often drags on for weeks.
“It takes 24 to 30 days to rebuild anything if it breaks. And those first 48 to 72 hours are very, very rough on the IT managers.” In those early days, teams scramble through dense PDFs, snapshot spreadsheets and outdated runbooks. “I have seen people shakingly navigate through 200, 500, 700-page PDFs,” he said.
A report from Unitrends this year, which surveyed over 3,000 IT experts worldwide, states that over half of organisations spend more than two hours per day on backup monitoring, troubleshooting or fixes, turning resilience work into a significant time sink.
On disaster recovery, more than 60% of companies believe they can restore operations in under a day. In reality, only 35% manage to.
Meanwhile, attackers have been evolving faster than the runbooks meant to stop them.
Pandit explained that adversaries increasingly target backup systems before touching production, knowing that an organisation stripped of recovery insurance is far more likely to pay ransom.
Before teams can even begin restoring systems, they must verify what data can be trusted, turning recovery from a straightforward technical procedure into a sequence of high-stress decisions.
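The trust question above can be pictured as a filter over available snapshots. The following Python sketch is purely illustrative: the `Snapshot` type, the `checksum_ok` flag and the `latest_trusted_snapshot` helper are hypothetical names, not part of any vendor's product, and the rule it applies (taken before the suspected intrusion and passing an integrity check) is a deliberate simplification of what real verification involves.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Snapshot:
    name: str
    taken_at: datetime
    checksum_ok: bool  # hypothetical: result of an integrity check run elsewhere


def latest_trusted_snapshot(
    snapshots: List[Snapshot], suspected_intrusion: datetime
) -> Optional[Snapshot]:
    """Pick the newest snapshot taken before the suspected intrusion
    that also passed its integrity check; None if there is no such snapshot."""
    candidates = [
        s for s in snapshots
        if s.taken_at < suspected_intrusion and s.checksum_ok
    ]
    return max(candidates, key=lambda s: s.taken_at, default=None)


snapshots = [
    Snapshot("mon", datetime(2025, 1, 6, 2, 0), True),
    Snapshot("tue", datetime(2025, 1, 7, 2, 0), False),  # failed integrity check
    Snapshot("wed", datetime(2025, 1, 8, 2, 0), True),
    Snapshot("thu", datetime(2025, 1, 9, 2, 0), True),   # taken after the intrusion
]
intrusion = datetime(2025, 1, 8, 12, 0)

choice = latest_trusted_snapshot(snapshots, intrusion)
print(choice.name if choice else "no trusted snapshot")
```

In practice the intrusion time is itself uncertain, which is part of why this step becomes a high-stress decision rather than a lookup.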
From Observability to Execution
The observability market has largely solved the visibility problem. Tools from Datadog, Dynatrace, New Relic, Splunk and others give organisations deep insight into logs, metrics and traces. But insight alone does not reduce downtime.
Patrick Lin, senior vice president of observability at Splunk, summed up the challenge in an earlier discussion with AIM. “You still have to get the right data in [observability platforms], you still have to have a certain amount of operational rigour… More information means either fewer outages or shorter duration outages.”
Information accelerates detection. But someone still has to decide what action is safe and execute it. That execution lag is where vendors are now applying AI.
The lack of readiness also shows up in the smaller details.
The Unitrends report found that just 15% of organisations test backups daily, and only 11% run daily disaster recovery tests, meaning most rely on unverified assumptions.
Worse still, one in five wouldn’t realise they had missed backups until a failure occurred, and 10% acknowledge they would not be notified at all.
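A check for missed backups does not need to be elaborate. As a minimal sketch in Python, assuming a record of each job's last successful run is available, the snippet below flags any job that has gone too long without one. The job names, the 24-hour window and the `find_missed_backups` helper are all assumptions made for illustration and are not drawn from the Unitrends report.

```python
from datetime import datetime, timedelta


def find_missed_backups(last_success, now, max_age=timedelta(hours=24)):
    """Return the sorted names of jobs whose last successful backup
    is older than max_age, or that have never succeeded (None)."""
    missed = []
    for job, finished_at in last_success.items():
        if finished_at is None or now - finished_at > max_age:
            missed.append(job)
    return sorted(missed)


# Hypothetical job records: job name -> time of last successful backup.
now = datetime(2025, 1, 10, 9, 0)
jobs = {
    "mail-server": datetime(2025, 1, 10, 2, 0),  # 7 hours ago: within window
    "file-share": datetime(2025, 1, 8, 2, 0),    # over two days ago: missed
    "hr-database": None,                         # never succeeded: missed
}

print(find_missed_backups(jobs, now))  # ['file-share', 'hr-database']
```

The point of running something like this on a schedule is that a missed job surfaces as an alert rather than during a restore.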
In 2025, companies such as Veeam, NetApp, Rubrik and Commvault began adding features that help administrators act rather than just observe.
Veeam introduced secure AI access to backup data without expanding breach risk. NetApp embedded breach detection directly into enterprise storage, using AI to identify compromised snapshots and invoke isolated recovery.
Rubrik moved beyond protection to decision-assisted restoration for Microsoft 365 and DevOps environments.
And Commvault integrated with platforms like CrowdStrike, Microsoft and Palo Alto Networks to tie threat signals directly to automated recovery, allowing security posture and recovery posture to inform one another rather than operate in isolation.
What unifies these moves is a shift from dashboards to execution. AI is not being deployed to describe what went wrong, but to shorten the steps between detection and safe restoration.
That direction aligns with a broader finding from Commvault’s latest readiness study, which notes that enterprises focus heavily on deploying AI systems but rarely ensure that those systems themselves can be protected or recovered.
As the report puts it, “Few organisations plan comprehensive resilience for AI-specific assets… These critical business assets require enterprise-grade protection but are often treated as disposable.” That oversight extends the same pattern: strong visibility, weak execution.
Commvault’s report proposes a phased approach that mirrors how recovery workflows are being redesigned across the industry. The first 90 days are dedicated to understanding which infrastructure and AI assets need protection, documenting their provenance and enforcing trusted access boundaries.
The next 90 days shift to automating safe recovery steps for those same assets, including models, training data and vector stores, which are increasingly as critical as databases.
Adding a sober warning that applies as much to backup infrastructure as to AI workloads, the company stated that “validating that the underpinning of the AI stack is built on security and resilience is critical to the success of AI initiatives.”
Encoding Operational Judgment
The growing automation trend still leaves a gap. No matter how advanced a model may be, it cannot recover a system without understanding how the organisation itself approaches recovery.
Pandit argues that automating this work is not simply an algorithmic problem but a knowledge-transfer problem. Enterprise recovery is shaped by policy, risk appetite and years of human decision-making.
“In the AI native solution, you are combining a system of records with the workflows plus humans,” he said.
Pandit co-founded MeshDefend with Ravi Chitloor after spending nearly two decades at Dell EMC in enterprise backup and cyber recovery. The startup recently raised $2.5 million in a pre-seed round led by Kalaari Capital.
Their focus is not on another monitoring layer, but on an execution layer that works across a company’s vendors without replacing them. Pandit said they deliberately decided to avoid the crowded observability market. “We don’t want just that visibility, but we want the agency to act.”
The company embeds engineers directly inside customer environments to codify how recovery is actually performed, turning bespoke human routines into repeatable, auditable steps inside its Agent Mesh operating system.
The system manages a distributed network of AI agents that continuously monitor, validate and coordinate data infrastructure operations at enterprise scale.
This approach exposes why recovery gains vary so widely. MeshDefend pilots have shown improvements of 5% to 35% in operational efficiency. The variation, according to Pandit, stems not from the AI but from how well human practice gets encoded.
The better documented and more rational an organisation’s workflows are, the more automation compounds their effectiveness. The messier the environment, the more work is required up front to capture its logic.
The company is starting with data protection, backup and storage, but its architecture is designed to expand across the whole infrastructure stack.
Because it is built AI-native, vendor integrations can be completed in weeks rather than quarters, which places a premium on governance and auditability. The purpose of AI here is not to remove judgment, but to remove the repetitive decision load that prevents judgment from being applied, Pandit explained.
Pointing towards where operators spend their energy, he stated, “They’re doing almost anywhere between 30-60% of their tasks which are repetitive today.”
“They would love their IT admins to really go and work on the higher value tasks.”
A system administrator at a startup, who requested anonymity, told AIM that the most draining work isn’t handling major outages, but the slow repetition of small recovery tasks, such as digging through user data backups and coaching remote employees through troubleshooting their own machines.
If automation can relieve the cognitive burden of recovery, then the measure of effectiveness will be how reliably infrastructure stands back up when it matters most.