DPDP Act Exposes Shadow Data in India IT

Analytics India Magazine (C P Balasubramanyam)

At 11 pm on a weekday, a developer working on a global banking project uploads a screenshot to a Jira ticket to explain a bug. The image captures more than the error message; it also contains the customer name, account number, and transaction details. The ticket is shared, commented on, exported, archived and eventually forgotten. 

Under India’s Digital Personal Data Protection (DPDP) framework, as clarified by the newly notified 2025 Rules, that single screenshot may now represent a clearer compliance failure.

As the DPDP Act moves closer to operational enforcement, this is the kind of scenario troubling privacy experts across India’s IT services and global capability centre (GCC) ecosystem. 

Founders and executives in the data privacy space are issuing a stark warning—the greatest compliance risk does not lie in production systems but in the vast, loosely governed universe of shadow data being created during software delivery.

Shadow data refers to data that is created, stored, or shared without being formally managed or governed by IT firms.

Keshava Murthy, CEO and co-founder of AI security engineer platform Matters.AI, argued that Indian IT services firms are structurally ill-equipped to meet DPDP’s demands. 

Unlike regulated sectors such as banking or healthcare, which have spent years building tightly controlled, data-minimised environments, IT services evolved around labour arbitrage, speed and reuse. Multi-tenant delivery centres blurred accountability between client and vendor, and shared tooling was optimised for efficiency, not for purpose limitation or erasure. 

“DPDP demands data clarity, classification, minimisation, and erasure. IT services firms were never architected for that,” Murthy told AIM.

The Rules

The recently notified Digital Personal Data Protection Rules, 2025, make this architectural gap harder to ignore. They extend obligations well beyond production systems to every environment where personal data may surface. 

Rule 6 mandates safeguards such as encryption, masking, obfuscation and virtual tokens, along with strict access controls and audit logs. 

Rule 8 reinforces mandatory erasure of personal data once the purpose is served and imposes defined retention and deletion timelines for logs. 

The Second Schedule restricts the processing of personal data strictly to stated purposes, i.e. purpose limitation. 

Together, these provisions pull test environments, screenshots, tickets, DevOps logs and AI workflows firmly into the DPDP compliance perimeter.
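
For illustration, here is a minimal Python sketch of the safeguards Rule 6 names: masking a stored value, deriving a virtual token, and writing an audit-log entry on every access. The field names, salt and log format below are illustrative assumptions, not prescriptions from the Rules.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("dpdp.audit")

def mask_account_number(value):
    """Mask all but the last four digits, one Rule 6-style safeguard."""
    return "*" * (len(value) - 4) + value[-4:]

def virtual_token(value, salt):
    """Derive a stable, non-reversible stand-in for the raw value."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

def access_record(record, user, purpose):
    """Return a masked view of the record and write an audit-log entry."""
    masked = {
        "customer_name": virtual_token(record["customer_name"], salt="demo-salt"),
        "account_number": mask_account_number(record["account_number"]),
    }
    audit_log.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "purpose": purpose,
        "fields": sorted(masked),
    }))
    return masked

print(access_record(
    {"customer_name": "A. Kumar", "account_number": "123456789012"},
    user="dev-42",
    purpose="bug-triage",
))
```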

The DPDP Act is triggering a compliance shift for IT services firms similar to the one the European Union forced years ago.

In the EU, the General Data Protection Regulation (GDPR) forced organisations to move from policy-led compliance to privacy-by-design architectures, where personal data is controlled at the system level. The rules require technical and organisational measures such as data minimisation, pseudonymisation, strict access controls, and lifecycle-based retention and erasure across all environments where personal data appears, not only production systems. 

Regulators expect demonstrable proof of data flows, purpose limitation and deletion, rather than contractual assurances alone.

The Challenges

Roshmik Saha, co-founder and CTO of security and data privacy platform Skyflow, said the compliance risk is greater in test and QA environments, which, unlike production systems, aren’t typically encrypted, access-controlled and monitored. 

“Production is usually the safest place,” he said. “The real danger is outside it.”

Developers routinely clone production data in test environments for convenience. Under DPDP, each clone is treated as a fresh instance of personal data with its own compliance obligations. 

Skyflow addresses this by replacing real personal data in non-production systems with format-preserving tokens generated through polymorphic encryption. These tokens carry no intrinsic value, with real data confined to a central privacy vault. 

“When a Right to Erasure request arrives, there’s nothing to chase: delete the key in the central vault, and every token across all environments becomes instantly meaningless,” Saha explained. 
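
As an illustration of the pattern Saha describes, a toy vault in Python might look like the sketch below. This is not Skyflow's actual implementation: it uses random character substitution with a lookup table rather than polymorphic encryption, but it shows the same property, that erasing the central mapping invalidates every token copy at once.

```python
import secrets
import string

class PrivacyVault:
    """Toy central vault: real values live only here, keyed by token."""

    def __init__(self):
        self._store = {}

    def tokenize(self, value):
        # Format-preserving stand-in: digits become other digits, letters
        # become letters, punctuation survives, so downstream test systems
        # keep working without ever seeing real data.
        token = "".join(
            secrets.choice(string.digits) if ch.isdigit()
            else secrets.choice(string.ascii_letters) if ch.isalpha()
            else ch
            for ch in value
        )
        self._store[token] = value
        return token

    def detokenize(self, token):
        return self._store.get(token)

    def erase(self, token):
        # Right-to-erasure: drop the mapping, and every copy of the token,
        # wherever it was cloned, dereferences to nothing.
        self._store.pop(token, None)

vault = PrivacyVault()
token = vault.tokenize("9876-5432-1098")        # what test/QA clones see
assert vault.detokenize(token) == "9876-5432-1098"
vault.erase(token)
assert vault.detokenize(token) is None          # all clones now meaningless
```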

Section 12 of the DPDP Act provides for the right to correction and erasure of personal data.

Vaibhav Sharma, COO of data privacy management platform Redacto.ai, expects the same problem to manifest in day-to-day operations. 

“A single Jira ticket with a full customer screenshot often violates DPDP more clearly than a hardened production table,” he added.

Murthy referred to these environments collectively as “delivery shadows”—test sandboxes, developer laptops, email attachments, Excel reconciliations, DevOps logs, and artefact stores that evolve daily without central oversight. 

As they are unstructured and constantly changing, traditional data security posture management and data loss prevention tools struggle to classify them at the semantic level that DPDP demands. 

“You can’t erase what you can’t even enumerate,” he said, explaining why purpose limitation and erasure routinely fail in multi-tenant delivery models.
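
A rough sense of what even naive enumeration involves can be sketched in a few lines of Python. The patterns below are deliberately crude placeholder regexes and the artefact-store path is hypothetical; as Murthy notes, real delivery shadows need semantic classification that simple pattern-matching cannot provide.

```python
import re
from pathlib import Path

# Illustrative patterns only; real classification needs semantic context,
# which is exactly why simple DLP-style matching falls short of DPDP's bar.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "pan": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),      # Indian PAN format
    "account": re.compile(r"\b\d{11,16}\b"),
}

def enumerate_shadow_data(root):
    """Walk an artefact store and flag files containing PII-like patterns."""
    hits = []
    root_path = Path(root)
    if not root_path.is_dir():
        return hits
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                hits.append((str(path), label))
    return hits

# "./build-artifacts" is a placeholder for any artefact store or log dump.
for file_path, kind in enumerate_shadow_data("./build-artifacts"):
    print(f"{file_path}: possible {kind}")
```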

All three executives agreed that manual processes cannot keep pace with these realities. 

Sharma argued that data minimisation and purpose limitation become operational only when redaction is automated at the point of creation. Detecting and masking personal data in documents, screenshots and tickets before they circulate ensures that excessive data never enters the system. 

“Automated redaction makes these principles operational, not aspirational,” he said, adding that it also creates defensible evidence for regulators.
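
A stripped-down sketch of redaction at the point of creation, with placeholder patterns standing in for a real PII classifier, could run on ticket text before it is submitted. The count of redacted fields doubles as the kind of evidence trail Sharma mentions.

```python
import re

# Placeholder patterns standing in for a real PII classifier.
REDACTIONS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL-REDACTED]"),
    (re.compile(r"\b\d{11,16}\b"), "[ACCOUNT-REDACTED]"),
]

def redact(text):
    """Mask PII-like spans before the text ever leaves the author's machine."""
    total = 0
    for pattern, replacement in REDACTIONS:
        text, n = pattern.subn(replacement, text)
        total += n
    return text, total

ticket_body = "Payment failed for rahul@example.com, account 123456789012."
clean, n = redact(ticket_body)
print(clean)                   # PII replaced before the ticket circulates
print(f"{n} fields redacted")  # a count like this doubles as audit evidence
```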

AI Compounds the Challenges 

Saha warned that large language models do not merely store personal data; they internalise it. 

Once identifiers enter training data, vector stores or embeddings, selective deletion becomes nearly impossible. “LLMs don’t just store data, they learn from it,” he said, emphasising that this makes identity-aware tokenisation at ingestion essential. 

Sharma observed that as AI copilots become standard, personal data increasingly appears in prompts, outputs and summaries that bypass governance entirely. 

Banning AI tools, he argued, is impractical. Embedding automated redaction directly into AI workflows is the only viable control.
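
One hedged sketch of such a control, assuming email-shaped identifiers and a hypothetical wrapper around the model call, swaps identifiers for opaque placeholders before the prompt leaves the organisation and restores them only in the final response, so the real value never enters the model's logs or any future training corpus.

```python
import re
import uuid

def shield_prompt(prompt):
    """Swap email-shaped identifiers for placeholders; keep the map local."""
    mapping = {}

    def swap(match):
        placeholder = f"<PII_{uuid.uuid4().hex[:8]}>"
        mapping[placeholder] = match.group(0)
        return placeholder

    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", swap, prompt), mapping

def unshield(text, mapping):
    """Restore the real identifiers in the model's reply for the end user."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

safe_prompt, mapping = shield_prompt(
    "Summarise the complaint from priya@example.com"
)
# Only safe_prompt is sent to the model; a canned reply stands in here.
model_reply = f"Drafted a summary of the complaint from {next(iter(mapping))}."
print(unshield(model_reply, mapping))
```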

For Murthy, these pressures signal a deeper shift. DPDP is forcing IT services firms away from policy-led compliance toward architecture-led control. Meanwhile, regulators are no longer satisfied with contracts or guidelines; they expect demonstrable answers. 

“Show me where the data lives, show me who accessed it, show me the purpose mapping, and show me proof of erasure… No policy can answer that,” he asserted.

Regulators are unlikely to view such lapses as mere technical oversights.

The risk of inaction, Murthy warned, is existential. 

With DPDP penalties running up to ₹250 crore per violation and global clients tightening expectations, the traditional multi-tenant delivery model could face sustained scrutiny unless it is rebuilt with purpose-level controls and semantic tagging. 

In an AI-driven delivery environment, a single incident in one delivery centre could cascade across multiple clients. 

“DPDP is not a compliance checkbox; it is a design test,” he said.

DPDP demands a fundamental rethink of how personal data is handled across India’s software delivery pipelines. Only those firms that redesign their data architecture, not just their documentation, will be able to meet regulatory expectations while preserving delivery velocity and client trust.
