# PUSHK Archiver: representation-first compression
**A New Paradigm for Entropy Management through Data Preprocessing**
**Author:** Viktor Glebov
**Date:** March 23, 2026
**Status:** Preprint (source code will be released later)
### Abstract
PUSHK is not a classical archiver and not just another improved coder. It is a **compression pipeline** that shifts the main work of compression to the data representation stage. Instead of searching for patterns in fixed data, we physically transform the data so that it becomes maximally compressible by itself. The paper analyzes whether this can be formally proven, where the boundary between preprocessing and compression lies, and why PUSHK represents a new paradigm — representation-first compression.
---
At this level PUSHK stops being "an archiver" and starts looking like a research-grade information processing system.
Let us examine two questions strictly and honestly:
---
# 🧠 1. Can the effectiveness of PUSHK be formally proven?
**Short answer:**
> **In the general case — no.**
> **In a narrow (restricted) domain — yes, partially.**
---
## 📐 1.1 Where “proof” of compression exists at all
Information theory has strict limitations:
### 📊 Shannon's source coding theorem:
> You cannot compress data below its entropy without loss of information
\[
H(X) = -\sum p(x) \log_2 p(x)
\]
👉 This is the fundamental limit.
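As a quick illustration, the empirical order-0 entropy of a byte string can be computed directly from symbol frequencies (a minimal sketch; `shannon_entropy` is a name chosen here, not part of PUSHK):

```python
from collections import Counter
from math import log2

def shannon_entropy(data: bytes) -> float:
    """Empirical order-0 entropy H(X) in bits per symbol."""
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in Counter(data).values())

# Two equiprobable symbols sit exactly at the 1-bit-per-symbol limit:
print(shannon_entropy(b"abababab"))  # 1.0
# A skewed distribution has lower entropy:
print(shannon_entropy(b"aaaaaaab"))  # ~0.544
```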
---
## ⚠️ What this means for PUSHK
PUSHK:
* Does not violate Shannon’s theorem
* **Does not reduce** the true source entropy \( H(X) \) (because the transformation \( T \) is fully reversible and bijective)
\[
H(T(X)) = H(X) \quad (\Delta = 0)
\]
* But **reduces the effective entropy** with respect to the model of a specific coder (Huffman, RLE, ANS, etc.):
\[
H_{\text{coder}}(T(X)) = H_{\text{coder}}(X) + \Delta_{\text{eff}}, \quad \Delta_{\text{eff}} \leq 0
\]
👉 In other words, it **does not compress “better than the limit”**,
but **changes the very limit** that the coder sees.
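A toy demonstration of this distinction (plain sorting stands in here for PUSHK's tile permutation, and is invertible once the permutation is recorded): rearranging the same symbols leaves the order-0 entropy \( H(X) \) untouched, while the number of runs an RLE coder must emit collapses.

```python
from collections import Counter
from math import log2

def entropy(data: bytes) -> float:
    """Order-0 entropy: invariant under any permutation of the symbols."""
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in Counter(data).values())

def rle_cost(data: bytes) -> int:
    """Number of (symbol, run-length) pairs an RLE coder would emit."""
    return 1 + sum(1 for prev, cur in zip(data, data[1:]) if cur != prev)

raw = b"abcabcabcabc"
grouped = bytes(sorted(raw))  # a permutation of the same symbols

assert entropy(raw) == entropy(grouped)  # true entropy: Delta = 0
print(rle_cost(raw), rle_cost(grouped))  # 12 runs -> 3 runs
```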
---
## 🔍 1.2 What can be proven formally
The following can be proven:
### ✔️ 1. Information transformation
If PUSHK:
* Performs transformation \( T(X) \)
* And \( T \) is reversible (lossless)
👉 Then:
\[
H(T(X)) = H(X) \quad (\Delta = 0)
\]
But:
* We reduce the **effective entropy** relative to the coder’s model.
### ✔️ 2. Local improvement
For **structured data (tiles)** it is possible to prove:
* Presence of repeats
* Entropy reduction after grouping
👉 For example:
* Increased probability of repeats
* Reduced number of unique symbols
* Increased length of RLE runs
### ✔️ 3. Improvement of specific coders
If:
* The coder is Huffman
then:
> The more skewed the distribution → the shorter the average code length → the better the compression
PUSHK does exactly that:
* Makes the distribution “more non-uniform”
👉 This can be formally shown through:
\[
L = \sum p(x) \cdot l(x)
\]
(average code length, where \( l(x) \) is the length of the code word assigned to symbol \( x \))
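This effect can be checked numerically with a small Huffman construction (an illustrative sketch; `huffman_lengths` and `avg_length` are names chosen here, not PUSHK code):

```python
import heapq

def huffman_lengths(freqs: dict) -> dict:
    """Code word lengths l(x) of a Huffman code for the given frequencies."""
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merging two subtrees deepens every leaf in them by one bit
        merged = {s: l + 1 for s, l in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def avg_length(freqs: dict) -> float:
    """L = sum over x of p(x) * l(x)."""
    total = sum(freqs.values())
    lengths = huffman_lengths(freqs)
    return sum(f / total * lengths[s] for s, f in freqs.items())

print(avg_length({"a": 1, "b": 1, "c": 1, "d": 1}))   # uniform: 2.0 bits
print(avg_length({"a": 13, "b": 1, "c": 1, "d": 1}))  # skewed: 1.3125 bits
```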
---
## ❌ Where proof is impossible
### 1. In the general case
It is impossible to prove:
> “PUSHK is always better than 7-Zip”
Because:
* Data is arbitrary
* There is a worst-case
### 2. Without data restrictions
If the data:
* Is random
* Already compressed
* Encrypted
👉 PUSHK:
* Cannot reduce entropy
* And sometimes even increases it
### 3. Algorithm complexity
The PUSHK pipeline:
* Analysis
* Clustering
* Permutation
* Dictionary
👉 It is difficult to formalize as a single function
---
## 🧠 Conclusion on proof
✔ Can be proven:
* Local improvement
* Reduction of **effective** entropy
* Improvement of the coder’s performance
❌ Cannot be proven:
* Global superiority
* Universality
* Optimality
---
## 🔄 Comparison with BWT and post-BWT techniques
**Burrows-Wheeler Transform (BWT, 1994)** — the classic predecessor of all modern “representation-first” methods. It is not a compressor, but a **reversible transformation** that rearranges data so that identical symbols end up next to each other.
Formally, BWT builds a matrix of all cyclic shifts of the string, sorts it lexicographically and takes the last column:
\[
\text{BWT}(T) = \text{last column of the sorted matrix of cyclic shifts}
\]
Result: long runs of identical symbols → perfect for subsequent RLE + Move-To-Front (MTF) + Huffman/ANS.
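A naive BWT fits in a few lines (quadratic in memory, so a sketch for intuition only; production implementations use suffix arrays). The inverse demonstrates full reversibility; `"\0"` serves as the unique end-of-string sentinel:

```python
def bwt(s: str) -> str:
    """Naive BWT: sort all cyclic shifts, take the last column."""
    s += "\0"
    shifts = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in shifts)

def ibwt(last: str) -> str:
    """Invert by repeatedly prepending the last column and sorting rows."""
    rows = [""] * len(last)
    for _ in range(len(last)):
        rows = sorted(last[i] + rows[i] for i in range(len(last)))
    return next(r for r in rows if r.endswith("\0"))[:-1]

out = bwt("banana")
print(out.replace("\0", "$"))  # annb$aa: identical letters cluster together
assert ibwt(out) == "banana"   # the transform is fully reversible
```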
### Similarities with PUSHK
- Both methods **do not compress**, but **change the data representation** (true entropy Δ = 0).
- Both create local low-entropy regions.
- Both enhance subsequent coders (RLE, Huffman).
- Both are fully reversible.
### Key differences
| Parameter | BWT (1994) | PUSHK (2026) |
|---------------------------|-----------------------------------------|--------------------------------------------------|
| **Scope** | One global permutation (entire block) | Multi-level: analysis → clustering → tile segmentation |
| **Adaptivity** | Global suffix sorting | Local statistics + entropy control per region |
| **Domain** | Optimal for text (1D) | Designed for graphics and structured data (2D) |
| **Entropy management** | Passive (grouping only) | Active (we artificially create low-entropy zones)|
| **Coder synergy** | Boosts RLE + MTF | Boosts RLE + dictionary + Huffman/ANS simultaneously (multi-algorithmic) |
| **Complexity** | O(n log n) sorting | Analysis + several local permutations |
| **Worst-case** | Random/encrypted data | Same, but less degradation due to segmentation |
**Post-BWT techniques** (MTF, RLE, arithmetic coding) are what come **after** BWT. PUSHK does not simply use them — it **prepares the input** so that these techniques work far more efficiently.
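A minimal MTF sketch shows why grouped output is such a good input for it: clustered repeats become runs of zeros, which RLE then collapses (illustrative code, not part of any named library):

```python
def mtf(data: str, alphabet: str) -> list:
    """Move-To-Front: recently seen symbols get small indices,
    so clustered repeats encode as runs of zeros."""
    table = list(alphabet)
    out = []
    for ch in data:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))  # move the symbol to the front
    return out

print(mtf("aaabbbccc", "abc"))  # [0, 0, 0, 1, 0, 0, 2, 0, 0]
```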
**Main conclusion**
BWT showed the world: **a permutation can be more powerful than any coder**.
PUSHK goes further: it says “one global permutation is not enough”. We can **split the data into meaningful tiles, cluster by entropy and create dozens of local low-entropy zones instead of one**.
This is no longer “an improvement on BWT”, but **the next paradigm** — from global sorting to **domain-aware entropy management**.
---
# ⚖️ 2. Where the boundary lies: “compression” vs “preprocessing”
This is where things get really interesting.
## 📦 2.1 Classical compression
Archivers (7-Zip, Brotli):
> Do everything in one process:
```text
DATA → COMPRESSION → BITSTREAM
```
👉 This is a “monolithic compressor”
## 🔧 2.2 PUSHK
In PUSHK:
```text
DATA → PREPROCESSING → COMPRESSION → BITSTREAM
```
👉 This is already a **pipeline**
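The two-stage pipeline can be caricatured in a few lines. This is an illustrative toy, not PUSHK's actual (unpublished) pipeline: the "preprocessing" stage sorts fixed-size tiles and records the permutation to stay reversible, and the "compression" stage is plain RLE.

```python
def preprocess(data: bytes, tile: int = 4):
    """Toy representation stage: split into fixed-size tiles, sort them,
    and record the permutation so the step stays reversible."""
    tiles = [data[i:i + tile] for i in range(0, len(data), tile)]
    order = sorted(range(len(tiles)), key=lambda i: tiles[i])
    return [tiles[i] for i in order], order

def rle_encode(data: bytes) -> list:
    """Toy compression stage: (symbol, run_length) pairs."""
    out = []
    for b in data:
        if out and out[-1][0] == b:
            out[-1] = (b, out[-1][1] + 1)
        else:
            out.append((b, 1))
    return out

raw = b"AAAA" + b"BBBB" + b"AAAA" + b"BBBB"
tiles, order = preprocess(raw)
print(len(rle_encode(raw)))              # 4 runs before preprocessing
print(len(rle_encode(b"".join(tiles))))  # 2 runs after grouping the tiles

# Reversibility: undo the permutation using the stored order
restored = [b""] * len(order)
for pos, i in enumerate(order):
    restored[i] = tiles[pos]
assert b"".join(restored) == raw
```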
## 🧠 2.3 Critical boundary
The boundary lies here:
### ✔️ It is still compression if:
* The process is **reversible**
* There is final encoding
* Size is reduced
👉 PUSHK qualifies
### ❌ It is no longer compression if:
* You only:
  * Analyze
  * Sort
* But do not encode
👉 This is already preprocessing / transformation
## 🔥 2.4 Main criterion
> **The boundary is defined not by the algorithm, but by the goal**
### If the goal is:
✔ To reduce size → it is compression
✔ Only to change the representation, preserving the data → it is transformation
## 🧩 2.5 PUSHK — hybrid
PUSHK sits exactly on the boundary:
| Stage | Type |
|------------------|---------------|
| Analysis | preprocessing |
| Permutation | preprocessing |
| Dictionary | hybrid |
| RLE | compression |
| Huffman | compression |
👉 Therefore it is more correct to say:
> PUSHK is a **compression pipeline**, not a single algorithm
## 🧠 2.6 Important philosophical point
With PUSHK there is a shift:
### Classic approach:
> “We compress data”
### PUSHK:
> “We make data compressible”
📌 This is already an **inversion of the problem**
---
# 🧨 3. Harsh engineering truth
To be maximally honest:
## ✔️ PUSHK is a genuinely strong idea
Because:
* It uses data structure
* It works with the domain
* It reduces effective entropy
## ❌ But it is not “a new Shannon”
And not:
* A universal algorithm
* A replacement for 7-Zip
* A competitor to PAQ
## 💬 Real formulation:
> PUSHK is a domain-specific entropy reduction system,
> not a universal compression algorithm
---
# 🚀 4. Where PUSHK has the greatest potential
This is where it gets most exciting:
## 🔥 4.1 Data-aware compression
If we extend:
* Tiles → graphics
* Patterns → data
* Blocks → structures
👉 We can build:
> A universal framework for domain-specific compression
## 🔥 4.2 Adaptive preprocessing
We can add:
* Automatic structure detection
* Pipeline selection
* ML-like analysis
## 🔥 4.3 Hybrid with PAQ
The most powerful option:
```text
PUSHK → PAQ → bitstream
```
👉 This can give:
* Structural reduction
* Plus probabilistic optimization
---
# 💬 Final conclusion
## 🧠 1. Formally:
* PUSHK cannot be proven as “the best”
* But its effectiveness **within the data model** can be proven
## ⚖️ 2. On the boundary:
> PUSHK sits on the boundary between:
* Preprocessing
* And compression
## 🔥 3. The most important thing:
> PUSHK is not about “compression”
> but about **entropy management before encoding**
---
**Author’s note**
The source code and full implementation of PUSHK will be published later, when the current situation allows. Until then, the article is distributed as a preprint in order to register the idea and invite discussion.