Report on the June 1, 2025 Operational Incident
Core Brief:
* On June 1, 2025 from 12:15 UTC to 13:22 UTC, block production was suspended on the TON main network masterchain.
* The suspension was caused by a bug in the TON validator software that was accidentally introduced with the “Dispatch Queue” update in August 2024, but was not noticed during testing by the team, on the testnet, or on the mainnet until now.
* Monitoring systems quickly reported the issue; the team identified the root cause and released a patch that fully fixes it.
* Only a few validators needed to be updated for block production to resume. This is because the bug was in the block creation phase, not the verification phase, and the validators creating the block are constantly rotating. The team contacted several well-known network validators, such as Tonwhales, Tonstakers, Unit 410 and Twinstake.
* After they applied the patch, block production resumed and all pending transactions were processed. No messages, TONs, jettons or other assets were lost.
* We will merge the patch into the master branch this week; other validators do not need to update immediately and can wait for the next scheduled node update.
* To prevent such issues in the future, we have extended our node update testing process with several new masterchain test cases.
Introduction
During masterchain block creation, collators process messages in a sequence of steps, one of which moves messages from the Dispatch Queue to the OutMsgQueue early in block generation. This dispatch-queue handling advances the block's max_lt, whereas some governance contracts expect to be executed first and therefore rely on max_lt having the exact value that corresponds to the first transaction in the block. As a result, the collator (the block creation part of the node) attempted to generate a block containing a tick transaction with an excessively high lt. New masterchain block proposals were rejected, and block generation halted for the affected period.
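The ordering problem can be illustrated with a toy model. This is a minimal sketch, not the validator's actual code: `Block`, `collate_buggy`, and `is_valid` are hypothetical names, and the rule that every processed item advances `max_lt` by exactly one is a simplifying assumption. The validity check (a tick transaction must have lt equal to `start_lt + 1`) follows the patch description in the chronology.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Toy masterchain block candidate (hypothetical structure)."""
    start_lt: int
    max_lt: int
    tick_lt: Optional[int] = None
    out_msg_queue: List[Tuple[str, int]] = field(default_factory=list)


def collate_buggy(start_lt: int, dispatch_queue: List[str]) -> Block:
    """Model of the faulty ordering: queue handling runs before the tick transaction."""
    block = Block(start_lt=start_lt, max_lt=start_lt)

    # Step 1: move messages from the Dispatch Queue to the OutMsgQueue.
    # Assumption: each moved message advances max_lt by one.
    for msg in dispatch_queue:
        block.max_lt += 1
        block.out_msg_queue.append((msg, block.max_lt))

    # Step 2: the tick transaction takes the next lt after max_lt,
    # which is no longer start_lt + 1 if step 1 moved anything.
    block.max_lt += 1
    block.tick_lt = block.max_lt
    return block


def is_valid(block: Block) -> bool:
    """Validator-side check in this model: the tick transaction must come first."""
    return block.tick_lt == block.start_lt + 1


empty = collate_buggy(1000, [])
busy = collate_buggy(1000, ["m1", "m2", "m3"])
print(is_valid(empty), empty.tick_lt)  # True 1001
print(is_valid(busy), busy.tick_lt)    # False 1004
```

In this model the candidate is rejected only when the Dispatch Queue is non-empty at collation time; with an empty queue the tick transaction still receives `start_lt + 1`.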
Chronology
- June 1, 12:15:31 UTC
Collators began constructing the next masterchain block after (-1,8000000000000000,48400109). During the dispatch-queue processing phase, messages were moved from the Dispatch Queue to the OutMsgQueue. Due to the bug, a tick transaction’s current_lt was incorrectly advanced beyond the value expected from the block’s start lt, causing collators to produce an invalid block candidate. All proposals were rejected by the network, and masterchain block production stalled.
- June 1, 12:20 UTC
The on‐call investigation group was assembled. Initial logs pointed to anomalies in the dispatch‐queue handler, specifically around how lt values were updated when relocating messages between queues.
- June 1, 12:25 UTC
The operations team began contacting a few known validator operators to prepare for an expedited rollout of any required fix.
- June 1, 12:40 UTC
The team traced the root cause to the dispatch-queue routine: moving messages into the OutMsgQueue raised the max_lt used for pending tick transactions to a logical time later than the one expected at the start of the block. Having identified this logic oversight, they prepared a patch that binds each tick transaction’s lt strictly to the block’s start_lt+1, preventing any unintended increments during queue processing.
- June 1, 12:52 UTC
The proposed patch, which enforces that a tick transaction’s lt is derived from the block’s start lt regardless of queue movements, was tested in a private network under simulated load. The fix proved effective: collators produced valid block candidates that passed validation.
- June 1, 13:22 UTC
With a subset of validators updated, the first valid masterchain block since the incident, (-1,8000000000000000,48400110), was created and signed. Although the number of patched validators was still modest, this small fraction sufficed to resume block production: because the bug affected block creation rather than verification, blocks proposed by patched collators could be accepted by the rest of the network.
- June 1, 13:48 UTC
A secondary “aftershock” occurred when some collators still running the unpatched code attempted to produce blocks, again generating invalid candidates. This caused a brief 3-minute pause in block production. By this time, more validators had applied the fix, and the proportion of patched nodes rose above the protocol’s threshold.
- June 1, 13:51 UTC onward
No further significant interruptions occurred. All incoming and outgoing messages aligned correctly with lt ordering, and masterchain blocks resumed at normal cadence.
Organizational conclusions
Extend the release testing process with expanded masterchain stress tests that not only simulate high loads of the various transaction and message types characteristic of basechain traffic, but also include governance transactions (elections, config changes) running in parallel with user traffic.
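A test case of this kind could be sketched roughly as a small randomized check. It is illustrative only: `collate` and `run_mixed_load_test` are hypothetical stand-ins for the real collator and test harness, the lt bookkeeping is a simplifying assumption, and the invariant (the tick transaction's lt equals `start_lt + 1`) is taken from the patch description in the chronology.

```python
import random


def collate(start_lt, dispatch_queue, pin_tick_lt):
    """Toy collation step returning (tick_lt, max_lt). Not the real collator."""
    max_lt = start_lt
    tick_lt = None

    if pin_tick_lt:
        # Patched ordering as described in the report: tick lt is bound to start_lt + 1.
        tick_lt = start_lt + 1
        max_lt = tick_lt

    # Assumption: each message moved out of the Dispatch Queue advances max_lt by one.
    for _ in dispatch_queue:
        max_lt += 1

    if not pin_tick_lt:
        # Unpatched ordering: the tick transaction takes whatever lt comes next.
        max_lt += 1
        tick_lt = max_lt

    return tick_lt, max_lt


def run_mixed_load_test(rounds=1000, seed=0, pin_tick_lt=True):
    """Return how many rounds violated the tick lt invariant under mixed load."""
    rng = random.Random(seed)
    failures = 0
    start_lt = 1_000
    for _ in range(rounds):
        # User traffic: a random number of queued messages in every block,
        # collated together with the governance (tick) transaction.
        queue = ["msg"] * rng.randint(0, 20)
        tick_lt, max_lt = collate(start_lt, queue, pin_tick_lt)
        if tick_lt != start_lt + 1:
            failures += 1
        start_lt = max_lt + 1  # arbitrary choice for the toy model
    return failures


print(run_mixed_load_test(pin_tick_lt=True))       # 0
print(run_mixed_load_test(pin_tick_lt=False) > 0)  # True
```

The point of the sketch is the test shape: the invariant is checked while queue traffic and the tick transaction share a block, which is the combination the conclusions above call for.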
P.S.
We are grateful to our colleagues at Tonwhales, Tonstakers, Twinstake, Unit 410 and all validator teams who worked with us on Sunday to coordinate patch deployment. Their rapid response and collaboration were instrumental in quickly restoring masterchain block production with minimal downtime.