ARM’s Cortex A72: aarch64 for the Masses
Chips and Cheese (clamchowder)
Here, we’ll be taking a look at ARM’s Cortex A72 in a long form review. This article was written mostly to test whether longer, multi-page reviews are feasible with WordPress. The conclusion appears to be “no”, as built-in page navigation is extremely poor. But the testing and work has been done, so enjoy 🙂
Contents:
- Introduction and Overview (this page)
- Frontend: Branch Prediction
- Frontend: Fetch and Decode
- Backend: Out of Order Execution Resources
- Backend: Scheduling and Execution
- Core Memory Subsystem
- System Architecture and Bandwidth
- Final Words
Introduction and Overview
ARM’s Cortex A72 is a 3-wide, speculative, out of order microarchitecture launched in 2016. During its prime, it saw service in several cell phone SoCs:
- Qualcomm’s Snapdragon 650, used in the Oppo R11 and Sony Xperia X
- Huawei’s Kirin 950, used in the Mate 8
- Mediatek’s Helio X20, used in the Lenovo K8 Note and Chuwi Hi9 Pro
Since then, the core has been superseded several times. But unlike its successors, the Cortex A72 has found widespread use long after its heyday. In 2022, it’s one of the most easily accessible out of order ARM cores for enthusiasts, thanks to the Raspberry Pi 4. Raspberry Pi competitors also tend to feature A72 cores. The Rock Pi 4 has a Rockchip RK3399 SoC, with two A72 and four A53 cores. A72 also has a habit of popping up in small hardware projects. The MNT Reform Layerscape LS1028A SoM uses two A72 cores, and is planned as an upgrade for the MNT Reform laptop. Beyond the enthusiast scene, A72 saw service in AWS’s Graviton. Most recently, Pensando has seen fit to employ four A72 cores in its network processor.
We’ll be using Amazon’s Graviton to evaluate the A72, because it’s available at low cost. And we’ll make occasional comparisons to Qualcomm’s Kryo core, as implemented in the Snapdragon 821. Kryo launched in 2015, putting it relatively close in time to the Cortex A72. Both cores also hit similar clock speeds. Kryo in the Snapdragon 821 runs at up to 2.34 GHz, while Graviton’s Cortex A72 cores run at 2.3 GHz.
Core Overview
In terms of basic throughput, Cortex A72 is superficially similar to Intel’s Pentium III. Both are three-wide cores with two ALU pipes, two AGU pipes, and 64-bit wide FPUs. A72, however, should be able to achieve higher performance in practice. It has more reordering capacity than Intel’s old 32-bit core, and its all-important branch predictor and caches are also more capable.

Compared to Qualcomm’s 4-wide Kryo, the A72 has similar reordering capacity but less theoretical throughput. Fortunately for ARM, core width and execution resources alone are rarely what limits performance in practice.
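To put a rough number on "theoretical throughput", here is a back-of-the-envelope sketch that multiplies core width by clock speed, using only the figures quoted above (3-wide at 2.3 GHz for A72 in Graviton, 4-wide at 2.34 GHz for Kryo in the Snapdragon 821). The function name and the idea of treating width times clock as a ceiling are illustrative assumptions, not measurements. Real sustained throughput will be far lower on both cores.

```python
# Upper bound on instruction throughput: core width times clock frequency.
# Widths and clocks are the figures quoted in the article. This is a
# ceiling only; it ignores cache misses, branch mispredicts, and
# dependencies, so measured throughput will be lower.

def peak_ginstr_per_sec(width: int, clock_ghz: float) -> float:
    """Return the theoretical ceiling in billions of instructions per second."""
    return width * clock_ghz

# (width, clock in GHz) per core, as stated in the text
cores = {
    "Cortex A72 (Graviton)": (3, 2.3),
    "Kryo (Snapdragon 821)": (4, 2.34),
}

peaks = {name: peak_ginstr_per_sec(w, f) for name, (w, f) in cores.items()}
for name, peak in peaks.items():
    print(f"{name}: {peak:.2f} G instr/s peak")

ratio = peaks["Kryo (Snapdragon 821)"] / peaks["Cortex A72 (Graviton)"]
print(f"Kryo / A72 peak ratio: {ratio:.2f}")
```

On paper, Kryo's extra width gives it roughly a third more peak throughput at nearly identical clocks. Whether any of that shows up in practice depends on how well each core keeps its pipeline fed.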
Next, we’ll take a look at the Cortex A72’s frontend, starting with the branch predictor.