ARM instructions do “less work”?
Daniel Lemire's blog
Modern processors can execute several instructions per cycle. Because processors cannot easily run faster (in terms of clock speed), vendors try to get their processors to do more work per cycle.
Apple processors are wide in the sense that they can retire many more instructions per cycle than comparable Intel or AMD processors. However, some people argue that the comparison is unfair: ARM instructions are supposedly less powerful and do less work than x64 (Intel/AMD) instructions, so that the two end up at performance parity.
Let us verify.
I have a number parsing benchmark that records the number of cycles, instructions and nanoseconds spent parsing numbers on average. I parse a standard dataset of numbers (canada.txt), and I report only the fast_float numbers (ASCII mode).
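To make the three recorded quantities concrete, here is a minimal sketch of how raw counters can be turned into per-number averages and an instructions-per-cycle figure. It is not the benchmark's actual code: the `Run` class, the function names, and the two sets of counters are hypothetical, and the values are invented so that the arithmetic is easy to follow. They are not measurements of any real processor.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Raw counters for one benchmark run (hypothetical placeholder type)."""
    cycles: float        # total CPU cycles
    instructions: float  # total instructions retired
    nanoseconds: float   # total elapsed time
    numbers_parsed: int  # how many numbers were parsed


def per_number_metrics(run: Run) -> dict:
    """Normalize raw counters per parsed number and derive IPC and clock rate."""
    n = run.numbers_parsed
    return {
        "instructions_per_number": run.instructions / n,
        "cycles_per_number": run.cycles / n,
        "ns_per_number": run.nanoseconds / n,
        "instructions_per_cycle": run.instructions / run.cycles,
        # cycles per nanosecond is the effective frequency in GHz
        "effective_ghz": run.cycles / run.nanoseconds,
    }


def instruction_ratio(a: Run, b: Run) -> float:
    """Instructions per number on system a, relative to system b."""
    return (a.instructions / a.numbers_parsed) / (b.instructions / b.numbers_parsed)


# Invented counters, not measurements: same instruction count and same time,
# but system_a needs fewer cycles, so it retires more instructions per cycle.
system_a = Run(cycles=2_000, instructions=10_000, nanoseconds=500, numbers_parsed=100)
system_b = Run(cycles=2_500, instructions=10_000, nanoseconds=500, numbers_parsed=100)

for name, run in (("system_a", system_a), ("system_b", system_b)):
    print(name, per_number_metrics(run))
print("instruction ratio:", instruction_ratio(system_a, system_b))
```

An instruction ratio close to 1 would mean both instruction sets need about the same number of instructions for the task, in which case any difference in instructions per cycle reflects the processor rather than the instruction set.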

Of course, that’s a single task, but number parsing is fairly generic as a computing task.
Though your mileage will vary, I find that for the tasks that I benchmark, I often see about as many ARM instructions being retired as x64 instructions. There are differences, but they are small.
However, Apple processors definitely retire more instructions per cycle than Intel processors.