Processors are getting wider
Daniel Lemire's blog
Our processors execute instructions based on a clock. Thus, a 4 GHz processor has 4 billion cycles per second. It is difficult to increase the clock frequency of our processors. If you go much beyond 5 GHz, your processor is likely to overheat or otherwise fail.
So, how do we go faster? Modern processors can execute multiple instructions simultaneously. This is sometimes called superscalar execution. Most processors can handle 4 instructions per cycle or more. A recent Apple processor can easily sustain over 8 instructions per cycle.
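As a back-of-the-envelope illustration, the peak instruction rate is roughly the clock frequency multiplied by the number of instructions per cycle. The helper name below is hypothetical, and the result is an upper bound under that simple model, not a measurement.

```cpp
#include <cstdint>

// Hypothetical helper: upper bound on instructions per second, given a
// clock frequency in hertz and a sustained number of instructions per cycle.
constexpr std::uint64_t peak_instructions_per_second(std::uint64_t hertz,
                                                     std::uint64_t per_cycle) {
    return hertz * per_cycle;
}

// A 4 GHz processor sustaining 4 instructions per cycle:
// 4,000,000,000 * 4 = 16,000,000,000 instructions per second.
static_assert(peak_instructions_per_second(4000000000ULL, 4) == 16000000000ULL,
              "4 GHz at 4 instructions per cycle");
```

Under this model, doubling the instructions per cycle at a fixed frequency doubles the bound, which is why width matters once the clock can no longer climb.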
However, the number of instructions executed per cycle depends on the instructions themselves. Some instructions are inexpensive (like additions) and others are more costly (like integer division). The less costly the instructions, the more of them can be executed per cycle.
A processor has several execution units. With four execution units capable of performing additions, you might execute 4 additions per cycle. More execution units allow more instructions per cycle.
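To make this concrete, here is a minimal sketch with two ways of summing an array. The function names are mine. With a single accumulator, every addition waits on the previous one. With four accumulators, the additions in the loop body are independent, so a processor with several addition units may execute them in the same cycle. Whether the second version is actually faster depends on your compiler and processor: an optimizing compiler may unroll or vectorize the first version on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: each addition needs the result of the previous one,
// so the additions form a single dependency chain.
std::uint64_t sum_one_chain(const std::vector<std::uint64_t>& v) {
    std::uint64_t s = 0;
    for (std::uint64_t x : v) {
        s += x;
    }
    return s;
}

// Four accumulators: the four additions in the loop body do not depend
// on one another, so they can be issued to different execution units.
std::uint64_t sum_four_chains(const std::vector<std::uint64_t>& v) {
    std::uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < v.size(); i++) {  // leftover elements (fewer than four)
        s0 += v[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value because unsigned integer addition is associative, even when it wraps around. The same trick applied to floating-point numbers can change the rounding of the result.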
Typically, x86-64 processors (Intel/AMD) can retire at most one multiplication instruction per cycle, making multiplications relatively expensive compared to additions. In contrast, recent Apple processors can retire two multiplications per cycle.
The latest AMD processors (Zen 5) have three execution units capable of performing multiplications, potentially allowing 3 multiplications per cycle in some cases. Based solely on execution units, a Zen 5 processor could theoretically retire 3 additions and 3 multiplications per cycle.
But that is not all. I only counted conventional multiplications on general-purpose 64-bit registers. The Zen 5 has four execution units for 512-bit registers, two of which can perform multiplications. These 512-bit registers allow us to do many multiplications at once, by packing several values in each register.
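The packing arithmetic can be sketched as follows. The function names are hypothetical, and the per-cycle figure is an idealized upper bound that assumes each multiplication unit can start one full-width multiplication every cycle.

```cpp
#include <cstddef>

// Number of values of a given width (in bits) that fit in one register.
constexpr std::size_t lanes(std::size_t register_bits, std::size_t value_bits) {
    return register_bits / value_bits;
}

// Idealized upper bound on multiplications per cycle, ASSUMING each unit
// starts one full-width multiplication per cycle.
constexpr std::size_t peak_multiplications_per_cycle(std::size_t units,
                                                     std::size_t register_bits,
                                                     std::size_t value_bits) {
    return units * lanes(register_bits, value_bits);
}

static_assert(lanes(512, 64) == 8, "eight 64-bit values per 512-bit register");
static_assert(lanes(512, 32) == 16, "sixteen 32-bit values per 512-bit register");
static_assert(peak_multiplications_per_cycle(2, 512, 32) == 32,
              "two units, sixteen 32-bit lanes each");
```

So, under these assumptions, two 512-bit multiplication units working on 32-bit values could produce up to 32 products per cycle, compared with 3 for the general-purpose 64-bit registers.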
Overall, our general-purpose processors are getting wider: they can retire more instructions per cycle. That is not the only possible design. Indeed, these wider processors require many more transistors. Instead, you could use these transistors to build more processors. And that is what many people expected: they expected that our computers would contain many more general-purpose processors.
A processor design like the AMD Zen 5 is truly remarkable. It is not simply a matter of adding execution units. You have to bring the data to these units, order the computation, and handle the branches.
What it means for programmers is that even when you do not use parallelism explicitly, your code still executes in parallel under the hood.