Predictable memory accesses are much faster

Daniel Lemire's blog

Loading data from memory often takes several nanoseconds or more. While it waits for the data to arrive, the processor may stall without doing any useful work. Hardware prefetchers in modern processors mitigate the problem: they anticipate memory accesses and load the data into the cache before it is requested. Their effectiveness depends on the access pattern: sequential reads are prefetched efficiently, whereas random accesses are not.

To test the impact of the prefetchers, I wrote a Go program that uses a single array-access function for every pattern, timing its execution in each case. I start with a large array of 32-bit integers (64 MiB) and visit it in five different orders:

  1. Sequential access: I read every eighth integer, from the first to the last.
  2. Random access: I read the same integers in a random order.
  3. Backward access: I read the same integers from the last to the first.
  4. Interleaved access: I read the same integers, starting with the first, then the third, then the second, and so forth.
  5. Bouncing access: I read the same integers, starting with the first, then the last, then the second, then the second to last, and so forth.

I skip the integers that are not at an index divisible by eight; spreading the reads out in this way minimizes ‘cache line’ effects.
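To make the experiment concrete, here is a minimal sketch of such a benchmark in Go. It is a simplified stand-in, not my exact program: the names are illustrative, and the interleaved and bouncing orders follow my reading of the description above.

```go
// sketch.go: a simplified version of the benchmark described above.
// The pattern names and index orders follow the list given in the text.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const size = 64 * 1024 * 1024 / 4 // 64 MiB of 32-bit integers
const stride = 8                  // read only every eighth integer

// timeSum reads data[pos*stride] for each position in order and
// reports the elapsed time; the same function serves every pattern.
func timeSum(name string, data []uint32, order []int) {
	start := time.Now()
	var sum uint32
	for _, pos := range order {
		sum += data[pos*stride]
	}
	fmt.Printf("%-12s %10v (sum=%d)\n", name, time.Since(start), sum)
}

func main() {
	data := make([]uint32, size)
	for i := range data {
		data[i] = uint32(i)
	}
	n := size / stride // number of positions visited per pattern

	sequential := make([]int, n)
	for i := range sequential {
		sequential[i] = i
	}

	random := make([]int, n)
	copy(random, sequential)
	rand.Shuffle(n, func(i, j int) { random[i], random[j] = random[j], random[i] })

	backward := make([]int, n)
	for i := range backward {
		backward[i] = n - 1 - i
	}

	// Interleaved: first, third, second, fifth, fourth, and so forth.
	interleaved := []int{0}
	for i := 2; i < n; i += 2 {
		interleaved = append(interleaved, i, i-1)
	}
	if n%2 == 0 {
		interleaved = append(interleaved, n-1)
	}

	// Bouncing: first, last, second, second to last, and so forth.
	bouncing := make([]int, 0, n)
	for lo, hi := 0, n-1; lo <= hi; lo, hi = lo+1, hi-1 {
		bouncing = append(bouncing, lo)
		if lo != hi {
			bouncing = append(bouncing, hi)
		}
	}

	timeSum("sequential", data, sequential)
	timeSum("random", data, random)
	timeSum("backward", data, backward)
	timeSum("interleaved", data, interleaved)
	timeSum("bouncing", data, bouncing)
}
```

Because a single function (timeSum here) performs all the reads over precomputed index arrays, the timed loop is identical across patterns; only the order of the memory accesses changes.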

Running the program on my Apple laptop, I find that every pattern is much faster than the purely random access. It illustrates how good our processors are at predicting data accesses.

My Go program is available.
