Measuring energy usage: regular code vs. SIMD code

Daniel Lemire's blog

Modern processors have fancy instructions that can do many operations at once using wide registers: SIMD instructions. Intel and AMD have 512-bit registers and associated instructions under AVX-512.

You expect these instructions to use more power, more energy. However, they get the job done faster. Do you save energy overall? You should expect so.

Let us consider an example. I can just sum all values in a large array.

float sum(float *data, size_t N) {
  double counter = 0;
  for (size_t i = 0; i < N; i++) {
    counter += data[i];
  }
  return counter;
}

If I leave it as is, the compiler might be tempted to optimize too much, but I can instruct it to avoid ‘autovectorization’: it will not do anything fancy.
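One possible way to give that instruction is sketched below. It is an assumption about how this could be done, not necessarily what the benchmark does: the macro name NO_AUTOVEC and the function name sum_scalar are made up for illustration. GCC accepts a per-function optimize attribute, and Clang accepts a loop pragma.

```c
#include <stddef.h>

/* Hypothetical helper macro: on GCC, turn off the tree vectorizer for one
   function. Clang does not support this attribute, so it expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_AUTOVEC __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_AUTOVEC
#endif

NO_AUTOVEC
float sum_scalar(const float *data, size_t N) {
  double counter = 0;
  /* On Clang, ask the loop vectorizer to leave this loop alone. */
#if defined(__clang__)
#pragma clang loop vectorize(disable) interleave(disable)
#endif
  for (size_t i = 0; i < N; i++) {
    counter += data[i];
  }
  return (float)counter;
}
```

A coarser alternative is to compile the whole file with -fno-tree-vectorize (GCC) or -fno-vectorize (Clang).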

I can write the equivalent function using AVX-512 intrinsic functions. The details do not matter too much, just trust me that it is expected to be faster for sufficiently long inputs.

float sum(float *data, size_t N) {
  __m512d counter = _mm512_setzero_pd();
  // Process full blocks of 16 floats only; the scalar loop below handles the rest.
  for (size_t i = 0; i + 16 <= N; i += 16) {
    __m512 v = _mm512_loadu_ps((__m512 *)&data[i]);
    __m512d part1 = _mm512_cvtps_pd(_mm512_extractf32x8_ps(v, 0));
    __m512d part2 = _mm512_cvtps_pd(_mm512_extractf32x8_ps(v, 1));
    counter = _mm512_add_pd(counter, part1);
    counter = _mm512_add_pd(counter, part2);
  }
  double sum = _mm512_reduce_add_pd(counter);
  for (size_t i = N / 16 * 16; i < N; i++) {
    sum += data[i];
  }
  return sum;
}

Under Linux, we can ask the kernel about power usage. You can query the power usage of different components, but I query the overall power usage. This includes, among other things, the power usage of the memory system. It works well with Intel processors as long as you have privileged access on the system. I wrote a little benchmark that runs both functions.
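As a rough sketch of what such a query can look like, the code below reads a cumulative energy counter exposed by the Linux powercap interface. This is an assumption: the post does not say which interface or which domain the benchmark reads, and the helper names read_energy_uj and energy_delta_uj are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Read a cumulative energy counter (in microjoules) from a powercap file
   such as /sys/class/powercap/intel-rapl:0/energy_uj.
   Returns 0 on success, -1 if the file cannot be opened or parsed
   (for example, when running without sufficient privileges). */
static int read_energy_uj(const char *path, uint64_t *out) {
  FILE *f = fopen(path, "r");
  if (f == NULL) {
    return -1;
  }
  unsigned long long value = 0;
  int ok = (fscanf(f, "%llu", &value) == 1);
  fclose(f);
  if (!ok) {
    return -1;
  }
  *out = (uint64_t)value;
  return 0;
}

/* Energy consumed between two readings. The counter wraps around once it
   reaches max_range (see max_energy_range_uj in the same directory), so a
   smaller second reading is treated as a single wraparound. */
static uint64_t energy_delta_uj(uint64_t before, uint64_t after,
                                uint64_t max_range) {
  if (after >= before) {
    return after - before;
  }
  return (max_range - before) + after;
}
```

You would read the counter once before and once after the function under test, then divide the difference by the elapsed time to get the average power. Which powercap zone matches the "overall" figure depends on the machine.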

On a 32-core Ice Lake processor, my results are as follows:

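So the AVX-512 code uses 3.5 times less energy overall, despite consuming 10% more energy per unit of time. Since energy is power multiplied by time, these two figures imply that the AVX-512 code finished about 3.85 times faster (3.5 × 1.1).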

My benchmark is naive and should only serve as an illustration. The general principle holds, however: if your tasks complete much faster, you are likely to use less energy overall, even if you are drawing more power, that is, more energy per unit of time.
