
This simple C program first fills an array of 0xFFFFFF elements with random values and then makes two identical passes over it, measuring how long each pass takes:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TESTSIZ 0xffffff

char testcases[TESTSIZ];

/* Fill the global array with pseudo-random characters in [0, 127]. */
void gentestcases(void)
{
        size_t i = 0;
        while(i < TESTSIZ)
                testcases[i++] = rand()%128;

        return;
}

long long time_elapsed(struct timespec beg, struct timespec end)
{
        if(end.tv_nsec < beg.tv_nsec) {
                end.tv_nsec += 1000000000;
                end.tv_sec--;
        }

        return 1000000000ll*(end.tv_sec-beg.tv_sec) + end.tv_nsec-beg.tv_nsec;
}

/* Time one pass over the whole array. Note: the func argument is never
   used; the loop below calls islower() directly. */
long long test( int(*func)(int) )
{
        struct timespec beg, end;

        clock_gettime(CLOCK_MONOTONIC, &beg);

        int volatile sink;      /* volatile, so the loop isn't optimized away */
        size_t i = 0;
        while(i < TESTSIZ)
                sink = islower(testcases[i++]);

        clock_gettime(CLOCK_MONOTONIC, &end);

        return time_elapsed(beg, end);
}

int main()
{
        gentestcases();

        struct timespec beg, end;

        printf("1st pass took %lld nsecs\n", test(islower));
        printf("2nd pass took %lld nsecs\n", test(islower));
}

I compile it with gcc -O2 -std=gnu89 -o sb sillybench.c

Usually the second pass over the array turns out to be slower. The effect is small but noticeable (tens to hundreds of microseconds in the runs below) and, with a single exception, consistent:

m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13098789 nsecs
2nd pass took 13114677 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13052105 nsecs
2nd pass took 13134187 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13118069 nsecs
2nd pass took 13074199 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13038579 nsecs
2nd pass took 13079995 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13070334 nsecs
2nd pass took 13324378 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13031000 nsecs
2nd pass took 13167349 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13019961 nsecs
2nd pass took 13310211 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13041332 nsecs
2nd pass took 13311737 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13030913 nsecs
2nd pass took 13177423 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13060570 nsecs
2nd pass took 13387024 nsecs

Why is this the case? If anything, I would expect the first pass over the array to be slower, not the second!

If this matters:

m@m-X555LJ ~/UVA/fastIO $ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

System:    Host: m-X555LJ Kernel: 4.4.0-21-generic x86_64 (64 bit gcc: 5.3.1)
           Desktop: Cinnamon 3.0.7 (Gtk 2.24.30) Distro: Linux Mint 18 Sarah

CPU:       Dual core Intel Core i5-5200U (-HT-MCP-) cache: 3072 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 8786
           clock speeds: max: 2700 MHz 1: 2200 MHz 2: 2202 MHz 3: 2200 MHz
           4: 2200 MHz
  • Interesting note: I cut/pasted your code and got your same results. But when I removed the function pointer argument to test, since you aren't using it, I got the results you are expecting, i.e. the second run is faster. Commented Jun 12, 2017 at 15:57
  • Interesting. I'm testing with Visual Studio and it's inlining almost exactly the same assembly for both passes. I would guess that the difference is due to some cache magic. Commented Jun 12, 2017 at 15:58
  • Is there any difference if you compile with -std=gnu99 or -std=gnu11? Have you tried a third run? Commented Jun 12, 2017 at 16:05
  • The "problem" is certainly not related to the rand usage, since the measured test function does not use it. It looks more like a timer-resolution problem and/or an OS paging problem. Commented Jun 12, 2017 at 17:06
  • 1) On a busy system: scheduling. After the first loop, the process has used up the time allotted to it by the OS, so it is scheduled less during the execution of the second loop. 2) On an idle system: thermal effects. During the first loop, the process causes the CPU to heat up, thereby reducing the maximum possible CPU frequency during the second loop. Commented Jun 12, 2017 at 18:25

1 Answer


This effect is most likely caused by turbo mode (Intel Turbo Boost Technology). Turbo mode allows a processor core to run at a higher than nominal clock frequency. One factor governing it is the duration of the burst*: commonly, the processor can hold its highest frequency only for some fraction of a second. It is therefore very likely that the first loop runs at a higher clock frequency than the second.
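
One way to check this hypothesis (a minimal sketch; it assumes the Linux cpufreq sysfs file /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq exists, and the value it reports is only the kernel's most recent sample) is to read the current frequency right before and after each timed pass and print it alongside the timings:

#include <stdio.h>

/* Sketch: return cpu0's current frequency in kHz as reported by cpufreq,
   or -1 if the sysfs file can't be read. */
static long read_cpu_khz(void)
{
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
        long khz = -1;

        if (f) {
                if (fscanf(f, "%ld", &khz) != 1)
                        khz = -1;
                fclose(f);
        }
        return khz;
}

/* In main(): print read_cpu_khz() immediately before and after each test()
   call; with turbo active, the first pass should show the higher frequency. */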

You can confirm that by manually pinning the clock to the nominal frequency (2.20 GHz for your processor), e.g. using cpufrequtils or cpupower. However, many systems use the intel_pstate driver, which doesn't allow the userspace governor; in that case you can disable turbo mode for intel_pstate, or disable intel_pstate altogether.
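
For example (a sketch specific to the intel_pstate driver; it needs root, and the path will not exist with other cpufreq drivers), turbo can be switched off by writing 1 to the no_turbo file, equivalent to echoing 1 into that file from a root shell:

#include <stdio.h>

/* Sketch: disable turbo mode on intel_pstate systems (run as root);
   writing 0 instead re-enables it. */
int main(void)
{
        FILE *f = fopen("/sys/devices/system/cpu/intel_pstate/no_turbo", "w");
        if (!f) {
                perror("no_turbo");
                return 1;
        }
        fputs("1\n", f);
        return fclose(f) ? 1 : 0;
}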

Without turbo mode, performance should be uniform across both passes.

*: Temperature is another factor, but I doubt it plays a role over a ~10 ms benchmark. To illustrate, assume the CPU exceeds its 15 W TDP and draws 20 W: even a tiny 1 g of copper would heat up by only about 0.5 K after 10 ms. I commonly see a short, distinct burst (lasting tens of milliseconds to a few seconds), followed by a slow and steady decline as temperature builds up (over tens of seconds to minutes).
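
(For reference, that estimate just uses the specific heat of copper, c ≈ 0.385 J/(g·K): ΔT = P·t / (m·c) = (20 W × 0.01 s) / (1 g × 0.385 J/(g·K)) ≈ 0.5 K.)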

Note: gentestcases runs for a significant amount of time (e.g. 240 ms) before the first actual test, which also contributes to the processor's initial "sprint".
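
To see how long that warm-up is on your machine (a sketch that reuses the question's own time_elapsed helper; the 240 ms figure will of course vary), you can time gentestcases the same way the passes are timed:

/* In main(), instead of the bare gentestcases() call: */
struct timespec beg, end;
clock_gettime(CLOCK_MONOTONIC, &beg);
gentestcases();
clock_gettime(CLOCK_MONOTONIC, &end);
printf("gentestcases took %lld nsecs\n", time_elapsed(beg, end));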
