Why is attribute noinline ignored by gcc-15.1.0 in this example?

Question

Looking at this benchmark about a custom std::function implementation: https://github.com/PacktPublishing/Hands-On-Design-Patterns-with-CPP-Second-Edition/blob/main/Chapter06/09_function.C

I tried to replicate the example and noticed that, despite declaring this simple function as __attribute__((noinline)) auto function_no_inline(int a, int b, int c, int d) -> int { return a + b + c + d; }, it took the same time as the inline function, whereas it took much longer when the function was actually defined in a different compilation unit. It seems that the attribute was ignored for some reason. Why? The arguments are obtained from rand().

Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_invoke_function                 1.35 ns         1.35 ns    504544141
BM_invoke_function_no_inline      0.271 ns        0.271 ns   2584830443
BM_invoke_function_inline         0.270 ns        0.270 ns   2580073503
BM_invoke_std_function             2.21 ns         2.17 ns    324669753

This is my code. It links against the google-benchmark library.


    #include <benchmark/benchmark.h>
    
    #include <cstdlib>  // rand()
    #include <functional>
    
    // Defined in a different compilation unit.
    auto function(int a, int b, int c, int d) -> int;
    
    __attribute__((noinline)) auto function_no_inline(int a, int b, int c, int d) -> int { return a + b + c + d; }
    
    inline auto function_inline(int a, int b, int c, int d) { return a + b + c + d; }
    
    template <typename Callable>
    auto invoke(int a, int b, int c, int d, const Callable& callable)
    {
      return callable(a, b, c, d);
    }
    
    // Benchmarks
    void BM_invoke_function(benchmark::State& state)
    {
      int a{rand()};
      int b{rand()};
      int c{rand()};
      int d{rand()};
    
      for (auto _ : state)
      {
        benchmark::DoNotOptimize(invoke(a, b, c, d, function));
        benchmark::ClobberMemory();
      }
    }
    
    void BM_invoke_function_no_inline(benchmark::State& state)
    {
      int a{rand()};
      int b{rand()};
      int c{rand()};
      int d{rand()};
    
      for (auto _ : state)
      {
        benchmark::DoNotOptimize(invoke(a, b, c, d, function_no_inline));
        benchmark::ClobberMemory();
      }
    }
    
    void BM_invoke_function_inline(benchmark::State& state)
    {
      int a{rand()};
      int b{rand()};
      int c{rand()};
      int d{rand()};
    
      for (auto _ : state)
      {
        benchmark::DoNotOptimize(invoke(a, b, c, d, function_inline));
        benchmark::ClobberMemory();
      }
    }
    
    void BM_invoke_std_function(benchmark::State& state)
    {
      int a{rand()};
      int b{rand()};
      int c{rand()};
      int d{rand()};
    
      std::function<int(int, int, int, int)> std_function{function};
    
      for (auto _ : state)
      {
        benchmark::DoNotOptimize(invoke(a, b, c, d, std_function));
        benchmark::ClobberMemory();
      }
    }
    
    BENCHMARK(BM_invoke_function);
    BENCHMARK(BM_invoke_function_no_inline);
    BENCHMARK(BM_invoke_function_inline);
    BENCHMARK(BM_invoke_std_function);
    
    BENCHMARK_MAIN();

One heuristic that good optimising compilers often use: if the inlined code of a function is shorter than the code needed to put all the parameters onto the stack and then call the function, the compiler chooses the shorter, faster option. This is probably the case for your example. You have to be careful when benchmarking.
– Martin Brown, Commented Nov 1 at 9:46

Reading gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html, it seems the function is not considered for inlining, but the call is optimized away for some other reason. Adding an asm("") statement while using __attribute__((noinline)) finally kept it from being inlined, although hopefully it had no other effects.
– luczzz, Commented Nov 1 at 9:59

Simply put, your code is not bottlenecked by the function call. Also note that compilers are free to optimize according to the "as-if" rule, so even if you don't explicitly state that a function must be inlined, the compiler can still do it for you (if the function is simple enough).
– Pepijn Kramer, Commented Nov 1 at 10:45

The point I am making is that I am using an explicit compiler attribute to NOT inline. I am fine with the compiler inlining things when there is no attribute or inline keyword, but it's ignoring its own attributes here.
– luczzz, Commented Nov 1 at 10:49

With -O3 gcc can still optimize it away (which kind of makes sense since you are giving contradictory input), so what are your compiler settings?
– Pepijn Kramer, Commented Nov 1 at 13:37

OLEGSHA · Accepted Answer · 2025-11-01 17:05:15Z

3

I popped your example into Compiler Explorer (link) and I see that function_inline is inlined, but function_no_inline is indeed not:

    BM_invoke_function_inline(benchmark::State&):
            push    r15
            push    r14
    [...]
            lea     edx, [r14+r15]
            add     edx, ebp
            add     edx, DWORD PTR [rsp+12]

    BM_invoke_function_no_inline(benchmark::State&):
            push    r15
            push    r14
    [...]
            call    function_no_inline(int, int, int, int)

I'm not sure if I guessed your compilation setup correctly (e.g. -std=c++23 -O3), but either I can't reproduce your results, or the explanation does not involve noinline.

That said, noinline is kind of outdated: it prevents inlining, but it does not prevent several other kinds of optimizations that could be affecting your situation (though apparently not if we trust my Compiler Explorer results.) The more bulletproof method is to use noipa to explicitly ask GCC to treat the function as a standalone unit. It includes noinline and any other dark magic.

From GCC function attribute docs:

noinline

This function attribute prevents a function from being considered for inlining. It also disables some other interprocedural optimizations; it’s preferable to use the more comprehensive noipa attribute instead if that is your goal.

Even if a function is declared with the noinline attribute, there are optimizations other than inlining that can cause calls to be optimized away if it does not have side effects, although the function call is live. To keep such calls from being optimized away, put asm (""); in the called function, to serve as a special side effect.

noipa

Disable interprocedural optimizations between the function with this attribute and its callers, as if the body of the function is not available when optimizing callers and the callers are unavailable when optimizing the body. [...]
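
To make the difference between the three options concrete, here is a minimal sketch. It is not taken from your benchmark, and the names add_noinline, add_noinline_asm, add_noipa, SKETCH_NOIPA and check_variants_agree are made up for illustration. As far as I know noipa requires GCC 8 or later and is not available in Clang, so the macro falls back to noinline when the attribute is missing.

```cpp
#include <cassert>

// Use noipa where the compiler knows it; otherwise fall back to noinline.
// SKETCH_NOIPA is a hypothetical macro name used only in this sketch.
#if defined(__has_attribute)
#  if __has_attribute(noipa)
#    define SKETCH_NOIPA __attribute__((noipa))
#  endif
#endif
#ifndef SKETCH_NOIPA
#  define SKETCH_NOIPA __attribute__((noinline))
#endif

// Variant 1: noinline only. The body is not inlined, but the compiler may
// still see that the function has no side effects and remove or hoist calls.
__attribute__((noinline)) int add_noinline(int a, int b, int c, int d)
{
  return a + b + c + d;
}

// Variant 2: noinline plus an empty asm statement, as the GCC docs suggest.
// The asm acts as a side effect, so calls are not treated as removable.
__attribute__((noinline)) int add_noinline_asm(int a, int b, int c, int d)
{
  asm("");
  return a + b + c + d;
}

// Variant 3: noipa. Callers are optimized as if the body were not visible.
SKETCH_NOIPA int add_noipa(int a, int b, int c, int d)
{
  return a + b + c + d;
}

// Hypothetical helper: all variants must return the same value; only the
// generated code around the call sites is expected to differ.
inline int check_variants_agree(int a, int b, int c, int d)
{
  const int expected = a + b + c + d;
  assert(add_noinline(a, b, c, d) == expected);
  assert(add_noinline_asm(a, b, c, d) == expected);
  assert(add_noipa(a, b, c, d) == expected);
  return expected;
}
```

Swapping function_no_inline in your benchmark for something shaped like variant 2 or 3 and comparing the generated assembly (for example on Compiler Explorer) should show whether the call really stays inside the timed loop.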



1 Comment

luczzz Nov 1 at 20:19

I get the same timings as the function in the separate TU if I add that asm("") in addition to __attribute__((noinline)) (which has to be there; otherwise, even with asm(""), the function is inlined). noipa also prevents inlining. I guess the takeaway is to never trust compiler attributes without reading the documentation.
