Why is a volatile local variable optimised differently from a volatile argument, and why does the optimiser generate a no-op loop from the latter?

Question

Background

This was inspired by this question/answer and ensuing discussion in the comments: Is the definition of “volatile” this volatile, or is GCC having some standard compliancy problems?. Based on others' and my interpretation of what should be happening, as discussed in comments, I've submitted it to GCC Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71793. Other relevant responses are still welcome.

Also, that thread has since given rise to this question: Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?

Intro

I know volatile isn't what most people think it is and is an implementation-defined nest of vipers. And I certainly don't want to use the below constructs in any real code. That said, I'm totally baffled by what's going on in these examples, so I'd really appreciate any elucidation.

My guess is this is due to either highly nuanced interpretation of the Standard or (more likely?) just corner-cases for the optimiser used. Either way, while more academic than practical, I hope this is deemed valuable to analyse, especially given how typically misunderstood volatile is. Some more data points - or perhaps more likely, points against it - must be good.

Input

Given this code:

#include <cstddef>

void f(void *const p, std::size_t n)
{
    unsigned char *y = static_cast<unsigned char *>(p);
    volatile unsigned char const x = 42;
    // N.B. Yeah, const is weird, but it doesn't change anything

    while (n--) {
        *y++ = x;
    }
}

void g(void *const p, std::size_t n, volatile unsigned char const x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

void h(void *const p, std::size_t n, volatile unsigned char const &x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

int main(int, char **)
{
    int y[1000];
    f(&y, sizeof y);
    volatile unsigned char const x{99};
    g(&y, sizeof y, x);
    h(&y, sizeof y, x);
}

Output

g++ from gcc (Debian 4.9.2-10) 4.9.2 (Debian stable a.k.a. Jessie) with the command line g++ -std=c++14 -O3 -S test.cpp produces the below ASM for main(). Version Debian 5.4.0-6 (current unstable) produces equivalent code, but I just happened to run the older one first, so here it is:

main:
.LFB3:
    .cfi_startproc

# f()
    movb    $42, -1(%rsp)
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L21:
    subq    $1, %rax
    movzbl  -1(%rsp), %edx
    jne .L21

# x = 99
    movb    $99, -2(%rsp)
    movzbl  -2(%rsp), %eax

# g()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L22:
    subq    $1, %rax
    jne .L22

# h()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L23:
    subq    $1, %rax
    movzbl  -2(%rsp), %edx
    jne .L23

# return 0;
    xorl    %eax, %eax
    ret
    .cfi_endproc

Analysis

All 3 functions are inlined, and both volatile local variables (the x in f() and the x in main()) are allocated on the stack, for fairly obvious reasons. But that's about the only thing they share...

- f() makes sure to read from x on each iteration, presumably because x is volatile, but just dumps the result to edx, presumably because the destination y isn't declared volatile and is never read, meaning changes to it can be nixed under the as-if rule. OK, makes sense.
  - Well, I mean... kinda. Like, not really, because volatile is really for hardware registers, and clearly a local value can't be one of those, and can't otherwise be modified in a volatile way unless its address is passed out... which it's not. Look, there's just not a lot of sense to be had out of volatile local values. But C++ lets us declare them and tries to do something with them. And so, confused as always, we stumble onwards.
- g(): What. By moving the volatile source into a pass-by-value parameter, which is still just another local variable, GCC somehow decides it's not volatile, or less volatile, and so it doesn't need to read it every iteration... but it still carries out the loop, despite its body now doing nothing.
- h(): By taking the passed volatile as pass-by-reference, the same effective behaviour as f() is restored, so the loop does volatile reads.
  - This case alone actually makes practical sense to me, for reasons outlined above against f(). To elaborate: imagine x refers to a hardware register, of which every read has side-effects. You wouldn't want to skip any of those.
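The h() case can be sketched in isolation. The helper below is hypothetical: fill_from and the status_reg variable in the usage note are names invented for illustration, not part of the code above. Unlike main() above, the caller here reads the destination afterwards, so under the as-if rule the stores would be expected to survive alongside the volatile reads.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring h(): each iteration performs one
// volatile read of src and one ordinary (non-volatile) store to dst.
void fill_from(volatile unsigned char const &src, unsigned char *dst, std::size_t n)
{
    assert(dst != nullptr || n == 0);  // nothing to write to otherwise

    while (n--) {
        *dst++ = src;  // the read of src is a volatile access
    }
}
```

A caller might declare `volatile unsigned char const status_reg = 99;` and an `unsigned char buf[4]`, call `fill_from(status_reg, buf, sizeof buf);` and then inspect buf. Because buf is read afterwards, the stores are no longer dead.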

Adding #define volatile /**/ leads to main() being a no-op, as you'd expect. So, when present, even on a local variable volatile does do something... I just have no idea what in the case of g(). What on Earth is going on there?

Questions

- Why does a local value declared in-body produce different results from a by-value parameter, with the latter letting reads be optimised away? Both are declared volatile. Neither has its address passed out, and neither has a static address (ruling out any inline-ASM POKEry), so they can never be modified outwith the function. The compiler can see that each is constant, need never be re-read, and volatile just ain't true,
  - so (A) is either allowed to be elided under such constraints (acting as-if they weren't declared volatile)?
  - and (B) why does only one get elided? Are some volatile local variables more volatile than others?
- Setting aside that inconsistency for just a moment: after the read was optimised away, why does the compiler still generate the loop? It does nothing! Why doesn't the optimiser elide it as-if no loop was coded?

Is this a weird corner case due to order of optimising analyses or such? As the code is a daft thought-experiment, I wouldn't chastise GCC for this, but it'd be good to know for sure. (Or is g() the manual timing loop people have dreamt of all these years?) If we conclude there's no Standard bearing on any of this, I'll move it to their Bugzilla just for their information.

And of course, the more important question from a practical perspective, though I don't want that to overshadow the potential for compiler geekery... Which, if any of these, are well-defined/correct according to the Standard?

TL;DR - If it doesn't change the observable behavior of the program does it really matter?
– Captain Obvlious, Commented Jul 6, 2016 at 23:00
@CaptainObvlious modifications to volatile variables (even automatic ones) are considered observable behaviour
– M.M, Commented Jul 7, 2016 at 0:57
@DavidSchwartz The standard says that the system must perform a read of a memory location corresponding to x, once for each loop iteration. It would be non-conforming if the system (be it the compiler, or the CPU or whatever) combined all of those to a single read.
– M.M, Commented Jul 7, 2016 at 1:32

avdgrinten · Accepted Answer · 2016-07-07 14:59:48Z

For f: GCC eliminates the non-volatile stores (but not the loads, which can have side-effects if the source location is a memory mapped hardware register). There is really nothing surprising here.

For g: Because of the x86_64 ABI, the parameter x of g is allocated in a register (i.e. rdx) and does not have a location in memory. Reading a general-purpose register does not have any observable side effects, so the dead read gets eliminated.
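To make the contrast with f concrete, here is a minimal sketch of a variant of g that copies the incoming argument into a local volatile object, so the source of the reads is declared exactly as in f. The name g_copy and the parameter name x_in are hypothetical. By analogy with the f output in the question one might expect a load per iteration here, but that is an assumption and has not been checked against any particular GCC version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of g(): the argument arrives as a plain value
// (in a register under the x86_64 ABI) and is then copied into a local
// volatile object, which is how x is declared in f().
void g_copy(void *const p, std::size_t n, unsigned char const x_in)
{
    assert(p != nullptr || n == 0);  // nothing to write to otherwise

    unsigned char *y = static_cast<unsigned char *>(p);
    volatile unsigned char const x = x_in;  // local volatile copy

    while (n--) {
        *y++ = x;  // volatile read of the local copy
    }
}
```

Whether the loads survive is best checked by compiling with -O3 -S and inspecting the output, as was done for the original three functions.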

7 Comments

underscore_d Over a year ago

That sounds similar to what Richard Biener replied on my ticket - gcc.gnu.org/bugzilla/show_bug.cgi?id=71793 - but his reply and edits of the ticket, chiefly the tag wrong-code, indicate that he doesn't think this is OK. Do you? The allocation of g(x) in a register seems like a detail of the ABI - in which case, thanks for the mechanistic explanation - but not permission to break volatile. It looks like the compiler should alter its behaviour to behave properly in this case.

avdgrinten Over a year ago

Well, what behavior do you expect? The compiler cannot issue a memory-read because there is no memory location to read from. The read operation of the abstract machine really IS a no-op here. It could copy the value to memory and read from that location but that would only make sense if x could actually escape g.

avdgrinten Over a year ago

The compiler cannot allocate x in memory because the ABI does not special-case volatile arguments (because they do not make much sense) and just passes them in registers, like non-volatile arguments. Allocating x in memory would break the ABI's function calling sequence. Note that the compiler would be allowed to copy x to a different location (but why should reads/writes to this location remain ordered/unaltered?) but it certainly cannot accept the x argument in memory without breaking the ABI.

supercat Over a year ago

@avdgrinten: Why can't a compiler allocate x in memory? The caller isn't going to put the value into memory, but all that means is that the function prologue code will have to do so. If x is volatile and the function contains a setjmp, I would think a compiler would likely have to keep it in memory and treat it as volatile whether or not its address is taken unless the compiler knows that no setjmp will occur unexpectedly.

avdgrinten Over a year ago

I agree that the compiler should certainly copy the value into memory (and not elide accesses to this memory location) if the function contains a setjmp or lets a pointer to the local volatile variable escape the function. In other cases my reading of the standard is that it is okay to elide volatile reads/writes, even if the variable was not stored in a register: the compiler behaves as-if the volatile read/write actually took place, because it can prove that no one (not even memory-mapped hardware, signal handlers or other async events) can actually observe the access.
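The escape case mentioned in the last comment can be sketched as follows. All names here (g_escape, g_escaped) are hypothetical, and the global pointer is itself declared volatile so that the stores to it cannot be dropped as dead. The sketch only shows what "letting a pointer escape" could look like; what any given compiler emits for it is not claimed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical global through which the parameter's address escapes.
// The pointer object is volatile too, so writes to it are observable.
volatile unsigned char const *volatile g_escaped = nullptr;

void g_escape(void *const p, std::size_t n, volatile unsigned char const x)
{
    assert(p != nullptr || n == 0);  // nothing to write to otherwise

    g_escaped = &x;  // the address of x escapes, so x needs a memory location

    unsigned char *y = static_cast<unsigned char *>(p);
    while (n--) {
        *y++ = x;  // volatile read of the parameter
    }

    g_escaped = nullptr;  // x dies on return; do not leave a dangling pointer
}
```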
