Why does the optimizer remove my code?

Question

Today I stumbled across a weird problem. Consider this simple program where I try to emulate MMX's PADDW instruction:

#include <cstdint>
#include <cstdio>

int main()
{
    uint64_t a = 0;
    uint64_t b = 0x1234123412341234;

    uint64_t c = 0;
    uint16_t *a_words = reinterpret_cast<uint16_t*>(&a);
    uint16_t *b_words = reinterpret_cast<uint16_t*>(&b);
    uint16_t *c_words = reinterpret_cast<uint16_t*>(&c);

    for (size_t i = 0; i < 4; i ++)
        c_words[i] = a_words[i] + b_words[i];

    printf("%d %d %d %d\n", a_words[0], a_words[1], a_words[2], a_words[3]);
    printf("%d %d %d %d\n", b_words[0], b_words[1], b_words[2], b_words[3]);
    printf("%d %d %d %d\n", c_words[0], c_words[1], c_words[2], c_words[3]);
    printf("%016llx\n", c);
    return 0;
}

Compiling this and running it with g++ -std=c++11 test.cpp -o test && ./test results in the following:

0 0 0 0
4660 4660 4660 4660
4660 4660 4660 4660
1234123412341234

However, if I enable -O2, it displays the wrong value on the last line (with -O1 it still works):

0 0 0 0
4660 4660 4660 4660
4660 4660 4660 4660
0000000000000000

Why is that?

Other observations:

If I unroll the loop, compiling with -O2 works (!!):

#include <cstdint>
#include <cstdio>

int main()
{
    uint64_t a = 0;
    uint64_t b = 0x1234123412341234;

    uint64_t c = 0;
    uint16_t *a_words = reinterpret_cast<uint16_t*>(&a);
    uint16_t *b_words = reinterpret_cast<uint16_t*>(&b);
    uint16_t *c_words = reinterpret_cast<uint16_t*>(&c);

    c_words[0] = a_words[0] + b_words[0];
    c_words[1] = a_words[1] + b_words[1];
    c_words[2] = a_words[2] + b_words[2];
    c_words[3] = a_words[3] + b_words[3];

    printf("%d %d %d %d\n", a_words[0], a_words[1], a_words[2], a_words[3]);
    printf("%d %d %d %d\n", b_words[0], b_words[1], b_words[2], b_words[3]);
    printf("%d %d %d %d\n", c_words[0], c_words[1], c_words[2], c_words[3]);
    printf("%016llx\n", c);
    return 0;
}

If I work on a very similar problem, but with 32-bit integers instead of 64-bit ones, it works as well:

#include <cstdint>
#include <cstdio>

int main()
{
    uint32_t a = 0;
    uint32_t b = 0x12121212;

    uint32_t c = 0;
    uint8_t *a_words = reinterpret_cast<uint8_t*>(&a);
    uint8_t *b_words = reinterpret_cast<uint8_t*>(&b);
    uint8_t *c_words = reinterpret_cast<uint8_t*>(&c);

    for (size_t i = 0; i < 4; i ++)
        c_words[i] = a_words[i] + b_words[i];

    printf("%d %d %d %d\n", a_words[0], a_words[1], a_words[2], a_words[3]);
    printf("%d %d %d %d\n", b_words[0], b_words[1], b_words[2], b_words[3]);
    printf("%d %d %d %d\n", c_words[0], c_words[1], c_words[2], c_words[3]);
    printf("%08x\n", c);
    return 0;
}

The problem occurs on both 32-bit and 64-bit machines. I tried g++ (GCC) 4.9.2 on Cygwin and g++ (Debian 4.9.1-19) 4.9.1 on GNU/Linux.

You violate strict aliasing, which results in undefined behaviour, which your compiler exploits.
– milleniumbug, Commented Feb 15, 2015 at 13:48
It works with -fno-strict-aliasing, thank you! However, judging from the tone of your comments, I feel I'm doing the whole thing totally wrong. Mind giving me a clue how I could tackle the problem in a more elegant way?
– rr-, Commented Feb 15, 2015 at 13:51
In this case, you could use the good old SWAR addition: ((a & 0x7FFF7FFF7FFF7FFF) + (b & 0x7FFF7FFF7FFF7FFF)) ^ ((a ^ b) & 0x8000800080008000)
– user555045, Commented Feb 15, 2015 at 13:55
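
The SWAR (SIMD within a register) expression from the comment above can be wrapped in a small function. This is a minimal sketch, assuming four 16-bit lanes packed into one 64-bit word; the name paddw_swar is made up for illustration and does not appear in the thread:

```cpp
#include <cstdint>

// Hypothetical helper: adds four 16-bit lanes packed in 64-bit words,
// with each lane wrapping around independently (like PADDW).
std::uint64_t paddw_swar(std::uint64_t a, std::uint64_t b)
{
    const std::uint64_t high = 0x8000800080008000ULL; // top bit of every lane
    const std::uint64_t low  = ~high;                 // 0x7FFF7FFF7FFF7FFF

    // Add only the low 15 bits of each lane. A carry can reach bit 15 of
    // its own lane but can never spill into the neighbouring lane.
    const std::uint64_t sum = (a & low) + (b & low);

    // Fold the original top bits back in with XOR (addition without carry-out).
    return sum ^ ((a ^ b) & high);
}
```

With the values from the question, paddw_swar(0, 0x1234123412341234) gives 0x1234123412341234. No pointer casts are involved, so the aliasing problem does not arise.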

milleniumbug · Accepted Answer · 2015-02-15 14:01:55Z

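This is a strict aliasing violation. You access memory that stores an object of one type (here uint64_t) through a pointer to an unrelated type (uint16_t). The C++ standard says you can't do that, so the behaviour is undefined and the optimizer is free to assume the two never refer to the same memory. The exceptions to this rule are char and unsigned char, which may be used to access any object. uint8_t is usually a typedef for unsigned char, which is why the 32-bit variant in the question happens to work.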
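
To illustrate the character-type exception, here is a small sketch that reads the bytes of a uint64_t through an unsigned char pointer, which is permitted. The function name sum_bytes is hypothetical and chosen only for this example:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sums the bytes of a 64-bit value by walking its
// object representation. Accessing any object through unsigned char*
// is allowed, so this does not violate strict aliasing.
unsigned sum_bytes(const std::uint64_t &value)
{
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&value);
    unsigned total = 0;
    for (std::size_t i = 0; i < sizeof value; ++i)
        total += bytes[i];
    return total;
}
```

The sum is used here because it does not depend on byte order; the order of the individual bytes still differs between little-endian and big-endian machines.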

This code is non-portable, but if you still want to do it legally, here are your options:

Copy from the uint64_t into a uint16_t array (with memcpy or std::copy), modify the values, and copy them back.
OR use compiler intrinsics, which translate directly to vectorized instructions.
OR disable strict aliasing (for GCC, with -fno-strict-aliasing).
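
The first option might look roughly like the sketch below. It assumes four 16-bit lanes per 64-bit word, as in the question; the function name paddw_memcpy is invented for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: lane-wise 16-bit addition done via memcpy, so no
// object is ever accessed through a pointer of an unrelated type.
std::uint64_t paddw_memcpy(std::uint64_t a, std::uint64_t b)
{
    std::uint16_t a_words[4], b_words[4], c_words[4];
    static_assert(sizeof a_words == sizeof a, "four 16-bit lanes must fill 64 bits");

    // Copy the object representations into properly typed arrays.
    std::memcpy(a_words, &a, sizeof a);
    std::memcpy(b_words, &b, sizeof b);

    // The operands are promoted to int; the cast truncates back to 16 bits,
    // so each lane wraps around on overflow.
    for (std::size_t i = 0; i < 4; ++i)
        c_words[i] = static_cast<std::uint16_t>(a_words[i] + b_words[i]);

    // Copy the result back into a 64-bit value.
    std::uint64_t c;
    std::memcpy(&c, c_words, sizeof c);
    return c;
}
```

Compilers commonly optimize such fixed-size memcpy calls into plain register moves, though that is not guaranteed.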

2 Comments

milleniumbug Over a year ago

@rr- C++ says no, C says yes (as far as I remember)

T.C. Over a year ago

@Yakk stackoverflow.com/questions/11373203/…
