C++11: reinterpreting array of structs as array of struct's member

Question

Consider the following type:

struct S
{
    char v;
};

Given an array of const S, is it possible to, in a standard conformant way, reinterpret it as an array of const char whose elements correspond to the value of the member v for each of the original array's elements, and vice-versa? For example:

const S a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };
const char* a2 = reinterpret_cast< const char* >(a1);

for (int i = 0; i < 4; ++i)
    std::cout << std::boolalpha << (a1[i].v == a2[i]) << ' ';

Is the code above portable and would it print true true true true? If not, is there any other way of achieving this?

Obviously, it is possible to create a new array and initialize it with the member v of each element of the original array, but the whole idea is to avoid creating a new array.

The question comes down to whether a struct containing a char is required to have no special alignment.
– Sam Varshavchik, Commented Jul 30, 2016 at 0:07
@SamVarshavchik: If it has alignment > 1, then it must have padding, because if its size were 1, the second item in an array would be unaligned. So padding is the real question. The alignment doesn't matter after the question of padding has been resolved.
– Cheers and hth. - Alf, Commented Jul 30, 2016 at 0:16

MSalters · Accepted Answer · 2016-07-30 00:08:52Z

8

Trivially, no - the struct may have padding. And that flat out breaks any reinterpretation as an array.
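To make the padding problem concrete, here is a minimal sketch. `Padded` and `has_char_stride` are hypothetical names introduced only for illustration; `alignas(4)` is used to force, in a standard way, the kind of tail padding an implementation is also allowed to add on its own.

```cpp
#include <cstddef>

struct S { char v; };

// Hypothetical type for illustration: alignas(4) forces the size up to a
// multiple of 4, so three bytes of tail padding follow the char member.
struct alignas(4) Padded { char v; };

// True only when consecutive elements of a T[] are exactly one char apart,
// which is the precondition for any element-by-element char view to line up.
template <typename T>
constexpr bool has_char_stride() { return sizeof(T) == sizeof(char); }

static_assert(alignof(Padded) == 4, "alignment was requested explicitly");
static_assert(sizeof(Padded) == 4, "size is a multiple of the alignment");
static_assert(sizeof(Padded[4]) == 16, "four elements, four bytes each");
static_assert(!has_char_stride<Padded>(), "a char* would land in padding");
```

With such a layout, the second `char` reached through a reinterpreted pointer would be a padding byte, not the `v` of the second element. Nothing in the standard guarantees `has_char_stride<S>()` either, although mainstream compilers give `sizeof(S) == 1` by default.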

answered Jul 30, 2016 at 0:08


3 Comments

набиячлэвэли Over a year ago

What if the padding is explicitly removed therefrom?

MSalters Over a year ago

@набиячлэвэли : That would require a compiler-specific extension; then no general rule applies.

David Schwartz Over a year ago

@набиячлэвэли You'd still be breaking aliasing rules. Lots of real code that broke aliasing rules and worked fine for many years started breaking when compilers got smarter optimizations. It's a terrible idea to assume that it will work.

Cheers and hth. - Alf · Accepted Answer · 2016-07-30 03:41:39Z

6

Formally the struct may have padding so that its size is greater than 1.

I.e., formally you can't reinterpret_cast and have fully portable code, except for ¹an array of only one item.

In practice, though: some years ago someone asked whether there was by then any compiler that would, by default, give sizeof(T) > 1 for struct T{ char x; };. I have yet to see any example. So in practice one can just static_assert that the size is 1, and not worry at all that this static_assert will fail on some system.

I.e.,

S const a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };
static_assert( sizeof( S ) == 1, "!" );

char const* const a2 = reinterpret_cast<char const*>( a1 );

for( int i = 0; i < 4; ++i )
{
    assert( a1[i].v == a2[i] );
}

Since it's possible to interpret the C++14 and later standards in a way where the indexing has Undefined Behavior, based on a peculiar interpretation of "array" as referring to some original array, one might instead write this code in a more awkward and verbose but guaranteed valid way:

// I do not recommend this, but it's one way to avoid problems with some compiler that's
// based on an unreasonable, impractical interpretation of the C++14 standard.
#include <assert.h>
#include <new>

auto main() -> int
{
    struct S
    {
        char v;
    };

    int const compiler_specific_overhead    = 0;    // Redefine per compiler.
    // With value 0 for the overhead the internal workings here, what happens
    // in the machine code, is the same as /without/ this verbose work-around
    // for one impractical interpretation of the standard.
    int const n = 4;
    static_assert( sizeof( S ) == 1, "!" );
    char storage[n + compiler_specific_overhead]; 
    S* const a1 = ::new( storage ) S[n];
    assert( (void*)a1 == storage + compiler_specific_overhead );

    for( int i = 0; i < n; ++i ) { a1[i].v = "a42"[i]; }    //  Whatever

    // Here a2 points to items of the original `char` array, hence no indexing
    // UB even with impractical interpretation of the C++14 standard.
    // Note that the indexing-UB-free code from this point, is exactly the same
    // source code as the first code example that some claim has indexing UB.
    char const* const a2 = reinterpret_cast<char const*>( a1 );

    for( int i = 0; i < n; ++i )
    {
        assert( a1[i].v == a2[i] );
    }
}

Notes:

¹ The standard guarantees that there's no padding at the start of the struct.

edited Jul 30, 2016 at 3:41

answered Jul 30, 2016 at 0:12


13 Comments

T.C. Over a year ago

I think the implied pointer arithmetic in a2[i] induces undefined behavior.

Cheers and hth. - Alf Over a year ago

@T.C. I don't know about the indexing, but as mentioned there is already formal UB. One can't make it more UB, just like one can't be just a little pregnant. :) However, in practice the only worry about that is that a certain compiler conceivably could use it to sabotage-“optimize” the code…

Jonathan Wakely Over a year ago

"not portable" does not imply UB. @T.C. is right that the UB comes from the pointer arithmetic not the use of reinterpret_cast, see [expr.add] p6.

David Schwartz Over a year ago

This seems like a recipe to produce code that subtly fails due to optimizations based on aliasing assumptions. So a static_assert is not sufficient. You have to verify that the compiler, every compiler, and every change of options or versions, doesn't break the code.

Cheers and hth. - Alf Over a year ago

@DavidSchwartz: You're right, but your statement misleads because (1) compilers' ability to compile ordinary code is verified by testing the code with the compilers that one uses, which is done anyway, i.e. there's nothing extra to do, and (2) if a compiler, say g++, fails to compile this reasonably, then that's a good reason to ditch that compiler. Compilers are our tools, not our masters. Use tools that give productivity, ditch the ones that make the job harder.


Jonathan Wakely · Accepted Answer · 2016-07-30 02:04:24Z

4

The pointer arithmetic in a2[i] is undefined, see C++14 5.7 [expr.add] p7:

For addition or subtraction, if the expressions P or Q have type "pointer to cv T", where T and the array element type are not similar (4.5), the behavior is undefined. [ Note: In particular, a pointer to a base class cannot be used for pointer arithmetic when the array contains objects of a derived class type. — end note ]

Because of this rule, even if there is no padding and the sizes match, type-based alias analysis allows the compiler to assume that a1[i] and a2[i] do not overlap (because the pointer arithmetic is only valid if a2 really is an array of char not just something with the same size and alignment, and if it's really an array of char it must be a separate object from an array of S).
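One way to sidestep the problematic `char` pointer arithmetic without copying the data is to keep indexing the original `S` array and only project out the member. The sketch below is an assumption-laden illustration, not something from the standard library: `MemberView` and `check_view` are hypothetical names. All pointer arithmetic happens on a `const S*` inside the real `S` array, so the rule quoted above is not triggered.

```cpp
#include <cassert>
#include <cstddef>

struct S { char v; };

// Hypothetical read-only view: stores a pointer into the original S array
// and returns each element's v member on indexing. No data is copied.
class MemberView
{
public:
    template <std::size_t N>
    explicit MemberView(const S (&arr)[N]) : first_(arr), size_(N) {}

    // Arithmetic is done on const S*, within the array of S itself.
    const char& operator[](std::size_t i) const { return first_[i].v; }
    std::size_t size() const { return size_; }

private:
    const S* first_;
    std::size_t size_;
};

// Small self-check: every view element is the v of the matching S element.
inline void check_view()
{
    const S a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };
    const MemberView a2(a1);
    for (std::size_t i = 0; i < a2.size(); ++i)
    {
        assert(&a2[i] == &a1[i].v);
    }
}
```

The trade-off is that the view is not a `const char*`, so it cannot be handed to an API that expects a contiguous character buffer; it only helps code that can be written against `operator[]`.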

edited Jul 30, 2016 at 2:04

answered Jul 30, 2016 at 1:55


15 Comments

Cheers and hth. - Alf Over a year ago

This quote is out of context, and it's apparently nowhere in C++11.

Cheers and hth. - Alf Over a year ago

I found it in C++14. It can be interpreted in at least two ways, one reasonable with a purpose, and (yours) one unreasonable sans purpose. Worth noting that C++14 is the first version of the standard that incorporates bits of very low quality: this unclear wording is one example.

Jonathan Wakely Over a year ago

The wording comes from DR 1504 which has DR status so addresses a defect in C++11

Cheers and hth. - Alf Over a year ago

Re "type-based alias analysis allows the compiler to assume that a1[i] and a2[i] do not overlap", no, that's only so with your unreasonable interpretation of "array" as original array, instead of the array at hand. However, this interpretation can (possibly) have been adopted in a perverse compiler.

Jonathan Wakely Over a year ago

A placement new-expression begins the lifetime of an object. A reinterpret_cast doesn't.


Richard Hodges · Accepted Answer · 2016-07-30 01:39:10Z

2

I think I'd be inclined to use a compile-time transformation if the source data is constant:

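#include <array>
#include <cstddef>   // std::size_t
#include <iostream>
#include <utility>   // std::index_sequence, std::make_index_sequence (C++14)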

struct S
{
    char v;
};

namespace detail {
    template<std::size_t...Is>
    constexpr auto to_cstring(const S* p, std::index_sequence<Is...>)
    {
        return std::array<char, sizeof...(Is)> {
            p[Is].v...
        };
    }
}

template<std::size_t N>
constexpr auto to_cstring(const S (&arr)[N])
{
    return detail::to_cstring(arr, std::make_index_sequence<N>());
}

int main()
{
    const /*expr if you wish*/ S a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };

    const /*expr if you wish*/ auto a2 = to_cstring(a1);


    for (int i = 0; i < 4; ++i)
        std::cout << std::boolalpha << (a1[i].v == a2[i]) << ' ';
}

output:

true true true true

even when the data is not a constexpr, gcc and clang are pretty good at constant folding complex sequences like this.
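The code above relies on C++14 features (`std::index_sequence` and deduced return types), while the question is about C++11. A rough C++11 equivalent might look like the following sketch; `indices`, `make_indices`, `copy_members` and `check_copy` are hypothetical names introduced here, standing in for the C++14 library facilities.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct S { char v; };

// Hypothetical C++11 stand-ins for std::index_sequence and
// std::make_index_sequence: make_indices<N>::type is indices<0, ..., N-1>.
template <std::size_t... Is>
struct indices {};

template <std::size_t N, std::size_t... Is>
struct make_indices : make_indices<N - 1, N - 1, Is...> {};

template <std::size_t... Is>
struct make_indices<0, Is...> { using type = indices<Is...>; };

namespace detail {
    // Expands to { p[0].v, p[1].v, ... }; a single return statement keeps
    // the function within the C++11 constexpr restrictions.
    template <std::size_t... Is>
    constexpr std::array<char, sizeof...(Is)>
    copy_members(const S* p, indices<Is...>)
    {
        return {{ p[Is].v... }};
    }
}

template <std::size_t N>
constexpr std::array<char, N> copy_members(const S (&arr)[N])
{
    return detail::copy_members(arr, typename make_indices<N>::type());
}

// Small self-check mirroring the loop in the question.
inline void check_copy()
{
    const S a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };
    const std::array<char, 4> a2 = copy_members(a1);
    for (std::size_t i = 0; i < a2.size(); ++i)
    {
        assert(a1[i].v == a2[i]);
    }
}
```

As with the original, this still produces a copy; whether that copy survives optimisation depends on the compiler and on how the data is used.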

answered Jul 30, 2016 at 1:39


2 Comments

Cheers and hth. - Alf Over a year ago

Hm, copying. What about arrays of some million items or more?

Richard Hodges Over a year ago

@Cheersandhth.-Alf as ever, it depends... How many times are we doing it? Can the compiler elide the copy? Is it a candidate for constant folding? And so on. In a tight loop, with mutable data, probably not. But in many cases, even though a copy is written, after an optimisation pass, it won't actually happen. Compilers today are pretty good.
