
Consider the following type:

struct S
{
    char v;
};

Given an array of const S, is it possible to, in a standard conformant way, reinterpret it as an array of const char whose elements correspond to the value of the member v for each of the original array's elements, and vice-versa? For example:

const S a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };
const char* a2 = reinterpret_cast< const char* >(a1);

for (int i = 0; i < 4; ++i)
    std::cout << std::boolalpha << (a1[i].v == a2[i]) << ' ';

Is the code above portable and would it print true true true true? If not, is there any other way of achieving this?

Obviously, it is possible to create a new array and initialize it with the member v of each element of the original array, but the whole idea is to avoid creating a new array.

  • The question comes down to whether a struct containing a char is required to have no special alignment. Commented Jul 30, 2016 at 0:07
  • @SamVarshavchik: If it has alignment > 1, then it must have padding, because if it's size 1 then in an array the second item would be unaligned. So padding is the real question. The alignment doesn't matter after the question of padding has been resolved. Commented Jul 30, 2016 at 0:16

4 Answers


Trivially, no: the struct may have padding, and that flat-out breaks any reinterpretation as an array of char.
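To make the padding argument concrete, here is a minimal sketch. `Padded` and `stride` are hypothetical names introduced only for illustration, and the over-alignment is forced with `alignas` to simulate an implementation that pads `S`. Because `sizeof` is always a multiple of `alignof`, consecutive elements of an array of `Padded` are more than one byte apart, so a `const char*` stepping one byte at a time would land in padding.

```cpp
#include <cstddef>

// Hypothetical type: the same single member as S, but over-aligned on
// purpose to simulate an implementation that pads the struct.
struct alignas(4) Padded
{
    char v;
};

// sizeof is always a multiple of alignof, so Padded cannot have size 1.
static_assert(alignof(Padded) == 4, "alignas(4) was requested");
static_assert(sizeof(Padded) % alignof(Padded) == 0,
              "every array element must stay aligned");
static_assert(sizeof(Padded) > sizeof(char), "padding follows the member v");

// Distance in bytes between consecutive elements of an array of T.
template <class T>
constexpr std::size_t stride() { return sizeof(T); }
```

With this type, `stride<Padded>()` is at least 4 while `stride<char>()` is 1, so index `i` of the reinterpreted pointer would not line up with element `i` of the original array.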


3 Comments

What if the padding is explicitly removed therefrom?
@набиячлэвэли : That would require a compiler-specific extension; then no general rule applies.
@набиячлэвэли You'd still be breaking aliasing rules. Lots of real code that broke aliasing rules and worked fine for many years started breaking when compilers got smarter optimizations. It's a terrible idea to assume that it will work.

Formally the struct may have padding so that its size is greater than 1.

That is, formally you can't reinterpret_cast and have fully portable code, except for an array of only one item.¹

In practice, though: some years ago someone asked whether there was any compiler that by default would give sizeof(T) > 1 for struct T{ char x; };. I have yet to see an example. So in practice one can just static_assert that the size is 1, and not worry that this static_assert will fail on some system.

I.e.,

S const a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };
static_assert( sizeof( S ) == 1, "!" );

char const* const a2 = reinterpret_cast<char const*>( a1 );

for( int i = 0; i < 4; ++i )
{
    assert( a1[i].v == a2[i] );
}

Since it's possible to interpret the C++14 and later standards in a way where the indexing has Undefined Behavior, based on a peculiar interpretation of "array" as referring to some original array, one might instead write this code in a more awkward and verbose but guaranteed valid way:

// I do not recommend this, but it's one way to avoid problems with some compiler that's
// based on an unreasonable, impractical interpretation of the C++14 standard.
#include <assert.h>
#include <new>

auto main() -> int
{
    struct S
    {
        char v;
    };

    int const compiler_specific_overhead    = 0;    // Redefine per compiler.
    // With value 0 for the overhead the internal workings here, what happens
    // in the machine code, is the same as /without/ this verbose work-around
    // for one impractical interpretation of the standard.
    int const n = 4;
    static_assert( sizeof( S ) == 1, "!" );
    char storage[n + compiler_specific_overhead]; 
    S* const a1 = ::new( storage ) S[n];
    assert( (void*)a1 == storage + compiler_specific_overhead );

    for( int i = 0; i < n; ++i ) { a1[i].v = "a42"[i]; }    //  Whatever

    // Here a2 points to items of the original `char` array, hence no indexing
    // UB even with impractical interpretation of the C++14 standard.
    // Note that the indexing-UB-free code from this point, is exactly the same
    // source code as the first code example that some claim has indexing UB.
    char const* const a2 = reinterpret_cast<char const*>( a1 );

    for( int i = 0; i < n; ++i )
    {
        assert( a1[i].v == a2[i] );
    }
}

Notes:
¹ The standard guarantees that there's no padding at the start of the struct.
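As a minimal sketch of what that footnote means for a single object: for a standard-layout struct, the address of the object converted with reinterpret_cast is the address of its first member. `first_member` is a hypothetical helper name used only for illustration.

```cpp
#include <cstddef>
#include <type_traits>

struct S
{
    char v;
};

static_assert(std::is_standard_layout<S>::value, "S must be standard-layout");
static_assert(offsetof(S, v) == 0, "no padding before the first member");

// Returns the address of s.v by converting the address of s itself.
// This covers one object only; it says nothing about stepping to s[1].
inline char const* first_member(S const& s)
{
    return reinterpret_cast<char const*>(&s);
}
```

For any `S s`, `first_member(s) == &s.v` holds. That is why the single-item case is fine, while indexing past it is where the dispute starts.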

13 Comments

I think the implied pointer arithmetic in a2[i] induces undefined behavior.
@T.C. I don't know about the indexing, but as mentioned there is already formal UB. One can't make it more UB, just like one can't be just a little pregnant. :) However, in practice the only worry about that is that a certain compiler conceivably could use it to sabotage-“optimize” the code…
"not portable" does not imply UB. @T.C. is right that the UB comes from the pointer arithmetic not the use of reinterpret_cast, see [expr.add] p6.
This seems like a recipe to produce code that subtly fails due to optimizations based on aliasing assumptions. So a static_assert is not sufficient. You have to verify that the compiler, every compiler, and every change of options or versions, doesn't break the code.
@DavidSchwartz: You're right, but your statement misleads because (1) compilers' ability to compile ordinary code is verified by testing the code with the compilers that one uses, which is done anyway, i.e. there's nothing extra to do, and (2) if a compiler, say g++, fails to compile this reasonably, then that's a good reason to ditch that compiler. Compilers are our tools, not our masters. Use tools that give productivity, ditch the ones that make the job harder.

The pointer arithmetic in a2[i] is undefined, see C++14 5.7 [expr.add] p7:

For addition or subtraction, if the expressions P or Q have type "pointer to cv T", where T and the array element type are not similar (4.5), the behavior is undefined. [ Note: In particular, a pointer to a base class cannot be used for pointer arithmetic when the array contains objects of a derived class type. — end note ]

Because of this rule, even if there is no padding and the sizes match, type-based alias analysis allows the compiler to assume that a1[i] and a2[i] do not overlap (because the pointer arithmetic is only valid if a2 really is an array of char not just something with the same size and alignment, and if it's really an array of char it must be a separate object from an array of S).
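If the goal is only to read the v values without creating a second array, one way to sidestep both the padding and the pointer-arithmetic questions is to skip the cast and project the member on access. A minimal sketch follows; `MemberView` is a hypothetical helper, not something from the standard library, and it will not help where an API insists on a real `const char*`.

```cpp
#include <cstddef>

struct S
{
    char v;
};

// Hypothetical helper: presents an array of S as a read-only sequence of
// char by reading the member v on demand. No cast and no copy of the data,
// so neither padding nor aliasing rules come into play.
class MemberView
{
public:
    template <std::size_t N>
    constexpr explicit MemberView(S const (&arr)[N]) : data_(arr), size_(N) {}

    // Indexes the original array of S, then selects the member.
    constexpr char operator[](std::size_t i) const { return data_[i].v; }
    constexpr std::size_t size() const { return size_; }

private:
    S const* data_;
    std::size_t size_;
};
```

Used as `MemberView a2(a1);`, the loop from the question compiles unchanged, and `a1[i].v == a2[i]` holds by construction because `a2[i]` is just `a1[i].v`.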

15 Comments

This quote is out of context, and it's apparently nowhere in C++11.
I found it in C++14. It can be interpreted in at least two ways, one reasonable with a purpose, and (yours) one unreasonable sans purpose. Worth noting that C++14 is the first version of the standard that incorporates bits of very low quality: this unclear wording is one example.
The wording comes from DR 1504, which has DR status and so addresses a defect in C++11.
Re "type-based alias analysis allows the compiler to assume that a1[i] and a2[i] do not overlap", no, that's only so with your unreasonable interpretation of "array" as original array, instead of the array at hand. However, this interpretation can (possibly) have been adopted in a perverse compiler.
A placement new-expression begins the lifetime of an object. A reinterpret_cast doesn't.

I think I'd be inclined to use a compile-time transformation if the source data is constant:

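#include <array>
#include <cstddef>      // std::size_t
#include <iostream>
#include <utility>      // std::index_sequence, std::make_index_sequence (C++14)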

struct S
{
    char v;
};

namespace detail {
    template<std::size_t...Is>
    constexpr auto to_cstring(const S* p, std::index_sequence<Is...>)
    {
        return std::array<char, sizeof...(Is)> {
            p[Is].v...
        };
    }
}

template<std::size_t N>
constexpr auto to_cstring(const S (&arr)[N])
{
    return detail::to_cstring(arr, std::make_index_sequence<N>());
}

int main()
{
    const /*expr if you wish*/ S a1[] = { {'a'}, {'4'}, {'2'}, {'\0'} };

    const /*expr if you wish*/ auto a2 = to_cstring(a1);


    for (int i = 0; i < 4; ++i)
        std::cout << std::boolalpha << (a1[i].v == a2[i]) << ' ';
}

output:

true true true true

Even when the data is not constexpr, gcc and clang are pretty good at constant-folding complex sequences like this.

2 Comments

Hm, copying. What about arrays of some million items or more?
@Cheersandhth.-Alf as ever, it depends... How many times are we doing it? Can the compiler elide the copy? Is it a candidate for constant folding? And so on. In a tight loop, with mutable data, probably not. But in many cases, even though a copy is written, after an optimisation pass, it won't actually happen. Compilers today are pretty good.
