
I have an array of structs, and I have a pointer to a member of one of those structs. I would like to know which element of the array contains the member. Here are two approaches:

#include <array>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// return which vertex the given coordinate is part of
int vertex_a(const triangle& tri, const float* coord)
{
    return reinterpret_cast<const xyz*>(coord) - tri.data();
}

int vertex_b(const triangle& tri, const float* coord)
{
    std::ptrdiff_t offset = reinterpret_cast<const char*>(coord) - reinterpret_cast<const char*>(tri.data());
    return offset / sizeof(xyz);
}

Here's a test driver:

#include <iostream>

int main()
{
    triangle tri{{{12.3, 45.6}, {7.89, 0.12}, {34.5, 6.78}}};
    for (const xyz& coord : tri) {
        std::cout
            << vertex_a(tri, &coord.x) << ' '
            << vertex_b(tri, &coord.x) << ' '
            << vertex_a(tri, &coord.y) << ' '
            << vertex_b(tri, &coord.y) << '\n';
    }
}

Both approaches produce the expected results:

0 0 0 0
1 1 1 1
2 2 2 2

But are they valid code?

In particular I wonder if vertex_a() might be invoking undefined behavior by casting float* y to xyz* since the result does not actually point to a struct xyz. That concern led me to write vertex_b(), which I think is safe (is it?).

Here's the code generated by GCC 6.3 with -O3:

vertex_a(std::array<xyz, 3ul> const&, float const*):
    movq    %rsi, %rax
    movabsq $-3689348814741910323, %rsi ; 0xCCC...CD
    subq    %rdi, %rax
    sarq    $3, %rax
    imulq   %rsi, %rax

vertex_b(std::array<xyz, 3ul> const&, float const*):
    subq    %rdi, %rsi
    movabsq $-3689348814741910323, %rdx ; 0xCCC...CD
    movq    %rsi, %rax
    mulq    %rdx
    movq    %rdx, %rax
    shrq    $5, %rax

  • That's breaking the strict aliasing rule quite badly. Commented Jun 7, 2017 at 9:53
  • @Someprogrammerdude: Can you clarify? I think vertex_b() does not break strict-aliasing. And as for vertex_a() I wasn't sure, because the pointer is never dereferenced. Commented Jun 7, 2017 at 10:01
  • @Someprogrammerdude no it isn't. Commented Jun 29, 2017 at 2:34

4 Answers

8

Neither is valid per the standard.


In vertex_a, you're allowed to convert a pointer to xyz::x to a pointer to xyz because they're pointer-interconvertible:

Two objects a and b are pointer-interconvertible if [...] one is a standard-layout class object and the other is the first non-static data member of that object [...]

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.

But you can't do the cast from a pointer to xyz::y to a pointer to xyz. That operation is undefined.
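
To make the distinction concrete, here is a minimal sketch of the one conversion the quoted rule does allow. The struct point and the helper owner_of_x are hypothetical names introduced only for this illustration; point is deliberately simpler than xyz so that the standard-layout requirement can be checked at compile time.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical struct for illustration only (not the xyz from the question).
struct point
{
    float x, y;
};

// The quoted rule only applies to standard-layout classes.
static_assert(std::is_standard_layout<point>::value,
              "point must be standard-layout for this conversion");

// x is the first non-static data member, so a pointer to it may be
// converted back to a pointer to the enclosing point.
inline const point* owner_of_x(const float* px)
{
    return reinterpret_cast<const point*>(px);
}
```

There is deliberately no owner_of_y counterpart: y is not the first member, so the same cast applied to &p.y is not covered by the rule.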


In vertex_b, you're subtracting two pointers to const char. That operation is defined in [expr.add] as:

If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined

Your expressions don't point to elements of an array of char, so the behavior is undefined.
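
For contrast, a small sketch of a subtraction that the quoted [expr.add] wording clearly covers: both operands point to elements of the same array of xyz. The helper name index_of is an assumption made for this example, and it only works when the caller already has a pointer to the whole vertex, which is not the situation in the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// Hypothetical helper: vertex must point to an element of tri.
// Both operands then point into the same array of xyz, so the
// difference is simply the element index.
inline std::ptrdiff_t index_of(const triangle& tri, const xyz* vertex)
{
    return vertex - tri.data();
}
```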


18 Comments

Regarding vertex_b(), see the standard here: stackoverflow.com/a/37119041/4323 - it says "If a program attempts to access the stored value of an object through [...] other than one of the following types the behavior is undefined [...] - a char or unsigned char type." I am certain this means that reading a byte from any object after reinterpret_casting to char* is valid. So given that the cast is valid, and reading chars from the resulting char* is valid, I think this should satisfy your [expr.add] requirement. What do you think?
@John None of that turns what you're pointing to into an array of char. Since there is no array that these pointers index into, the subtraction isn't defined.
OK, so what you're saying is that arithmetic on two char pointers is never OK if the objects they point to were not originally typed as char. Is that right? And you're saying this produces UB? That would be pretty surprising given how common this sort of thing is in networking code (which of course is the same sort of code that takes advantage of the right to cast anything to char* in the first place).
Also, do you have any alternative implementation that you think is completely legal?
@Barry: Both C and C++ existed and were in wide use before the standards were written. The ability to treat objects in C, and PODS in C++, as sequences of character-type values has always been fundamental to both languages. Since the authors of the C Standard explicitly recognize that it does not mandate everything necessary to make an implementation be useful for any purpose, and the C++ Standard relies upon key aspects of the C Standard, anyone wanting to produce a useful implementation must support such behaviors whether or not the exact wording of the Standard would mandate support.
4

vertex_a indeed breaks the strict aliasing rule (none of your floats are valid xyzs, and in 50% of your example they're not even at the start of an xyz even if there's no padding).

vertex_b relies on, shall we say, creative interpretation of the standard. Though your cast to const char* is sound, performing arithmetic with it around the rest of the array is a little more dodgy. Historically I've concluded that this kind of thing has undefined behaviour, because "the object" in this context is the xyz, not the array. However, I'm leaning towards others' interpretation nowadays that this will always work, and wouldn't expect anything else in practice.

2 Comments

and I've utterly failed to find the references I was looking for, but I'll keep trying
I thought strict aliasing only applies if the pointer is dereferenced (and in my case it is not). Was I wrong about that?
3
+50

vertex_b is completely fine. You may only need to refine return offset / sizeof(xyz); since you're dividing a std::ptrdiff_t by a std::size_t and implicitly converting the result to int. By the book, this behavior is implementation-defined: std::ptrdiff_t is signed, std::size_t is unsigned, and with a huge array on some platforms or compilers the result of the division might be larger than INT_MAX (very unlikely).

To cast away your worries, you can put assert()s and/or #errors which check PTRDIFF_MIN, PTRDIFF_MAX, SIZE_MAX, INT_MIN and INT_MAX, but I personally would not bother so much.
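
A rough sketch of what such checks could look like, assuming the vertex_b approach is otherwise acceptable on your implementation (this answer's position, which other answers dispute). The name vertex_b_checked is made up for this example.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// The element size must be representable as a signed ptrdiff_t
// before it can safely be used as a signed divisor.
static_assert(sizeof(xyz) <= static_cast<std::size_t>(PTRDIFF_MAX),
              "sizeof(xyz) does not fit in std::ptrdiff_t");

// Hypothetical variant of vertex_b with explicit conversions and checks.
inline int vertex_b_checked(const triangle& tri, const float* coord)
{
    const std::ptrdiff_t offset =
        reinterpret_cast<const char*>(coord) -
        reinterpret_cast<const char*>(tri.data());
    assert(offset >= 0);  // coord must not lie before the array

    // Divide signed by signed instead of mixing ptrdiff_t and size_t.
    const std::ptrdiff_t index =
        offset / static_cast<std::ptrdiff_t>(sizeof(xyz));
    assert(index <= INT_MAX);  // the result must fit in the int return type

    return static_cast<int>(index);
}
```

The asserts vanish when NDEBUG is defined, so this only guards debug builds.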

3 Comments

What about vertex_a()? It was suggested that it breaks strict aliasing, but I don't see how because it does not dereference the pointer.
vertex_a is wrong if coord can point to member y of xyz. The main idea behind pointer arithmetic, ever since it appeared in C, is to point to elements, not arbitrary memory locations (the two coincide only when the element size is one byte). And coord may not point to the start of an xyz; you're even allowing a check for any value of coord.
Memory alignment in this case is broken if a multiple of the required alignment is not equal to the float size. While many CPUs will let you store such an address in a register and will not produce an error until you try to read or write through it, there's no such guarantee. Some CPUs (microcontrollers) might not even have the N lowest bits of the address register, as a design simplification, and might not have an instruction to load such an address into a register, because that instruction would probably lack those N lowest bits as well.
1

Perhaps a more robust approach would involve changing the type signature to xyz::T* (where T is a template argument, so you can take xyz::x or xyz::y as needed) instead of float*.

Then you can use offsetof(struct xyz,T) to confidently compute the location of the start of the struct in a way that should be more resilient to future changes in its definition.

Then the rest follows as you are currently doing: once you have a pointer to the start of the struct finding its offset in the array is a valid pointer subtraction.

There is some pointer nastiness involved, but this is an approach that is used in practice; see, for example, the container_of() macro in the Linux kernel: https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/067/6717/6717s2.html
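
A minimal sketch of the container_of-style idea for the y member, under some loud assumptions: the helper names owner_of_y and vertex_of_y are invented here, offsetof is only conditionally supported if xyz turns out not to be standard-layout (which depends on your std::string implementation), and whether the byte-pointer arithmetic is strictly defined by the standard is the same question debated in the other answers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// Hypothetical helper: step back from &v.y to the start of v,
// in the style of the kernel's container_of().
inline const xyz* owner_of_y(const float* py)
{
    const char* start =
        reinterpret_cast<const char*>(py) - offsetof(xyz, y);
    return reinterpret_cast<const xyz*>(start);
}

// Hypothetical helper: once the owning xyz is known, its index is an
// ordinary subtraction of two pointers to elements of the same array.
inline int vertex_of_y(const triangle& tri, const float* py)
{
    return static_cast<int>(owner_of_y(py) - tri.data());
}
```

This requires knowing at the call site which member the pointer refers to, so it does not help when only an address is available.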

1 Comment

In some cases I only have a pointer to the member, and don't know its name. I only know the address.
