Accessing struct data members via pointer arithmetic

Question

If I have a simple tensor class like this

struct Tensor
{
    double XX, XY, XZ;
    double YX, YY, YZ;
    double ZX, ZY, ZZ;
}

Is it undefined behavior to use pointer-arithmetic (see below) to access its elements?

 double& Tensor::operator[](int i) 
{ 
    assert(i < 9); 
    return (&XX)[i]; 
}

Yes. *(&XX+0) is valid, &XX+1 is valid; *(&XX+1) is UB — Richard Critten
– Richard Critten, Commented Oct 17, 2019 at 8:56
@RichardCritten though if I understand Mirko correctly in his talk, &XX+1 points to the end of Tensor, right? — GeckoGeorge
– GeckoGeorge, Commented Oct 17, 2019 at 9:02
@GeckoGeorge no, it points to the address following the XX member — Remy Lebeau
– Remy Lebeau, Commented Oct 17, 2019 at 9:04

Remy Lebeau · Accepted Answer · 2019-10-17 09:48:33Z

13

Yes, it is undefined behavior.

The data members are not in an array, and thus are NOT guaranteed to be stored back-to-back in contiguous memory, as pointer arithmetic would require. There may be indeterminate padding generated between them.

The correct way would be to access the members individually, eg:

double& Tensor::operator[](int i)
{
    switch (i)
    {
        case 0: return XX;
        case 1: return XY;
        case 2: return XZ;
        case 3: return YX;
        case 4: return YY;
        case 5: return YZ;
        case 6: return ZX;
        case 7: return ZY;
        case 8: return ZZ;
        default: throw std::out_of_range("invalid index");
    }
}

Alternatively, if you really want to use array syntax:

double& Tensor::operator[](int i)
{
    if ((i < 0) || (i > 8))
        throw std::out_of_range("invalid index");

    double* arr[] = {
        &XX, &XY, &XZ,
        &YX, &YY, &YZ, 
        &ZX, &ZY, &ZZ
    };

    return *(arr[i]);
}

Or

double& Tensor::operator[](int i)
{
    if ((i < 0) || (i > 8))
        throw std::out_of_range("invalid index");

    static double Tensor::* arr[] = {
        &Tensor::XX, &Tensor::XY, &Tensor::XZ,
        &Tensor::YX, &Tensor::YY, &Tensor::YZ, 
        &Tensor::ZX, &Tensor::ZY, &Tensor::ZZ
    };

    return this->*(arr[i]);
}

Otherwise, use an actual array for the data, and define methods to access the elements:

struct Tensor
{
    double data[9];

    double& XX() { return data[0]; }
    double& XY() { return data[1]; }
    double& XZ() { return data[2]; }
    double& YX() { return data[3]; }
    double& YY() { return data[4]; }
    double& YZ() { return data[5]; }
    double& ZX() { return data[6]; }
    double& ZY() { return data[7]; }
    double& ZZ() { return data[8]; }

    double& operator[](int i)
    {
        if ((i < 0) || (i > 8))
            throw std::out_of_range("invalid index");
        return data[i];
    }
};

edited Oct 17, 2019 at 9:48

answered Oct 17, 2019 at 8:59

Remy Lebeau

609k36 gold badges516 silver badges875 bronze badges

Sign up to request clarification or add additional context in comments.

10 Comments

Touloudou Over a year ago

can there be padding inserted if all these variables are of the same type? There could be padding before XX, but between XX and XY, etc. ?

Remy Lebeau Over a year ago

@Touloudou not before XX, but between the other members, yes, if the alignment of Tenser as a whole is set larger than the alignment of the individual members

Touloudou Over a year ago

I see. That makes sense, and definitely makes this code UB. Follow up question: is there a non-UB way to map indices to these variables then?

Daniel Langr Over a year ago

@Touloudou Do the opposite. Create an array member variable and getter member functions named XX(), XY(),... that return references to particular array elements. (Of course, there is a way how to map indices to member varialbes, but that would involve ifs or switch with a runtime performance penalty.)

Touloudou Over a year ago

@DanielLangr Just saw your newly updated answer. Probably gonna go with that. There shouldn't be any performance penalty and UB is gone.

|

GeckoGeorge · Accepted Answer · 2019-10-17 08:56:43Z

8

There's a cppcon talk that mentions this!

So yes, it's undefined behaviour, because classes and arrays don't share a common initial sequence.

Edit: Miro Knejp introduces that slide at around 3:44 if you want more context for all the non-c++ on the slide, but the question and answer is really the only part of the talk that goes into your question.

answered Oct 17, 2019 at 8:56

GeckoGeorge

4451 gold badge3 silver badges12 bronze badges

2 Comments

Swift - Friday Pie Over a year ago

Oh, god.. he says "I dunno why"... it's of course because method of storing and addressing array and single variable depends on platform's ISA. Likely you won't have problems on x86, but on SPARC? on MIPS64? on something else?

Adrian McCarthy Over a year ago

I must have a short memory. I just watched that last week. Yet here I am looking for the answer. Did I mention I have a short memory?

Paul92 · Accepted Answer · 2019-10-17 09:21:24Z

3

It is undefined behavior.

In general, pointer arithmetic is properly defined only for the members of an array (and maybe one element after, as described in section §8.5.6 of the standard).

For classes/structures, this can't work, because the compiler can add padding or other data between the members. cppreference has a brief description of the class layout.

Now, moving to solutions to your problem, the first one would be to simply use something made for this, such as Eigen. It is a mature library for linear algebra, with well tested code and good optimizations.

If you are not interested in adding a new library, you would have to more or less implement manually either member access or operator[].

answered Oct 17, 2019 at 9:21

Paul92

9,1121 gold badge29 silver badges40 bronze badges

Comments

p-a-o-l-o · Accepted Answer · 2019-10-17 10:19:57Z

3

Just another possible solution: wrap references in a class and have an array of wrappers.

struct Tensor
{
    double XX, XY, XZ;
    double YX, YY, YZ;
    double ZX, ZY, ZZ;

    class DoubleRefenceWrapper
    {
        double & ref;
    public:
        DoubleRefenceWrapper(double & r) : ref(r) {}
        double & get() { return ref; }
    } elements[9];

    Tensor() : elements{XX, XY, XZ, YX, YY, YZ, ZX, ZY, ZZ} {}

    double& operator[](unsigned int i)
    {
        if(i < 9)
        {
            return elements[i].get();
        }
        throw std::out_of_range("Tensor index must be in [0-8] range");
    }
};

answered Oct 17, 2019 at 10:19

p-a-o-l-o

10.1k2 gold badges25 silver badges37 bronze badges

Collectives™ on Stack Overflow

Accessing struct data members via pointer arithmetic

4 Answers 4

10 Comments

2 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

10 Comments

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related