11

If I have a simple tensor class like this

struct Tensor
{
    double XX, XY, XZ;
    double YX, YY, YZ;
    double ZX, ZY, ZZ;
}

Is it undefined behavior to use pointer-arithmetic (see below) to access its elements?

 double& Tensor::operator[](int i) 
{ 
    assert(i < 9); 
    return (&XX)[i]; 
}
4
  • 2
    Yes. *(&XX+0) is valid, &XX+1 is valid; *(&XX+1) is UB Commented Oct 17, 2019 at 8:56
  • 1
    @RichardCritten And &xx+2 is also UB. Commented Oct 17, 2019 at 8:58
  • @RichardCritten though if I understand Mirko correctly in his talk, &XX+1 points to the end of Tensor, right? Commented Oct 17, 2019 at 9:02
  • @GeckoGeorge no, it points to the address following the XX member Commented Oct 17, 2019 at 9:04

4 Answers 4

13

Yes, it is undefined behavior.

The data members are not in an array, and thus are NOT guaranteed to be stored back-to-back in contiguous memory, as pointer arithmetic would require. There may be indeterminate padding generated between them.

The correct way would be to access the members individually, eg:

double& Tensor::operator[](int i)
{
    switch (i)
    {
        case 0: return XX;
        case 1: return XY;
        case 2: return XZ;
        case 3: return YX;
        case 4: return YY;
        case 5: return YZ;
        case 6: return ZX;
        case 7: return ZY;
        case 8: return ZZ;
        default: throw std::out_of_range("invalid index");
    }
}

Alternatively, if you really want to use array syntax:

double& Tensor::operator[](int i)
{
    if ((i < 0) || (i > 8))
        throw std::out_of_range("invalid index");

    double* arr[] = {
        &XX, &XY, &XZ,
        &YX, &YY, &YZ, 
        &ZX, &ZY, &ZZ
    };

    return *(arr[i]);
}

Or

double& Tensor::operator[](int i)
{
    if ((i < 0) || (i > 8))
        throw std::out_of_range("invalid index");

    static double Tensor::* arr[] = {
        &Tensor::XX, &Tensor::XY, &Tensor::XZ,
        &Tensor::YX, &Tensor::YY, &Tensor::YZ, 
        &Tensor::ZX, &Tensor::ZY, &Tensor::ZZ
    };

    return this->*(arr[i]);
}

Otherwise, use an actual array for the data, and define methods to access the elements:

struct Tensor
{
    double data[9];

    double& XX() { return data[0]; }
    double& XY() { return data[1]; }
    double& XZ() { return data[2]; }
    double& YX() { return data[3]; }
    double& YY() { return data[4]; }
    double& YZ() { return data[5]; }
    double& ZX() { return data[6]; }
    double& ZY() { return data[7]; }
    double& ZZ() { return data[8]; }

    double& operator[](int i)
    {
        if ((i < 0) || (i > 8))
            throw std::out_of_range("invalid index");
        return data[i];
    }
};
Sign up to request clarification or add additional context in comments.

10 Comments

can there be padding inserted if all these variables are of the same type? There could be padding before XX, but between XX and XY, etc. ?
@Touloudou not before XX, but between the other members, yes, if the alignment of Tenser as a whole is set larger than the alignment of the individual members
I see. That makes sense, and definitely makes this code UB. Follow up question: is there a non-UB way to map indices to these variables then?
@Touloudou Do the opposite. Create an array member variable and getter member functions named XX(), XY(),... that return references to particular array elements. (Of course, there is a way how to map indices to member varialbes, but that would involve ifs or switch with a runtime performance penalty.)
@DanielLangr Just saw your newly updated answer. Probably gonna go with that. There shouldn't be any performance penalty and UB is gone.
|
8

There's a cppcon talk that mentions this!

So yes, it's undefined behaviour, because classes and arrays don't share a common initial sequence.

Edit: Miro Knejp introduces that slide at around 3:44 if you want more context for all the non-c++ on the slide, but the question and answer is really the only part of the talk that goes into your question.

2 Comments

Oh, god.. he says "I dunno why"... it's of course because method of storing and addressing array and single variable depends on platform's ISA. Likely you won't have problems on x86, but on SPARC? on MIPS64? on something else?
I must have a short memory. I just watched that last week. Yet here I am looking for the answer. Did I mention I have a short memory?
3

It is undefined behavior.

In general, pointer arithmetic is properly defined only for the members of an array (and maybe one element after, as described in section §8.5.6 of the standard).

For classes/structures, this can't work, because the compiler can add padding or other data between the members. cppreference has a brief description of the class layout.

Now, moving to solutions to your problem, the first one would be to simply use something made for this, such as Eigen. It is a mature library for linear algebra, with well tested code and good optimizations.

If you are not interested in adding a new library, you would have to more or less implement manually either member access or operator[].

Comments

3

Just another possible solution: wrap references in a class and have an array of wrappers.

struct Tensor
{
    double XX, XY, XZ;
    double YX, YY, YZ;
    double ZX, ZY, ZZ;

    class DoubleRefenceWrapper
    {
        double & ref;
    public:
        DoubleRefenceWrapper(double & r) : ref(r) {}
        double & get() { return ref; }
    } elements[9];

    Tensor() : elements{XX, XY, XZ, YX, YY, YZ, ZX, ZY, ZZ} {}

    double& operator[](unsigned int i)
    {
        if(i < 9)
        {
            return elements[i].get();
        }
        throw std::out_of_range("Tensor index must be in [0-8] range");
    }
};

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.