16

As far as I know, multidimensional array on stack will occupy continuous memory in row order. Is it undefined behavior to index multidimensional array using a pointer to elements according to ISO C++ Standard? For example:

#include <iostream>
#include <type_traits>
int main() {
  int a[5][4]{{1,2,3,4},{},{5,6,7,8}};
  constexpr auto sz = sizeof(a) / sizeof(std::remove_all_extents<decltype(a)>::type);
  int *p = &a[0][0];
  int i = p[11];  // <-- here
  p[19] = 20;  // <-- here
  for (int k = 0; k < sz; ++k)
    std::cout << p[k] << ' ';  // <-- and here
  return 0;
}

Above code will compile and run correctly if pointer does not go out of the boundary of array a. But is this happen because of compiler defined behavior or language standard? Any reference from the ISO C++ Standard would be best.

7
  • 1
    Well I couldn't find anything directly linked from the ISO C++ Standard. But yes, automatically allocated arrays are guaranteed to be stored contiguously in memory. And when you use the index operator i.e. p[11] on a simple pointer it's equivalent to *(p+11), so if there is legitimate data of type *p the behaviour is defined. Commented Dec 16, 2016 at 17:48
  • 4
    I think [expr.add]/5 might forbid this but [dcl.array]/1 does guarantee that the storage is contiguous. Commented Dec 16, 2016 at 17:55
  • 1
    @Yakk it's self evident from definition of pointer increment and of index operator. Technically, C++ standard guarantees that, just you need be a lawyer to read that properly. Multidimensional array is array of arrays (of arrays, and so on). So incrementing index of array leads to increment equal to size of array's element that is sub-array element times length of sub-array.. and so on recursively Commented Dec 16, 2016 at 17:56
  • 2
    @Swift No, access past the end of any array is not legal. Even if you know what is there. Commented Dec 16, 2016 at 18:17
  • 2
    @Swift there is an array of 4 arrays of 5 ints. There is no variable of type int[20]: an array of 20 ints. Commented Dec 16, 2016 at 18:28

3 Answers 3

12

The problem here is the strict aliasing rule that exists in my draft n3337 for C++11 in 3.10 Lvalues and rvalues [basic.lval] § 10. This is an exhaustive list that does not explicetely allow to alias a multidimensional array to an unidimensional one of the whole size.

So even if it is indeed required that arrays are allocated consecutively in memory, which proves that the size of a multidimensional array, say for example T arr[n][m] is the product of is dimensions by the size of an element: n * m *sizeof(T). When converted to char pointers, you can even do arithmetic pointer operations on the whole array, because any pointer to an object can be converted to a char pointer, and that char pointer can be used to access the consecutive bytes of the object (*).

But unfortunately, for any other type, the standard only allow arithmetic pointer operations inside one array (and by definition dereferening an array element is the same as dereferencing a pointer after pointer arithmetics: a[i] is *(a + i)). So if you both respect the rule on pointer arithmetics and the strict aliasing rule, the global indexing of a multi-dimensional array is not defined by C++11 standard, unless you go through char pointer arithmetics:

int a[3][4];
int *p = &a[0][0]; // perfectly defined
int b = p[3];      // ok you are in same row which means in same array
b = p[5];          // OUPS: you dereference past the declared array that builds first row

char *cq = (((char *) p) + 5 * sizeof(int)); // ok: char pointer arithmetics inside an object
int *q = (int *) cq; // ok because what lies there is an int object
b = *q;            // almost the same as p[5] but behaviour is defined

That char pointer arithmetics along with the fear of breaking a lot of existing code explains why all well known compiler silently accept the aliasing of a multi-dimensional array with a 1D one of same global size (it leads to same internal code), but technically, the global pointer arithmetics is only valid for char pointers.


(*) The standard declares in 1.7 The C++ memory model [intro.memory] that

The fundamental storage unit in the C++ memory model is the byte... The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.

and later in 3.9 Types [basic.types] §2

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes making up the object can be copied into an array of char or unsigned char.

and to copy them you must access them through a char * or unsigned char *

Sign up to request clarification or add additional context in comments.

12 Comments

I think there is technically no aliasing there. We have a pointer to the first element of an array of size 5 in int*p. I suppose you are saying if a pointer to T[a][b] was guaranteed to legal to alias into a pointer to T[a*b] the operation would be implicitly legal?
what limits it to only char pointer arithmetic? in your example (((char *) p) + 5 * sizeof(int)) might be legal on any platform, but if there is mistake in math that would move increemt off address that matches beginning of int element, on platform with strict data alignment, e.g. SPARC, that will cause disastrous situation, it's an illegal opcode. Standard should pursue more streamlined and universally working path, why it cannot be *(p+5) ?
@Yakk: What I say is that a[i] is legal only if a and a+i point inside the same array. And it would be the case if we could alias T[a][b] and T[a*b]. But I agree with you the legality of aliasing the pointers would be enough.
@Swift: I do not want to argue whether the standard should or should not allow this or that. I only say that bytes are a special case that give access to the representation of trivially copyable objects...
It does sound funny that standard declares fundamental storage unit in the C++ memory model is the byte, because it's not true for quite a number of existing platforms where C++ used as programming language. And char isn't byte on all of them. There are platforms where you can't access address that isn't aligned to CPU's word.. or where that word length is depending on which type is used. If you take array of int and move it one byte forward on such system, the data becomes inaccessible as int values.
|
10

I believe the behavior in your example is technically undefined.

The standard has no concept of a multidimensional array. What you've actually declared is an "array of 5 arrays of 4 ints". That is a[0] and a[1] are actually two different arrays of 4 ints, both of which are contained in the array a. What this means is that a[0][0] and a[1][0] are not elements of the same array.

[expr.add]/4 says the following (emphasis mine)

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined

So, since p[11] expands to *(p + 11) and since p and p + 11 are not elements of the same array (one is an element of a[0] and the other is more than one element past the end of a[0]), the behavior of that addition is undefined.

I would, however, be very surprised to find any implementation where such an addition resulted in anything other than the one you expect.

17 Comments

But wouldn’t ((p + 4) + 4) + 3 be well defined, given that p + 4 is one past the end of a[0], and also happens to be a pointer to a[1][0]? I’m pretty sure the alignment and padding rules guarantee that sizeof(a[0]) == 4*sizeof(int) and that a[1] is sizeof(a[0]) bytes after a[0].
Standard doesn't define multidimensional arry because it is defined recursively. Think of it like of recursive template. But unlike class, array guaranteed to be stored in continos manner. Compiler will be right to throw warning if in array of int a[4][5] you try access a[4][0] element... because it would be same as *(a+20), which in this particular case is out of array boundary. It will warn about a[0][6]. Such occurrence may suggest mistake\typo in code. If indexes given runtime, compiler will not know what values are used and usual pointer math will be at work.
@DanielH Don't confuse implementation with abstraction. The abstraction states what is required to be defined. Pointers and array indexing is not defined in terms of a linear memory model in C++. Instead, various operations are defined. One way to implement them is with a linear memory model; but even there, optimizers can presume that only defined operations will occur. A classic example is a union of double and integer: modifying one and then reading the other is UB, even though the bits are guaranteed to be overlapping.
@Yakk Irrelevant example. In case of union result depends on which format platform uses, the bit map of values. pointer math is abstract enough. It's not describing how actually array is stored in memory, it describes that fact that we can use pointer to iterate through array elements .After all, on many platforms pointer is a structure, not a scalar value , implementation-wise
@Daniel H by the way, padding not applicable to arrays, only to POD structures and unions. Padding rules in this case is "there is no padding"
|
-1

if you declare

int  arr[3][4][5];

the type of arr is int[3][4][5], type of arr[3] is int[4][5], etc. Array of array of arrays, but NOT an array of pointers. Let's see what happens if we increment first index? It would shift pointer forward by size of array element, but array element of arr is a two-dimensional array! It is equivalent to incrementing: arr + sizeof(int[4][5])/sizeof(int) or arr + 20.

Iterating this way we'll find that arr[a][b][c] equals to *(*(*(arr + a) + b) + c), provided that there is never any padding with arrays (to comply with mandatory compatibility of POD types with C99):

*((int*)arr + 20*a +  5*b + c)

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression

6 Comments

@NathanOliver sorry, I fixed it, autocorrect added * after nearly every int with [
The question asks about using an int pointer, though, not decaying arr to a pointer.
@Daniel H arr decays to pointer that points at same object as pointer evaluated by &(first of element of arr)
Right, but the sample code in the question says int *p = &a[0][0] (the equivalent in your answer of int *ptr = &arr[0][0][0]. That is an int pointer.
arr[a][b][c] is *(*(*(arr + a) + b) + c), not what you wrote. arr + 20 is way out of bounds of arr. arr + 3 is the end of arr.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.