
A program consists of several source files. The file file_a.c defines an array in the global scope and provides a function that returns a pointer to its beginning:

static int buffer[10];
int *get_buf_addr(void) {
    return buffer;
}

In file_a.c, for example, the array "buffer" is filled with data, and get_buf_addr() is called from another translation unit, file_b.c, to keep the program's levels of abstraction separate. Somewhere in file_b.c, get_buf_addr() is called to obtain the buffer address, read the data from it, and send it where it needs to go. Do I understand correctly that after the call:

int *buf = get_buf_addr();

I am formally not allowed to move "forward" or "backward" from the pointer, because the compiler does not know that these addresses belong to the same array? I looked at section 6.5.6, Additive operators, of the standard:

  9. "... If the pointer operand and the result do not point to elements of the same array object or one past the last element of the array object, the behavior is undefined." And in the same section:

  10. "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type."

That is, formally, at the point of the call int *buf = get_buf_addr(); the compiler does not know whether buf points to a single object (not an array) of type int or to an array of such objects (and if to an array, how long it is). I assume that a conforming compiler should treat such a pointer as a pointer to a single int object. So point 9 quoted above applies, and any further arithmetic on such a pointer followed by an access (e.g. UART->FIFO = buf[5];) is undefined behavior.

  1. Is this true? If so, what is the formally correct way to access aggregates (arrays, structures) across separate translation units so that the program does not contain undefined behavior?
  2. If these are char * pointers, does that change the situation?
  • Problems are more likely in the opposite case: out-of-bounds pointer arithmetic when the compiler has information about the array size and can make assumptions based on it. Commented Sep 3 at 6:44
  • Re: "global scope": FYI: C has no global scope. N3299, 6.2.1p2: "There are four kinds of scopes: function, file, block, and function prototype." Commented Sep 19 at 13:13

5 Answers


There is no problem performing pointer arithmetic on buf, as long as the result is in the range of buffer + 0 to buffer + 10.

C doesn't actively check whether any pointer arithmetic you perform results in something valid. That's part of what makes it fast. What the above passage from the standard is essentially telling you is not to go past the bounds of any array object.
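A minimal single-file sketch of the setup from the question, with the two translation units collapsed into one file for brevity. fill_buffer and sum_buffer are hypothetical names, and the length 10 is assumed to be agreed on by both "files". Every pointer formed lies in buffer + 0 to buffer + 10, and only buffer + 0 to buffer + 9 are dereferenced.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_LEN 10

/* "file_a.c": the array has internal linkage; only its address escapes. */
static int buffer[BUF_LEN];

int *get_buf_addr(void) {
    return buffer;
}

/* Hypothetical stand-in for the code in file_a.c that fills the buffer. */
void fill_buffer(void) {
    for (size_t i = 0; i < BUF_LEN; ++i)
        buffer[i] = (int)(i * i);
}

/* "file_b.c": only the pointer is known here, plus the agreed length. */
int sum_buffer(void) {
    int *buf = get_buf_addr();
    assert(buf != NULL);
    int sum = 0;
    for (size_t i = 0; i < BUF_LEN; ++i)
        sum += buf[i]; /* buf[i] is *(buf + i): defined for i in 0..9 */
    return sum;
}
```

Nothing in sum_buffer depends on the compiler seeing the definition of buffer; the arithmetic is defined because the pointed-to object really is an int[10].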


7 Comments

But strictly speaking, does C assume that any pointer points to an array of infinite length? Otherwise, how does the compiler know that this pointer, obtained from anywhere, points to an array and not to a single int object?
The compiler doesn't care. It operates under the assumption that you know what you're doing.
A pointer always points to a single object of the base type; that object may be the first element of an array, it may be an element in the middle of an array, it may be a member of a struct, it may be a single standalone instance. There's no way to know from the pointer value alone. It's up to you, the programmer, to make sure you don't attempt to index outside the bounds of the array. p[0] will always be valid, as it's equivalent to *p. p[i] where i > 0 may not be. You need to track the size of the array yourself somehow.
"as long as the result is in the range of buffer + 0 to buffer + 10." buffer + 10? Did you mean buffer + 9, or did I miss something? From my point of view, buffer + 10 is out of bounds for static int buffer[10];
No, that's correct. A pointer can point to one element past the end of the array but cannot be dereferenced.
Well, color me surprised. I never knew that was possible, and I have a hard time understanding its usefulness. Maybe when you loop over an array? In any case, thanks a lot, I learned something today.
@Tom's May be useful to know: GCC bug 61502.
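One hedged way to "track the size manually", as suggested in the comments above, is to hand out the length together with the pointer. The struct int_span, get_buf and span_at are hypothetical names for illustration, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-plus-length pair. */
struct int_span {
    int *data;
    size_t len;
};

static int buffer[10];

/* Exposes the array together with its element count. */
struct int_span get_buf(void) {
    struct int_span s = { buffer, sizeof buffer / sizeof buffer[0] };
    return s;
}

/* Element access checked by assert (active unless NDEBUG is defined). */
int span_at(struct int_span s, size_t i) {
    assert(i < s.len);
    return s.data[i];
}
```

The caller in another translation unit never needs to see the declaration of buffer; the length travels with the pointer instead.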

The C behavior for pointer arithmetic is defined in terms of the object that the pointer points to. It has nothing to do with the identifier that names the object.

In C, what people commonly call a variable is an identifier combined with an object. int c = 0; defines a variable with the name (identifier) c and memory for an int (an object). However, you can have objects with no identifiers (malloc provides memory in which you can create objects, using pointers), and you can have identifiers that do not refer to objects (a name might refer to a type or a function or something else).

The rules for pointer arithmetic, in C 2024 6.5.7, are entirely defined in terms of the pointed-to object (an array element and the array it is in). In this code:

int *p = malloc(10 * sizeof *p);
for (int i = 0; i < 10; ++i)
    p[i] = i*i;
int *q = p + 3;

p + 3 is defined because p points to an element in an array of 10 int, even though that array has no name. (p points into the array, but there is no identifier for the array itself.)

If one translation unit receives a pointer from another translation unit, all that matters for pointer arithmetic is whether the pointed-to object satisfies the requirements of pointer arithmetic. Whether names are known or even exist is irrelevant.

Further, the rules for pointer arithmetic say nothing about whether any information about the objects is present in the translation unit containing the arithmetic. If one translation unit creates an object, and another translation unit performs defined arithmetic on pointers related to that object, the C implementation must make the arithmetic work.
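A sketch of this point, assuming the two functions live in different translation units: make_squares creates an unnamed array with malloc, and sum_range does arithmetic knowing only two pointers into it. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Could live in one translation unit: creates an array object with no name. */
int *make_squares(size_t n) {
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)
        p[i] = (int)(i * i);
    return p;
}

/* Could live in another: sees only [first, last), where last may be one past the end. */
long sum_range(const int *first, const int *last) {
    assert(first <= last); /* relational comparison is defined within one array */
    long sum = 0;
    for (const int *q = first; q != last; ++q)
        sum += *q;
    return sum;
}
```

For example, sum_range(p + 3, p + 10) on the result of make_squares(10) stays inside the allocated array the whole time, so the implementation must make it work regardless of what sum_range's translation unit knows.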

6 Comments

So it turns out that the lines of the standard I quoted say that it is the programmer who is responsible for ensuring that the original pointer and the pointer obtained from it by + and - offsets point into the same array object, regardless of the number of intermediate pointers.
But then why is it undefined behavior if I add an offset to the pointer that takes it beyond one past the last element of the array, never access that memory, and then subtract the same offset to come back? Is it because going out of bounds can cause an overflow?
Not every C implementation uses the flat address space most students today are accustomed to. Older computers had a variety of addressing schemes, including kludges added to old hardware to make more memory accessible. Pointers might be composed of base address and offsets that have to be combined in certain ways or of indicators of a memory segment or overlay, so the full physical address of the data was not contained directly in the pointer but had to be looked up in some table or register…
… Any defined arithmetic on a pointer to an allocated object was guaranteed to remain in bounds where the arithmetic would work (e.g., not exceed the offset portion of the pointer), but arithmetic outside the array might not work. If the offset field were exceeded, a carry into the base field could yield something that was not a valid address at all. Or the index into the auxiliary data might be invalid due to the carry, and attempting to use that result to fetch the auxiliary data could crash.
malloc is a strange example since the returned chunk has no type and it only works as an array thanks to a special rule in C23 7.24.4.1: "The pointer returned if the allocation succeeds is suitably aligned so that it can be assigned to a pointer to any type of object with a fundamental alignment requirement and size less than or equal to the size requested. It can then be used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)". The space doesn't get associated with an effective type until written to.
Whereas an array with static size and declared type gives the compiler enough info to make all kinds of assumptions about bounds.
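For the "step out and come back" scenario raised in these comments, a hedged sketch of the usual workaround: do the excursion in integer arithmetic and form a pointer only once the final index is back in 0..n. checked_offset is a hypothetical helper, and it assumes start + delta does not overflow ptrdiff_t.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns base + (start + delta) if that lies in base[0..n] (n being
 * one past the end), otherwise NULL. The detour is computed on integers,
 * so no out-of-range pointer is ever formed; stepping the pointer itself
 * beyond base + n and back would be undefined behavior.
 */
int *checked_offset(int *base, size_t n, size_t start, ptrdiff_t delta) {
    assert(start <= n);
    ptrdiff_t idx = (ptrdiff_t)start + delta; /* assumed not to overflow */
    if (idx < 0 || (size_t)idx > n)
        return NULL;                          /* would leave the array (+1) */
    return base + idx;                        /* defined: 0 <= idx <= n */
}
```

With k = 1000, checked_offset(a, 10, 2, k - k) yields a + 2, whereas (a + 2 + k) - k would form an invalid intermediate pointer.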

It's the programmer's responsibility to ensure that they don't go out of bounds, not the compiler's. Even in the same compilation unit, the compiler can't always tell the size of the object that a pointer points to, since it can be assigned conditionally:

int buffer_1[10];
int buffer_2[20];
int * buf;

if (some_condition) {
    buf = buffer_1;
} else {
    buf = buffer_2;
}

Even more common are memory allocations using malloc(), where the size is dynamic and can change over time if you use realloc().

So it doesn't matter where the pointer comes from. Pointer arithmetic is defined as long as you stay within the actual object it points to. Whether the compiler can verify this is irrelevant.
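A runnable version of the fragment above, assuming the length is recorded at the point where the buffer is chosen. pick_buffer and fill are hypothetical names. The compiler cannot know which array buf will point to, yet every access is defined because it stays within the recorded length.

```c
#include <assert.h>
#include <stddef.h>

int buffer_1[10];
int buffer_2[20];

/* Picks one of two arrays at run time and reports its length to the caller. */
int *pick_buffer(int some_condition, size_t *buf_len) {
    int *buf;
    if (some_condition) {
        buf = buffer_1;
        *buf_len = sizeof buffer_1 / sizeof buffer_1[0];
    } else {
        buf = buffer_2;
        *buf_len = sizeof buffer_2 / sizeof buffer_2[0];
    }
    return buf;
}

/* Writes value into every element; defined for either array. */
void fill(int *buf, size_t len, int value) {
    assert(buf != NULL);
    for (size_t i = 0; i < len; ++i)
        buf[i] = value;
}
```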

3 Comments

What did the authors of the standard mean by mentioning undefined behavior in paragraph 9? Yes, the programmer is responsible for the "availability" of the memory this pointer points to. But we seem to have exactly the situation the standard describes: the pointer "technically" points to memory inside the array, but the compiler does not know this. It has no information about where this pointer is actually "looking".
It would be logical for it to strictly assume that the pointer points to a single object of type int. Then buf[0] is not UB. But on encountering buf[1], the compiler could reason that such an access cannot happen, since it lies beyond the bounds of a single object treated as an array of length 1, and accordingly, for example, delete part of the code or perform other unwanted optimizations.
It doesn't need the information, undefined behavior is not something the compiler has to check for.

It is perfectly fine to add offsets from 0 to 10 inclusive, that is, to produce a pointer expression that points to any array element or one past the last element; and it is fine to read the memory pointed to by the resulting pointer at offsets up to and including 9, that is, anywhere in the array but not past it.

There is an array there, which means your quoted point 9 does not apply; point 10 doesn't apply either, because it's not a single integer.
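For contrast, a small sketch of the single-object rule quoted in the question: a pointer to a standalone int behaves like a pointer to the first element of an int[1], so &x + 1 may be formed and compared but not dereferenced. count_elements and sum_single are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Counts elements in [first, last); last may be one past the end. */
size_t count_elements(const int *first, const int *last) {
    assert(first <= last);
    return (size_t)(last - first);
}

/* Walks a standalone int through the same range-based loop used for arrays. */
int sum_single(void) {
    int x = 42;
    const int *first = &x;
    const int *last = &x + 1; /* valid: one past the end of an "array of length one" */
    int sum = 0;
    for (const int *p = first; p != last; ++p)
        sum += *p;            /* only &x itself is ever dereferenced */
    return sum;
}
```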

The reason for the odd provision that merely producing a pointer expression beyond one past the end of the array is undefined, even if we never read from that location, is that some unusual architectures may produce an overflow (an illegal address) that traps when it is loaded into a register. Typical architectures today don't do that any more, though.

The rationale to allow an address that points one past the array is probably to facilitate simpler loops, e.g. for loops that break only when the address is already past the array. The implementation on a machine that may produce overflows is relatively easy: Just don't place an array (or variable) at the very end of a memory segment.
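The loop idiom mentioned above, as a minimal sketch: the end pointer is formed and compared on every iteration but never dereferenced, and it doubles as a "not found" result for an empty range. max_element is a hypothetical name modeled loosely on the C++ algorithm of the same name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the largest element in [first, last), or last if the range is empty. */
const int *max_element(const int *first, const int *last) {
    assert(first <= last);
    if (first == last)
        return last;                  /* empty range: one-past-the-end as "none" */
    const int *best = first;
    for (const int *p = first + 1; p != last; ++p) /* stops when p is one past the end */
        if (*p > *best)
            best = p;
    return best;
}
```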

2 Comments

With separate translation, the compiler doesn't actually know where the pointer points - whether there is actually an array there or not.
Perfectly true, but irrelevant to the fact that there is, in fact, an array. As dbush and Barmar said in their answers, C usually does not know and cannot check array sizes. (This is, of course, one of the most common sources of bugs.) Even if it knew the size because the array is defined in the same translation unit it would still not generate code to check that access because C is designed to be as fast as possible. A compiler would also not refuse compilation. (Modern compilers would warn though, if the illegal access is obvious enough and the warning level is sufficient.)

That is, formally, at the point of calling int *buf = get_buf_addr(); the compiler does not know whether buf points to a single object (not an array) of type int, or to an array of such objects (and if to an array, how long is this array?).

The compiler does not have to know it. It is the programmer's responsibility to keep the arithmetic within the bounds of the array, plus one past its last element.

3 Comments

Tell me, where is it written that the programmer is responsible? I read those lines of the standard literally and can't see how you interpret them that way =)
No, you simply do not understand how C works. C language does not do any boundary checks. The paragraphs you cited say when pointer arithmetic is valid and how to treat a pointer to a simple object, for example, one of type int. Also, undefined does not mean invalid: some things the C standard leaves undefined can be defined by a particular implementation. In our case, pointer arithmetic will be well defined on Cortex-M devices, which have a linear address space.
"C language does not do any boundary checks" is more like "C language does not require any boundary checks". An implementation may or may not employ memory checks to some degree.
