26

Obviously, dereferencing an invalid pointer causes undefined behavior. But what about simply storing an invalid memory address in a pointer variable?

Consider the following code:

const char* str = "abcdef";
const char* begin = str;
if (begin - 1 < str) { /* ... do something ... */ }

The expression begin - 1 evaluates to an invalid memory address. Note that we don't actually dereference this address - we simply use it in pointer arithmetic to test if it is valid. Nonetheless, we still have to load an invalid memory address into a register.

So, is this undefined behavior? I never thought it was, since a lot of pointer arithmetic seems to rely on this sort of thing, and a pointer is really nothing but an integer anyway. But recently I heard that even the act of loading an invalid pointer into a register is undefined behavior, since certain architectures will automatically throw a bus error or something if you do that. Can anyone point me to the relevant part of the C or C++ standard which settles this either way?

2 Comments

  • According to the C/C++ standard it is indeed undefined behavior. But, frankly speaking, I've never seen a real-world CPU/architecture on which the above fails, i.e. a machine that doesn't permit arbitrary pointer arithmetic. And I've seen quite a lot of architectures, including embedded microcontrollers. So, in my (humble) opinion, the code is OK as long as you restrict yourself to modern non-esoteric architectures. Commented Sep 21, 2016 at 4:16
  • Can you please extend the question: what about a for loop that traverses the array backwards? In such a traversal you eventually need to compare against the element before the first one, without dereferencing it. I had a similar question, but about the element after the last one. Commented Sep 21, 2016 at 4:37

7 Answers

17

I have the C draft standard here, and it makes this undefined by omission. 6.5.6/8 defines the case of ptr + I only for the following:

  • If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression.
  • Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.

Your case fits neither of these. Your array is not large enough for -1 to yield a pointer to another element of it, and neither the original pointer nor the result points one past the end.


10 Comments

Is this undefined or unspecified behavior? I would expect the code to run and work with no bad consequences, though whether it entered the if branch would be unknowable (via the standard).
@Martin York: The C++ standard defines this to be undefined behavior even if the pointer is not dereferenced. I hope I have picked the relevant quote in my post.
It is behavior which could cause a hardware fault on hardware which validates the contents of pointer registers. As such, it is Undefined Behavior. It is possible and permissible for a particular implementation to specify what will happen if programs do various things that, per the standard, evoke Undefined Behavior. If an implementation conforms to its own spec, the behavior will then be well-defined. If the code is run on a different implementation which conforms to the C standard, but not to that particular implementation's specs, however, the program may fail in arbitrary ways.
@supercat is correct: on some CPUs, loading an invalid pointer into a register will by itself crash the program, so guaranteeing that will work would disable a lot of optimizations.
@Lorehead: The modern usage of the term "optimization" refers to the notion that a compiler should aggressively identify situations that would invoke UB, and conclude that variables cannot hold values that would make such situations arise. For example, given the code if (p != 0) doSomething(p); debug_log(*p); a "modern" optimizing compiler could conclude that it was safe to make the call to doSomething unconditional since code would invoke UB if "p" is null, even if on the target platform reading a null pointer would simply yield a meaningless value.
11

Your code is undefined behavior for a different reason:

The expression begin - 1 does not merely yield an invalid pointer; evaluating it is itself undefined behavior. You are not allowed to perform pointer arithmetic beyond the bounds of the array you're working on. So it is the subtraction itself that is invalid, not the act of storing the resulting pointer.

19 Comments

The C99 Rationale (linked to in my answer) specifically mentions pointer arithmetic beyond the bound of the array as yielding invalid pointers.
If the expression was modified to (ptrdiff_t)begin - 1, would that still yield undefined behavior? Since ptrdiff_t has to be a signed integral type, I would think this would be okay.
A ptrdiff_t may only be calculated for two pointers into the same data object. The only exception to the "within the bounds of the array" is a pointer one beyond the end of the array.
@fizzer: I don't have the C++ standard here (formatted my computer a few days ago, and still need to grab that from my backups), but it states that this is undefined. I don't know if C does it differently, but I'd imagine that it's just that rationale deals with what actually happens (in reality, you just get an invalid pointer), but the standard is more strict and says "it's a nonsensical operation, it is undefined".
@Channel72: Yes, as long as the following are all true: (1) sizeof(ptrdiff_t) >= sizeof(void*) (this isn't necessarily guaranteed), (2) the result of casting begin to the signed integer type ptrdiff_t isn't the minimum value representable by that type (if it is, the subtraction will overflow and result in undefined behavior), and (3) the implementation defines pointer-to-integer conversion consistently, so that you can compare the result of this expression with (ptrdiff_t)str and get a meaningful result (also not guaranteed).
8

Some architectures have dedicated registers for holding pointers. Putting the value of an unmapped address into such a register is allowed to crash. Integer overflow/underflow is allowed to crash. Because C aims to work on a broad variety of platforms, pointers provide a mechanism for safely programming unsafe circuits.

If you know you won't be running on exotic hardware with such finicky characteristics, you don't need to worry about what is undefined by the language. It is well-defined by the platform.

Of course, the example is poor style and there isn't a good reason to do it.

9 Comments

The fact that it's well-defined on a platform does not make it well-defined on all implementations targeting that platform. Compilers whose writers are more interested in "optimization" than in supporting low-level programming cannot be relied upon to behave reliably with such code even if the underlying platform would.
@supercat That's a good point, and you're technically correct. In practice, though, when a compiler gets so aggressive that ptr = arr - 1; becomes a no-op (or a crash, or…), its users just might get upset enough to go find other compilers. While the standard allows it, such behavior is so subtly pathological, and such computations so common, that it's seldom a viable choice.
Compilers like gcc and clang seem popular, even though their behaviors would have been considered outrageous in saner times. One of the reasons the authors of the Standard made short unsigned types promote as signed was, according to the rationale, that the majority of then-current implementations would process something like unsigned mul_mod_65535(unsigned short x, unsigned short y) { return (x*y) & 0xFFFF; } in the logical fashion even if x*y was larger than INT_MAX. GCC, however, will sometimes "optimize" that function in ways that break.
@supercat Yes, that's another perennial source of complaints. Still, it's easier to catch that sort of bug. Out-of-bounds computations are sometimes hard to avoid and difficult to see in the code. C++ is introducing std::launder to selectively bless such values, but actually specifying that function has been about as weird as you might expect.
@Evg m68k has address registers and I’m not 100% sure but the unmapped address load comment was probably referring to IA64.
4

Any use of an invalid pointer yields undefined behaviour. I don't have the C Standard here at work, but see 'invalid pointers' in the Rationale: http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf

4 Comments

If that's the case, couldn't you just cast all your pointers to a ptrdiff_t when doing pointer arithmetic? In other words, if I changed my above code sample to read if ((ptrdiff_t)begin - 1) would that no longer be undefined behavior?
Not undefined behaviour, but the result is implementation-defined. That is, your implementation will document some reasonable behaviour, but it will not be portable, and may not be useful.
The comp.lang.c FAQ addresses this: c-faq.com/ptrs/int2ptr.html. Like I said, I don't have the Standard to hand.
Note that ptrdiff_t will hold the difference between pointers, not pointers themselves. This is not the same thing.
2

C++ §5.7/6 - "Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.75)"

Summary, it is undefined even if you do not dereference the pointer.

3 Comments

That text concerns subtraction of a pointer from a pointer; the OP is subtracting an integer from a pointer.
@James McNellis: That's about pointer arithmetic I guess. Ultimately it's about the resultant pointer value
I am unsure about your reasoning: subtracting two pointers into different arrays can in fact cause issues, because the pointers may point into different memory zones (think far/near memory on 16-bit architectures). There is nothing here about meddling with the pointers themselves; in fact, it is quite common to use the upper bits of 64-bit pointers to store additional flags.
1

The correct answers have been given years ago, but I find it interesting that the C99 rationale [sec. 6.5.6, last 3 paragraphs] explains why the standard endorses adding 1 to a pointer that points to the last element of an array (p+1):

An important endorsement of widespread practice is the requirement that a pointer can always be incremented to just past the end of an array, with no fear of overflow or wraparound

and why p-1 is not endorsed:

In the case of p-1, on the other hand, an entire object would have to be allocated prior to the array of objects that p traverses, so decrement loops that run off the bottom of an array can fail. This restriction allows segmented architectures, for instance, to place objects at the start of a range of addressable memory.

So if the pointer p points to an object placed at the very start of a range of addressable memory, which this passage explicitly allows, then p-1 would underflow.

Note that integer overflow is the standard's example for undefined behavior [sec. 3.4.3], as it depends on the translation environment and the operating environment. I believe it is easy to see that this dependence on the environment extends to pointer underflow.

This is why the standard explicitly makes it undefined behavior [in 6.5.6/8], as noted by other answers here. To cite that sentence:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

See also [sec. 6.3.2.3, last 4 paragraphs] of the C99 rationale, which gives a more detailed description of how invalid pointers can be generated, and what effects that may have.

Comments

0

Yes, it's undefined behavior. See the accepted answer to this closely related question. Assigning an invalid pointer to a variable, comparing an invalid pointer, or casting an invalid pointer all trigger undefined behavior.

Comments
