According to ISO C++, dereferencing a null pointer is undefined behaviour. My question is: why? Why did the standard decide to declare it undefined behavior? What is the rationale behind this decision? Compiler dependency? It doesn't seem so, because according to the C99 standard, as far as I know, it is well defined. Machine dependency? Any ideas?
-
Believe it or not, address 0 is usable on the x86, so at times you may actually need to dereference a "null" pointer. - Earlz, Jul 22, 2011
-
But if not undefined, then what should the behavior be? - drb, Jul 22, 2011
-
@drb: nasal demons, for instance... - Marcus Borkenhagen, Jul 22, 2011
-
@Rob: it is not true. 6.5.3.2/4 says "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.", with a footnote that includes "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer". - Mike Seymour, Jul 22, 2011
-
The null pointer doesn't necessarily refer to address 0. - user802003, Jul 22, 2011
13 Answers
Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.
It would also fix only a small part of a larger problem: there are many ways to get an invalid pointer besides a NULL pointer.
5 Comments
NULL has to be special, and from what I understand, the OP's question is rather: why should it be special?
-
NULL is special? It's no more special with respect to dereferencing than an uninitialized pointer or a pointer that no longer points to an existent object.
-
NULL pointers (or other invalid pointers, for that matter)... the compiler dereferences them like any other pointer if it wants to, if they're not special. Only if NULL were special would the compiler have to check.

The primary reason is that by the time they wrote the original C standard, there were a number of implementations that allowed it, but gave conflicting results.
On the PDP-11, it happened that address 0 always contained the value 0, so dereferencing a null pointer also gave the value 0. Quite a few people who used these machines felt that since they were the original machine C had been written on/used to program, that this should be considered canonical behavior for C on all machines (even though it originally happened quite accidentally).
On some other machines (Interdata comes to mind, though my memory could easily be wrong) address 0 was put to normal use, so it could contain other values. There was also some hardware on which address 0 was actually memory-mapped hardware, so reading/writing it did special things -- not at all equivalent to reading/writing normal memory.
The camps wouldn't agree on what should happen, so they made it undefined behavior.
By the time they wrote the C++ standard, its being undefined behavior was already well established in C, and apparently nobody thought there was a good reason to create a conflict on this point, so they kept it the same.
1 Comment
The only way to give defined behaviour would be to add a runtime check to every pointer dereference, and every pointer arithmetic operation. In some situations, this overhead would be unacceptable, and would make C++ unsuitable for the high-performance applications it's often used for.
C++ allows you to create your own smart pointer types (or use ones supplied by libraries), which can include such a check in cases where safety is more important than performance.
Dereferencing a null pointer is also undefined in C, according to clause 6.5.3.2/4 of the C99 standard.
6 Comments
This answer from @Johannes Schaub - litb puts forward an interesting rationale, which seems pretty convincing.
The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.
Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined: if typeid is given an lvalue that resulted from dereferencing a null pointer, the result is a thrown bad_typeid exception. This is a limited area where an exception (no pun intended) exists to the above problem of finding an identity. Other cases exist where a similar exception to undefined behavior is made (though those are much less subtle, and come with a reference to the affected sections).
The committee discussed solving this problem globally by defining a kind of lvalue that does not have an object or function identity: the so-called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.
Note:
Marking this as community wiki, since the answer & the credit should go to the original poster. I am just pasting the relevant parts of the original answer here.
1 Comment
The real question is: what behavior would you expect?
A null pointer is, by definition, a singular value that represents the absence of an object. The result of dereferencing a pointer is to obtain a reference to the object pointed to.
So how do you get a good reference... from a pointer that points into the void?
You do not. Thus the undefined behavior.
3 Comments
abort()? There are plenty of sensible things that could be defined; the question is, why leave it undefined?
-
throw is hardly a sensible thing to do. (Yes, you can draw conclusions about Java.)
-
I suspect it's because if the behavior were well-defined, the compiler would have to insert code anywhere pointers are dereferenced. If it were implementation-defined, then one possible behavior could still be a hard crash. If it were unspecified, then either the compilers for some systems would bear extra undue burden, or they might generate code that causes hard crashes. Thus, to avoid any possible extra burden on compilers, they left the behavior undefined.
Comments
Sometimes you need an invalid pointer (see also MmBadPointer on Windows) to represent "nothing".
If everything was valid, then that wouldn't be possible. So they made NULL invalid, and disallowed you from dereferencing it.
Comments
Here is a simple test & example:
Allocate a pointer:
int * pointer;
- What value is in the pointer when it is created?
- What is the pointer pointing to?
- What happens when I dereference this pointer in its current state?
- Marking the end of a linked list.
In a linked list, a node points to another node, except for the last.
What is the value of the pointer in the last node?
What happens when you dereference the "next" field of the last node?
There needs to be a value that indicates a pointer is not pointing to anything, or that it is in an invalid state. This is where the NULL pointer concept comes into play: the linked list can use a NULL pointer to indicate the end of the list.
Comments
Arguments have been made elsewhere that having well-defined behaviour for null pointer dereferences is impossible without a lot of overhead, which I think is true. This is because AFAIU "well-defined" here also means "portable". If you did not treat nullptr dereferences specially, you would end up generating instructions that simply try to read address 0, but that produces different behaviour on different processors, so it would not be well-defined.
So, I guess this is why dereferencing nullptr (and probably also other invalid pointers) is marked as undefined.
I do wonder why this is undefined rather than unspecified or implementation-defined, which are distinct from undefined behaviour, but require more consistency.
In particular, when a program triggers undefined behaviour, the compiler can do pretty much anything (e.g. throw away your entire program, maybe?) and still be considered correct, which is somewhat problematic. In practice, you would expect compilers to just compile a null pointer dereference to a read of address zero, but with modern optimizers becoming better, and also more sensitive to undefined behaviour, they sometimes do things that end up breaking the program more thoroughly. E.g. consider the following:
matthijs@grubby:~$ cat test.c
unsigned foo () {
unsigned *foo = 0;
return *foo;
}
matthijs@grubby:~$ arm-none-eabi-gcc -c test.c -Os && objdump -d test.o
test.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a03000 mov r3, #0
4: e5933000 ldr r3, [r3]
8: e7f000f0 udf #0
This program just dereferences and accesses a null pointer, which results in an "Undefined instruction" being generated (halting the program at runtime).
This might be ok when this is an accidental nullpointer dereference, but in this case I was actually writing a bootloader that needs to read address 0 (which contains the reset vector), so I was quite surprised this happened.
So, not so much an answer, but some extra perspective on the matter.
Comments
According to the original C standard, the internal representation of a null pointer can be any bit pattern - not necessarily zero.
The language definition states that for each pointer type there is a special value - the 'null pointer' - which is distinguishable from all other pointer values and which is 'guaranteed to compare unequal to a pointer to any object or function.' That is, a null pointer points definitively nowhere; it is not the address of any object or function.
There is a null pointer for each pointer type, and the internal values of null pointers for different types may be different.
Comments
Although dereferencing a NULL pointer in C/C++ indeed leads to undefined behavior from the language standpoint, such an operation is well defined in some compilers for targets that have memory at the corresponding address. In that case, the result of the operation is simply a read of the memory at address 0.
Also, many compilers will allow you to dereference a NULL pointer as long as you don't actually use the referenced value. This is done to provide compatibility with non-conforming yet widespread code, like
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
There was even a discussion to make this behaviour part of the standard.
2 Comments
Given some char* x; somewhere that will never be modified, a better notion would be ((char*)&(((struct_type*)x)->member) - x). In all cases where the expression is defined, it will yield the (constant) offset of that member, and if the compiler can't tell if x holds a pointer to struct_type, the most efficient way to...x at run-time.
-
If dereferencing a null pointer were classified as Implementation-Defined behavior, then a compiler that omitted an action like x = *p; in cases where the value of x never happens to be used might yield code that behaved contrary to specification. Characterizing a null-pointer dereference as UB, by contrast, allows a compiler to omit the load of *p, and possibly the calculation of p itself.
Although it has become common for later language features to explicitly provide for the possibility of optimizing transforms affecting behavior, without characterizing as UB corner cases where that might happen, the characterization of null pointer dereference as Undefined Behavior predates that practice and there has never been any particular impetus to recharacterize it.
Comments
Because you cannot create a null reference. C++ doesn't allow it. Therefore you cannot dereference a null pointer.
Mainly it is undefined because there is no logical way to handle it.
4 Comments
*nullpointer.
-
That would be a "null lvalue", which is certainly what Rocky meant.
-
A "null lvalue" would be an lvalue at the "null address".
-
"null address" is a contradiction.