According to ISO C++, dereferencing a null pointer is undefined behaviour. My question is: why? Why did the standard decide to declare it undefined behavior? What is the rationale behind this decision? Compiler dependency? It doesn't seem so, because according to the C99 standard, as far as I know, it is well defined. Machine dependency? Any ideas?
-
Believe it or not, address 0 is usable on the x86, so at times you may actually need to dereference a "null" pointer. - Earlz, Jul 22, 2011
-
But if not undefined, then what should the behavior be? - drb, Jul 22, 2011
-
@drb: nasal demons, for instance... - Marcus Borkenhagen, Jul 22, 2011
-
@Rob: it is not true. 6.5.3.2/4 says "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.", with a footnote that includes "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer". - Mike Seymour, Jul 22, 2011
-
The null pointer doesn't necessarily refer to address 0. - user802003, Jul 22, 2011
13 Answers
Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.
It would also fix only a small part of a larger problem: there are many ways to get an invalid pointer besides a NULL pointer.
5 Comments
NULL has to be special, and from what I understand, the OP's question is rather: why should it be special?
-
NULL is special? It's no more special with respect to dereferencing than an uninitialized pointer or a pointer that no longer points to an existent object.
-
NULL pointers (or other invalid pointers, for that matter)... the compiler dereferences them like any other pointer if it wants to, if they're not special. Only if NULL were special would the compiler have to check.

The primary reason is that by the time they wrote the original C standard, there were a number of implementations that allowed it, but gave conflicting results.
On the PDP-11, it happened that address 0 always contained the value 0, so dereferencing a null pointer also gave the value 0. Quite a few people who used these machines felt that since they were the original machine C had been written on/used to program, that this should be considered canonical behavior for C on all machines (even though it originally happened quite accidentally).
On some other machines (Interdata comes to mind, though my memory could easily be wrong) address 0 was put to normal use, so it could contain other values. There was also some hardware on which address 0 was actually memory-mapped hardware, so reading/writing it did special things -- not at all equivalent to reading/writing normal memory.
The camps wouldn't agree on what should happen, so they made it undefined behavior.
By the time they wrote the C++ standard, its being undefined behavior was already well established in C, and apparently nobody thought there was a good reason to create a conflict on this point, so they kept it the same.
1 Comment
The only way to give defined behaviour would be to add a runtime check to every pointer dereference, and every pointer arithmetic operation. In some situations, this overhead would be unacceptable, and would make C++ unsuitable for the high-performance applications it's often used for.
C++ allows you to create your own smart pointer types (or use ones supplied by libraries), which can include such a check in cases where safety is more important than performance.
Dereferencing a null pointer is also undefined in C, according to clause 6.5.3.2/4 of the C99 standard.
6 Comments
This answer from @Johannes Schaub - litb puts forward an interesting rationale, which seems pretty convincing.
The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.
Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined: if typeid is given an lvalue that resulted from dereferencing a null pointer, the result is a thrown bad_typeid exception. This is a limited area where an exception (no pun intended) exists to the above problem of finding an identity. Other cases exist where a similar exception to undefined behavior is made (though those are much less subtle, and come with a reference to the affected sections).
The committee discussed solving this problem globally by defining a kind of lvalue that does not have an object or function identity: the so-called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.
Note:
Marking this as community wiki, since the answer & the credit should go to the original poster. I am just pasting the relevant parts of the original answer here.
1 Comment
The real question is: what behavior would you expect?
A null pointer is, by definition, a singular value that represents the absence of an object. The result of dereferencing a pointer is to obtain a reference to the object pointed to.
So how do you get a good reference... from a pointer that points into the void?
You do not. Thus the undefined behavior.
3 Comments
abort()? There are plenty of sensible things that could be defined; the question is, why leave it undefined?
-
throw is hardly a sensible thing to do. (Yes, you can draw conclusions about Java.)
-
I suspect it's because if the behavior were well-defined, the compiler would have to insert code anywhere pointers are dereferenced. If it were implementation-defined, then one possible behavior could still be a hard crash. If it were unspecified, then either the compilers for some systems would bear extra undue burden, or they might generate code that causes hard crashes. Thus, to avoid any possible extra burden on compilers, they left the behavior undefined.
Comments
Sometimes you need an invalid pointer (see also MmBadPointer on Windows) to represent "nothing".
If everything was valid, then that wouldn't be possible. So they made NULL invalid, and disallowed you from dereferencing it.
Comments
Here is a simple test & example:
Allocate a pointer:
int * pointer;
- What value is in the pointer when it is created?
- What is the pointer pointing to?
- What happens when I dereference this pointer in its current state?
- Marking the end of a linked list.
In a linked list, a node points to another node, except for the last.
What is the value of the pointer in the last node?
What happens when you dereference the "next" field of the last node?
There needs to be a value that indicates a pointer is not pointing to anything, or that it is in an invalid state. This is where the NULL pointer concept comes into play: the linked list can use a NULL pointer to indicate the end of the list.
Comments
Arguments have been made elsewhere that having well-defined behaviour for null pointer dereferences is impossible without a lot of overhead, which I think is true. This is because AFAIU "well-defined" here also means "portable". If you did not treat nullptr dereferences specially, you would end up generating instructions that simply try to read address 0, but that produces different behaviour on different processors, so it would not be well-defined.
So, I guess this is why dereferencing nullptr (and probably also other invalid pointers) is marked as undefined.
I do wonder why this is undefined rather than unspecified or implementation-defined, which are distinct from undefined behaviour, but require more consistency.
In particular, when a program triggers undefined behaviour, the compiler can do pretty much anything (e.g. throw away your entire program, maybe?) and still be considered correct, which is somewhat problematic. In practice, you would expect compilers to just compile a null pointer dereference to a read of address zero, but with modern optimizers becoming better, and also more sensitive to undefined behaviour, they sometimes do things that end up breaking the program more thoroughly. E.g. consider the following:
matthijs@grubby:~$ cat test.c
unsigned foo () {
unsigned *foo = 0;
return *foo;
}
matthijs@grubby:~$ arm-none-eabi-gcc -c test.c -Os && objdump -d test.o
test.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a03000 mov r3, #0
4: e5933000 ldr r3, [r3]
8: e7f000f0 udf #0
This program just dereferences and accesses a null pointer, which results in an "Undefined instruction" being generated (halting the program at runtime).
This might be ok when this is an accidental nullpointer dereference, but in this case I was actually writing a bootloader that needs to read address 0 (which contains the reset vector), so I was quite surprised this happened.
So, not so much an answer, but some extra perspective on the matter.
Comments
According to the original C standard, the internal representation of a null pointer can be any bit pattern - not necessarily zero.
The language definition states that for each pointer type there is a special value - the 'null pointer' - which is distinguishable from all other pointer values and which is 'guaranteed to compare unequal to a pointer to any object or function.' That is, a null pointer points definitively nowhere; it is not the address of any object or function.
There is a null pointer for each pointer type, and the internal values of null pointers for different types may be different.
Comments
Although dereferencing a NULL pointer in C/C++ indeed leads to undefined behavior from the language standpoint, such an operation is well defined in some compilers for targets that have memory at the corresponding address. In that case, the result of the operation is simply a read of the memory at address 0.
Also, many compilers will allow you to dereference a NULL pointer as long as you don't actually use the referenced value. This is done to provide compatibility with non-conforming yet widespread code, like
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
There was even a discussion to make this behaviour part of the standard.
2 Comments
Given some char* x; somewhere that will never be modified, a better notion would be ((char*)&(((struct_type*)x)->member) - x). In all cases where the expression is defined, it will yield the (constant) offset of that member, and if the compiler can't tell if x holds a pointer to struct_type, the most efficient way to...x at run-time.
-
If dereferencing a null pointer were classified as Implementation-Defined behavior, then a compiler that omitted an action like x = *p; in cases where the value of x never happens to be used might yield code that behaved contrary to specification. Characterizing a null-pointer dereference as UB, by contrast, allows a compiler to omit the load of *p, and possibly the calculation of p itself.
Although it has become common for later language features to explicitly provide for the possibility of optimizing transforms affecting behavior, without characterizing as UB corner cases where that might happen, the characterization of null pointer dereference as Undefined Behavior predates that practice and there has never been any particular impetus to recharacterize it.
Comments
Because you cannot create a null reference. C++ doesn't allow it. Therefore you cannot dereference a null pointer.
Mainly it is undefined because there is no logical way to handle it.
4 Comments
*nullpointer.
-
That would be a "null lvalue", which is certainly what Rocky meant.
-
A "null lvalue" would be an lvalue at the "null address".
-
"null address" is a contradiction.