Why are C++ array index values signed and not built around the size_t type (or am I wrong in that)?

Question

It's getting harder and harder for me to keep track of the ever-evolving C++ standard but one thing that seems clear to me now is that array index values are meant to be integers (not long long or size_t or some other seemingly more appropriate choice for a size). I've surmised this both from the answer to this question (Type of array index in C++) and also from practices used by well established C++ libraries (like Qt) which also use a simple integer for sizes and array index operators. The nail in the coffin for me is that I am now getting a plethora of compiler warnings from MSVC 2017 stating that my const unsigned long long (aka const size_t) variables are being implicitly converted to type const int when used as an array index.

The answer given by Mat in the question linked above quotes the ISO C++ standard draft n3290 as saying

it shall be an integral constant expression and its value shall be greater than zero.

I have no background in reading these specs and precisely interpreting their language, so maybe a few points of clarification:

Does an "integral constant expression" specifically forbid things like long long which to me is an integral type, just a larger sized one?
Does what they're saying specifically forbid a type that is tagged unsigned like size_t?

If all I am seeing here is true, and array index values are meant to be signed int types, why? This seems counter-intuitive to me. The specs even state that the expression "shall be greater than zero", so we're wasting a bit if it is signed. Sure, we still might want to compare the index with 0 in some way, and this is dangerous with unsigned types, but there should be cheaper ways to solve that problem that only waste a single value, not an entire bit.

Also, with registers ever widening, a more future-proof solution would be to allow larger types for the index (like long long) rather than sticking with int which is a problematic type historically anyways (changing its size when processors changed to 32 bits and then not when they went to 64 bits). I even see some people talking about size_t anecdotally like it was designed to be a more future-proof type for use with sizes (and not JUST the type returned in service of the sizeof operator). But of course, that might be apocryphal.

I just want to make sure my foundational programming understanding here is not flawed. When I see experts like the ISO C++ group doing something, or the engineers of Qt, I give them the benefit of the doubt that they have a good reason! For something like an array index, so fundamental to programming, I feel like I need to know what that reason is or I might be missing something important.

And while an academic and language-lawyer question like this might be fun, you might be better helped by asking about your specific problem first of all.
– Some programmer dude, Commented Jul 27, 2018 at 17:34
Could you share a minimal code sample that shows these errors? You shouldn't be getting any and you don't using g++: coliru.stacked-crooked.com/a/39b98bc1a5773cb1
– NathanOliver, Commented Jul 27, 2018 at 17:39
@JosephQuinsey Thank you for that link! A very helpful point of view there.
– OllieBrown, Commented Jul 28, 2018 at 18:17

NathanOliver · Accepted Answer · 2018-07-27 17:51:02Z

Looking at [expr.sub]/1 we have

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise. The expression E1 is sequenced before the expression E2.

(emphasis mine)

So the index of the subscript operator needs to be of unscoped enumeration or integral type. Looking in [basic.fundamental], we see that the standard integer types are signed char, short int, int, long int, and long long int, and their unsigned counterparts.

So any of the standard integer types will work, and size_t, which is a typedef for an unsigned integer type, is just as valid to use as an array index. The value supplied to the subscript operator can even be negative, so long as it accesses a valid element.
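As a rough illustration of the rule quoted above, the sketch below indexes the same built-in array with several integral types and an unscoped enumeration. The names `Slot` and `sum_with_mixed_index_types` are made up for this example and do not come from the standard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical unscoped enumeration; its enumerators convert to integers.
enum Slot { First, Second, Third };

// Reads elements of a built-in array through indices of different types.
int sum_with_mixed_index_types() {
    int data[] = {10, 20, 30, 40};

    short         s  = 1;
    long long     ll = 1;
    std::size_t   u  = 1;
    unsigned char c  = 1;

    // Every one of these subscripts names data[1].
    int total = data[s] + data[ll] + data[u] + data[c] + data[Second];

    // A negative index is valid when the pointer operand is not at the start
    // of the array: middle[-1] is *(middle - 1), which is data[1].
    int* middle = data + 2;
    assert(middle[-1] == data[1]);
    total += middle[-2];  // data[0]

    return total;
}
```

None of these subscripts requires a cast. Whether a compiler warns about a particular conversion depends on the compiler and its warning settings, not on this rule.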

Cory Kramer · Accepted Answer · 2018-07-27 17:33:42Z


I would argue that the standard library API prefers unsigned types for indexes. If you look at the documentation for std::size_t, it notes

When indexing C++ containers, such as std::string, std::vector, etc, the appropriate type is the member typedef size_type provided by such containers. It is usually defined as a synonym for std::size_t.

This is reinforced when looking at signatures for functions such as std::vector::at

reference       at( size_type pos );
const_reference at( size_type pos ) const;
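A small sketch of what the unsigned `size_type` means in practice, assuming a `std::vector<int>` with the default allocator. The helper names `at_throws` and `wrapped_minus_one` are invented for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>
#include <vector>

using Index = std::vector<int>::size_type;

// With the default allocator, size_type is an unsigned integer type
// (std::size_t on mainstream implementations).
static_assert(std::is_unsigned<Index>::value, "size_type is unsigned");

// Returns true if at() rejected the position by throwing std::out_of_range.
bool at_throws(const std::vector<int>& v, Index pos) {
    try {
        (void)v.at(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Converting -1 to an unsigned type wraps to the largest representable value,
// so a "negative" index arrives at at() as a huge positive one.
Index wrapped_minus_one() {
    Index wrapped = static_cast<Index>(-1);
    assert(wrapped + 1 == 0);  // unsigned arithmetic is modular
    return wrapped;
}
```

With a three-element vector, `at_throws(v, 2)` is false, while both `at_throws(v, 3)` and `at_throws(v, wrapped_minus_one())` are true. This wrap-around is one of the signed/unsigned pitfalls mentioned in the comments on this answer.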


3 Comments

Rakete1111 Over a year ago

std::span::size returns a signed type :(

Richard Critten Over a year ago

IIRC it's been acknowledged (by some on the standards committee) as a mistake to have size types unsigned due to the many potential pitfalls when mixing signed and unsigned types. Looking for the reference for this ...

Richard Critten Over a year ago

IIRC follow up: github.com/ericniebler/stl2/issues/182 is not the link I was trying to remember, but it does contain some informed argument.

R Sahu · Accepted Answer · 2018-07-27 17:53:08Z

I think you are confusing two types:

The first type is the type of object/value that can be used to define the size of an array. Unfortunately, the question that you link to uses "index" where it should have said "array size". The size must be an expression that can be evaluated at compile time, and its value must be greater than zero.
```
int array[SomeExpression]; // Valid as long as SomeExpression can be evaluated 
                           // at compile time and the value is greater than zero.
```
The second type is the type of object/value that can be used to access an array. Given the above array,
```
array[i] = SomeValue; // i is an index to access the array
```
i does not need to be evaluated at compile time. For the array above, i must be in the range [0, SomeExpression-1]. However, it is possible to use negative values as the index to access an array. Since array[i] is evaluated as *(array+i) (ignoring for the time being the overloaded operator[] functions), i can be a negative value if array happens to point to the middle of an array. My answer to another SO post has more information on the subject.
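To make the distinction concrete, here is a minimal sketch. The array bound is a compile-time constant of an unsigned or a wider signed type, while the index is an ordinary run-time value that may be negative relative to a pointer into the middle. All names (`kCount`, `kWide`, `element_relative_to_middle`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Array bounds: integral constant expressions greater than zero.
// Neither the unsigned type nor the wider signed type is a problem.
constexpr std::size_t kCount = 4;
constexpr long long   kWide  = 4;

int first[kCount];
int second[kWide];
static_assert(sizeof(first) == sizeof(second), "both arrays have 4 ints");

// The index is a run-time value. Relative to a pointer into the middle of
// the array, offsets in the range [-2, 1] all name valid elements.
int element_relative_to_middle(std::ptrdiff_t offset) {
    static const int array[kCount] = {1, 2, 3, 4};
    assert(offset >= -2 && offset <= 1);
    const int* middle = array + 2;
    return middle[offset];  // evaluated as *(middle + offset)
}
```

Here `element_relative_to_middle(-2)` yields the first element and `element_relative_to_middle(1)` the last. Any offset outside that range would be undefined behavior, which is why the sketch asserts on it.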

Just as an aside, since array[i] is evaluated as *(array+i), it is legal to use i[array] and is the same as array[i].
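A quick check of that equivalence for the built-in subscript operator. The function name is made up for this sketch.

```cpp
#include <cassert>

// For built-in arrays and pointers, E1[E2] is *((E1)+(E2)), and the addition
// is commutative, so array[i] and i[array] name the same object.
bool subscript_is_commutative() {
    int array[3] = {7, 8, 9};
    int i = 2;

    assert(array[i] == *(array + i));
    return i[array] == array[i] && &i[array] == &array[i];
}
```

This holds only for the built-in operator. A class with an overloaded operator[], such as std::vector, does not accept the swapped form.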
