pointer arithmetic in C++ using char*

Question

I'm having trouble understanding what the difference between these two code snippets is:

// out is of type char* of size N*D
// N, D are of type int


for (int i=0; i!=N; i++){
    if (i % 1000 == 0){
        std::cout << "i=" << i << std::endl;
    }
    for (int j=0; j!=D; j++) {
        out[i*D + j] = 5;
    }
}

This code runs fine, even for very big data sets (N=100000, D=30000). From what I understand about pointer arithmetic, this should give the same result:

for (int i=0; i!=N; i++){
    if (i % 1000 == 0){
        std::cout << "i=" << i << std::endl;
    }
    char* out2 = &out[i*D];
    for (int j=0; j!=D; j++) {
        out2[j] = 5;
    }
}

However, the latter does not work (it freezes at index 143886 - I think it segfaults, but I'm not 100% sure as I'm not used to developing on windows) for a very big data set and I'm afraid I'm missing something obvious about how pointer arithmetic works. Could it be related to advancing char*?

EDIT: We have now established that the problem was an overflow of the index (i.e. (i*D + j) >= 2^32), so using uint64_t instead of int32_t fixed the problem. What's still unclear to me is why the first above case would run through, while the other one segfaults.

you should show the declaration of out rather than a comment — a.lasram
– a.lasram, Commented Aug 21, 2013 at 20:26
I'm certainly hoping the out array size is at least D + D*N, or you're walking in memory you don't own. — WhozCraig
– WhozCraig, Commented Aug 21, 2013 at 20:28
You're doing this calculation on a 2.793GB array of chars? I rather think that's not optimal. Wait, this is O(N^2) with a flush in the outer loop? That'd take many many days to run — Mooing Duck
– Mooing Duck, Commented Aug 21, 2013 at 20:31
@soramimo: I assure you, the only explanation for the posted code being fast is because it's wrong. — Mooing Duck
– Mooing Duck, Commented Aug 21, 2013 at 20:46

Alan Stokes · Accepted Answer · 2013-08-21 20:31:11Z

4

N * D is 3e9; that doesn't fit in a 32 bit int.

answered Aug 21, 2013 at 20:31

Alan Stokes

19k3 gold badges49 silver badges69 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

soramimo Over a year ago

I started thinking about this just now, I think this is the reason!

soramimo Over a year ago

I'm on a 64bit machine, I assume I can just fix this by using int64 for N, D instead of int32?

Nemanja Boric Over a year ago

And how does this work: out[i*D + j] = 5;? It was my first guess until I saw that.

Nemanja Boric Over a year ago

@soramimo If you are overflowing, you are overflowing with the same values, right? Blah, just try it, I am waiting to see the accepted answer here for 30 minutes :).

Mooing Duck Over a year ago

@NemanjaBoric: Yes, his code assigns the value five to the first 2e9 cells, and then assigns the value five to the first 1e9 again, leaving the last 1e9 unset.

|

TomF · Accepted Answer · 2013-08-21 21:22:02Z

When using N as size of array, why use int? does a negative value of an array has any logical meaning?

what do you mean "doesn't work"?

just think of pointers as addresses in memory and not as 'objects'.

char* 
void*
int*

are all pointers to memory addresses, and so are exactly the same, when are defined or passes into a function.

char * a;
int* b = (char*)a;
void* c = (void*)b;

a == b == c;

The difference is that when accessing a, a[i], the value that is retrieved is the next sizeof(*a) bytes from the address a.

And when using ++ to advance a pointer the address that the pointer is set to is advanced by

sizeof(pointer_type) bytes.

Example:

char* a = 1;
a++;

a is now 2.

((int*)a)++;

a is now 6.

Another thing:

char* a = 10;
char* b = a + 10;

&(a[10]) == b

because in the end

a[10] == *((char*)(a + 10))

so there should not be a problem with array sizes in your example, because the two examples are the same.

EDIT

Now note that there is not a negative memory address so accessing an array with a signed negative value will convert the value to positive.

int a = -5;
char* data;
data[a] == data[MAX_INT - 5]

For that reason it might be that (when using sign values as array sizes!) your two examples will actually not get the same result.

Zac Howland · Accepted Answer · 2013-08-21 20:37:13Z

-1

Version 1

for (int i=0; i!=N; i++) // i starts at 0 and increments until N.  Note:  If you ever skip N, it will loop forever.  You should do < N or <= N instead
{
    if (i % 1000 == 0) // if i is a multiple of 1000
    {
        std::cout << "i=" << i << std::endl; // print i
    }

    for (int j=0; j!=D; j++) // same as with i, only j is going to D (same problem, should be < or <=)
    {
        out[i*D + j] = 5; // this is a way of faking a 2D array by making a large 1D array and doing the math yourself to offset the placement
    }
}

Version 2

for (int i=0; i!=N; i++) // same as before
{
    if (i % 1000 == 0) // same as before
    {
        std::cout << "i=" << i << std::endl; // same as before
    }

    char* out2 = &out[i*D]; // store the location of out[i*D]
    for (int j=0; j!=D; j++) 
    {
        out2[j] = 5; // set out[i*D+j] = 5;
    }
}

They are doing the same thing, but if out is not large enough, they will both behave in an undefined manner (and likely crash).

answered Aug 21, 2013 at 20:37

Zac Howland

15.9k1 gold badge28 silver badges43 bronze badges

6 Comments

Zac Howland Over a year ago

I'm not sure who is down-rating this, but it is accurate.

soramimo Over a year ago

Hi Zac, out is actually N*D in size, so it should be large enough. The problem seems to be in fact that int32 isn't enough to hold the index.

Zac Howland Over a year ago

That is A problem with it, but it is far from the ONLY problem with the posted code.

Zac Howland Over a year ago

And given that your first statement is, "I'm having trouble understanding what the difference between these two code snippets is ..." the explanation that they are doing the same thing is correct. Both will screw up equally when you attempt to overflow an int32.

soramimo Over a year ago

you're right, they both produce unexpected results, however in a different way. While using uint64_t instead of int32_t in fact solved my problem, the reason for the different behavior of the two cases still escapes me.

|

Collectives™ on Stack Overflow

pointer arithmetic in C++ using char*

3 Answers 3

7 Comments

Comments

6 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

7 Comments

Comments

6 Comments

Your Answer

Sign up or log in

Post as a guest

Related