2

I'm having trouble understanding what the difference between these two code snippets is:

// out is of type char* of size N*D
// N, D are of type int


for (int i=0; i!=N; i++){
    if (i % 1000 == 0){
        std::cout << "i=" << i << std::endl;
    }
    for (int j=0; j!=D; j++) {
        out[i*D + j] = 5;
    }
}

This code runs fine, even for very big data sets (N=100000, D=30000). From what I understand about pointer arithmetic, this should give the same result:

for (int i=0; i!=N; i++){
    if (i % 1000 == 0){
        std::cout << "i=" << i << std::endl;
    }
    char* out2 = &out[i*D];
    for (int j=0; j!=D; j++) {
        out2[j] = 5;
    }
}

However, the latter does not work (it freezes at index 143886 - I think it segfaults, but I'm not 100% sure as I'm not used to developing on windows) for a very big data set and I'm afraid I'm missing something obvious about how pointer arithmetic works. Could it be related to advancing char*?

EDIT: We have now established that the problem was an overflow of the index (i.e. (i*D + j) >= 2^32), so using uint64_t instead of int32_t fixed the problem. What's still unclear to me is why the first above case would run through, while the other one segfaults.

13
  • 2
    What does "does not work" mean? Commented Aug 21, 2013 at 20:24
  • 1
    you should show the declaration of out rather than a comment Commented Aug 21, 2013 at 20:26
  • 2
    I'm certainly hoping the out array size is at least D + D*N, or you're walking in memory you don't own. Commented Aug 21, 2013 at 20:28
  • 1
    You're doing this calculation on a 2.793GB array of chars? I rather think that's not optimal. Wait, this is O(N^2) with a flush in the outer loop? That'd take many many days to run Commented Aug 21, 2013 at 20:31
  • 1
    @soramimo: I assure you, the only explanation for the posted code being fast is because it's wrong. Commented Aug 21, 2013 at 20:46

3 Answers 3

4

N * D is 3e9; that doesn't fit in a 32 bit int.

Sign up to request clarification or add additional context in comments.

7 Comments

I started thinking about this just now, I think this is the reason!
I'm on a 64bit machine, I assume I can just fix this by using int64 for N, D instead of int32?
And how does this work: out[i*D + j] = 5;? It was my first guess until I saw that.
@soramimo If you are overflowing, you are overflowing with the same values, right? Blah, just try it, I am waiting to see the accepted answer here for 30 minutes :).
@NemanjaBoric: Yes, his code assigns the value five to the first 2e9 cells, and then assigns the value five to the first 1e9 again, leaving the last 1e9 unset.
|
1

When using N as size of array, why use int? does a negative value of an array has any logical meaning?

what do you mean "doesn't work"?

just think of pointers as addresses in memory and not as 'objects'.

char* 
void*
int*

are all pointers to memory addresses, and so are exactly the same, when are defined or passes into a function.

char * a;
int* b = (char*)a;
void* c = (void*)b;

a == b == c;

The difference is that when accessing a, a[i], the value that is retrieved is the next sizeof(*a) bytes from the address a.

And when using ++ to advance a pointer the address that the pointer is set to is advanced by

sizeof(pointer_type) bytes.

Example:

char* a = 1;
a++;

a is now 2.

((int*)a)++;

a is now 6.

Another thing:

char* a = 10;
char* b = a + 10;

&(a[10]) == b

because in the end

a[10] == *((char*)(a + 10))

so there should not be a problem with array sizes in your example, because the two examples are the same.

EDIT

Now note that there is not a negative memory address so accessing an array with a signed negative value will convert the value to positive.

int a = -5;
char* data;
data[a] == data[MAX_INT - 5]

For that reason it might be that (when using sign values as array sizes!) your two examples will actually not get the same result.

Comments

-1

Version 1

for (int i=0; i!=N; i++) // i starts at 0 and increments until N.  Note:  If you ever skip N, it will loop forever.  You should do < N or <= N instead
{
    if (i % 1000 == 0) // if i is a multiple of 1000
    {
        std::cout << "i=" << i << std::endl; // print i
    }

    for (int j=0; j!=D; j++) // same as with i, only j is going to D (same problem, should be < or <=)
    {
        out[i*D + j] = 5; // this is a way of faking a 2D array by making a large 1D array and doing the math yourself to offset the placement
    }
}

Version 2

for (int i=0; i!=N; i++) // same as before
{
    if (i % 1000 == 0) // same as before
    {
        std::cout << "i=" << i << std::endl; // same as before
    }

    char* out2 = &out[i*D]; // store the location of out[i*D]
    for (int j=0; j!=D; j++) 
    {
        out2[j] = 5; // set out[i*D+j] = 5;
    }
}

They are doing the same thing, but if out is not large enough, they will both behave in an undefined manner (and likely crash).

6 Comments

I'm not sure who is down-rating this, but it is accurate.
Hi Zac, out is actually N*D in size, so it should be large enough. The problem seems to be in fact that int32 isn't enough to hold the index.
That is A problem with it, but it is far from the ONLY problem with the posted code.
And given that your first statement is, "I'm having trouble understanding what the difference between these two code snippets is ..." the explanation that they are doing the same thing is correct. Both will screw up equally when you attempt to overflow an int32.
you're right, they both produce unexpected results, however in a different way. While using uint64_t instead of int32_t in fact solved my problem, the reason for the different behavior of the two cases still escapes me.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.