Let's say I have an array:

mytype_t array[1000];

I set up a pointer into the array:

mytype_t * pointer = &array[317];

Assuming array[317] (and its neighbours) are already present in the CPU cache, will dereferencing the pointer have exactly the same cost as accessing the array element by index?

Under what circumstances, if any, would the pointer approach be slower?

EDIT: At the request of FUZxxl & Olaf, the primary architectures under consideration are Intel desktop architectures and ARM, though others are likely in the future (gaming platforms like Sony's CBEA, IBM Broadway for Wii); the compiler is GCC 5.1.0.

  • CPU and compiler-dependent, but typically no difference in the general case. If you have a genuine performance problem then you need to benchmark/profile. Commented Sep 8, 2015 at 10:05
  • If the pointer itself needs to be loaded first, you may need an additional load compared to using the hardcoded address of the array. Commented Sep 8, 2015 at 10:16
  • This has nothing to do with the CPU cache. Why do you think that? Any access to that location would need to load the cache line. Commented Sep 8, 2015 at 10:21
  • The question is too broad to answer here. It depends not only on the given information, but also on the whole code, how the compiler allocates registers, the compiler quality, etc. I recommend profiling the two variants and looking at the generated code (although longer assembler code might still be faster). The outcome might change with the compiler version or CPU microarchitecture. Commented Sep 8, 2015 at 10:37
  • @Olaf, I appreciate the heads-up. Really, I don't need a perfect answer, just a general case I can go on would be appreciated, with your caveat borne in mind. If FUZxxl's answer is sufficient in this regard, I can accept it. Otherwise I shall wait. Commented Sep 8, 2015 at 10:39

1 Answer

It's probably not going to make a difference.

In the expressions *pointer and array[317], the same address is dereferenced. Address computations are typically done in a dedicated address-generation unit in the CPU, separate from the general-purpose ALUs, and cost very little, especially (on x86) when the size of the array element is 1, 2, 4, or 8 bytes or when the index is constant.
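To make the "same address" point concrete, here is a minimal sketch. The question never defines mytype_t, so the one-field struct below is an assumption, as are the helper names byte_offset_of, read_by_index, and read_by_pointer.

```c
#include <stddef.h>

/* Assumption: mytype_t is not defined in the question; a small struct stands in. */
typedef struct { int value; } mytype_t;

mytype_t array[1000];

/* Byte distance from the start of the array to element i. */
size_t byte_offset_of(size_t i)
{
    return (size_t)((char *)&array[i] - (char *)array);
}

/* Access through a constant index. */
int read_by_index(void)
{
    return array[317].value;
}

/* Access through a pointer set up exactly as in the question. */
int read_by_pointer(void)
{
    mytype_t *pointer = &array[317];
    return pointer->value;
}
```

Since array[317] is defined by the language as *(array + 317), both functions read the object located 317 * sizeof(mytype_t) bytes past the start of the array.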

Furthermore, the compiler will likely not materialize pointer as a separate variable at all. Instead, it may recompute the address each time pointer is dereferenced, because doing so saves it from allocating an extra register for pointer.
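One way to check this on a particular compiler and target is to compile a few variants and compare the generated assembly, for example with gcc -O2 -S. The sketch below is only a test harness under assumptions: mytype_t is a stand-in struct, and the function names and the global_pointer variable are hypothetical. The third variant keeps the pointer in memory, which is the case where an additional load of the pointer itself may be needed.

```c
/* Assumption: mytype_t is not defined in the question; a small struct stands in. */
typedef struct { int value; } mytype_t;

mytype_t array[1000];

/* Hypothetical: a pointer that lives in memory rather than in a local variable. */
mytype_t *global_pointer = &array[317];

/* Variant 1: constant index into the array. */
int load_indexed(void)
{
    return array[317].value;
}

/* Variant 2: local pointer; an optimizing compiler may fold this into variant 1. */
int load_local_pointer(void)
{
    mytype_t *pointer = &array[317];
    return pointer->value;
}

/* Variant 3: the pointer value may have to be loaded before it can be dereferenced. */
int load_global_pointer(void)
{
    return global_pointer->value;
}
```

All three return the same value; whether they compile to the same instructions depends on the compiler, the optimization level, and the target, so the assembly output is the thing to inspect.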

I don't know of any processors that speculatively prefetch data based on register contents. Such processors may well appear in the future, but if they do, the compiler will surely optimize your code for them.

16 Comments

As given by the first sentence, the answer is wrong. The answer is dependent on the CPU architecture and compiler-generated code.
@Olaf Which is why I say “typically.” Yes, you could argue that there might be exotic architectures where *pointer and array[317] do not dereference the same address, but this is clearly a question motivated from practice, not from language lawyering (which I love as you know).
@ArcaneEngineer: "And no, I need not add anything ..." Great attitude! You have to, if you want to get a comprehensive answer.
@FUZxxl ...and questions that are too specific, like perhaps 90% of the questions on SO, are practically useless as a resource for anyone other than the OP in the future. You tell me which is the bigger problem. :) You still have not provided any specific questions. I've already outlined in my comment above what I'm trying to do. I appreciate your time, but mine is also valuable and I will leave it here if no specific requests for clarification are forthcoming.
@ArcaneEngineer If you're interested in details, indirect addressing can be slower than direct (e.g., stores on Haswell can use port 7 only if the address is reg + fixed offset, not reg + reg or reg + reg*scale). But you have virtually no control over that in C. Modern compilers are fully capable of converting code to and from direct/indirect addressing. Even if you know which is the faster one for a particular scenario, it's a cat-and-mouse game to get the compiler to generate one over the other.