
I wrote a C program that computes Pi using OpenMP, with help from a book. I believe the performance of this program depends on the number of processors used.

To check how well the program parallelizes, I varied the number of threads (I am not sure whether "processors" or "threads" is the right term, so please correct me) with the environment variable:

OMP_NUM_THREADS

I have a quad-core processor, so I used the following (where no_of_threads was varied from 1 to 15):

$ export OMP_NUM_THREADS=no_of_threads

The running times are (number of threads, then elapsed time):

1 --- 0m11.036s

2 --- 0m5.554s

3 --- 0m3.800s

4 --- 0m3.166s

5 --- 0m3.376s

8 --- 0m3.042s

10 --- 0m2.960s

15 --- 0m2.957s

I can understand the performance increase up to 4 threads, as there are 4 processors on the system. But I am unable to understand why performance keeps increasing once there are more than 4 threads. I am aware that each additional thread adds overhead, so why is the performance still increasing?

Can someone please explain this to me in detail?

1 Answer


You probably have a processor that supports hardware threads (Intel calls this hyper-threading).

What this basically means is that each core keeps two sets of architectural state (registers and program counter) while sharing its execution units and caches, so it can interleave two threads more efficiently than a core without this feature. This is especially noticeable if the threads often have to wait for memory: normally, a core just stalls while waiting for memory [1]. A core that supports hyper-threading can instead execute instructions from the other thread during that wait.

[1] Not taking into account instruction reordering and prefetching.


1 Comment

+1. Kumar, you're absolutely right that running multiple threads per core has overhead, but as long as that overhead is less than the performance gain from having multiple threads (say) waiting for memory, there can still be a net win. Hyper-threading greatly reduces the overhead of a second thread per core, so up to 8 threads you do fairly well; beyond that, it looks like there are still minor gains. Note that this is an unusual case; normally you would write (and the compiler would optimize) code so that it doesn't spend much time waiting for memory, so there would be little or no gain from this.
