I wrote a C program that computes Pi using OpenMP, with help from a book. I believe its performance depends on the number of processors available.
In my case, I checked how well the program parallelizes by increasing the number of processors or threads (I am not sure which term is correct ... please correct me) through the environment variable:
OMP_NUM_THREADS
I have a quad-core processor, so I ran the following (with no_of_threads varied from 1 to 15):
$ export OMP_NUM_THREADS=no_of_threads
The run times for each number of threads were:
1 --- 0m11.036s
2 --- 0m5.554s
3 --- 0m3.800s
4 --- 0m3.166s
5 --- 0m3.376s
8 --- 0m3.042s
10 --- 0m2.960s
15 --- 0m2.957s
I can understand the performance increase up to 4 threads, as there are 4 processors on the system. But I am unable to understand why the performance keeps improving when there are more than 4 threads. I am aware that each additional thread adds overhead, so why does the performance still increase?
Can someone please explain this to me in detail?