I want to compare the performance of a single-core CPU and a multi-core CPU. I wrote a program that runs 1000 iterations on a single core and measured the running time. For the multi-core case, I used OpenCL to launch a kernel whose code is the same as the body of the loop in the first case.
Since the multi-core CPU can run 8 concurrent threads, the multi-core running time should theoretically be no less than T(single-core)/8. But the measured T(multi-core) is almost 1/20 of T(single-core).
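
To make the arithmetic concrete, here is a small sketch of the comparison I have in mind. The timing value is a made-up placeholder rather than my actual measurement, and the function names are only for illustration; it assumes the only gain from the multi-core run is 8-way thread parallelism.

```python
def ideal_lower_bound(t_single, n_threads):
    """Best-case multi-core time if the only gain is n_threads-way parallelism."""
    return t_single / n_threads


def speedup(t_single, t_multi):
    """How many times faster the multi-core run is than the single-core run."""
    return t_single / t_multi


# Hypothetical placeholder timing in seconds (not a real measurement).
t_single = 20.0
n_threads = 8

# What I observe: the multi-core time is roughly 1/20 of the single-core time.
t_multi = t_single / 20

bound = ideal_lower_bound(t_single, n_threads)
print("ideal lower bound:", bound)
print("observed speedup:", speedup(t_single, t_multi))
print("faster than the ideal bound:", t_multi < bound)
```

With perfect 8-way scaling the speedup could be at most 8x, so an observed speedup of about 20x cannot come from thread parallelism alone.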
Why does this happen? Does the OpenCL compiler apply some optimization for multi-core CPUs?