OpenMP optimizations?

Question

I can't figure out why the performance of this function is so bad. I have a Core 2 Duo machine and I know it's only creating 2 threads, so it's not an issue of too many threads. I expected the results to be closer to my pthread results.

These are my compilation flags (purposely not using any optimization flags): gcc -fopenmp -lpthread -std=c99 matrixMul.c -o matrixMul

These are my results

Sequential matrix multiply: 2.344972
Pthread    matrix multiply: 1.390983
OpenMP     matrix multiply: 2.655910
CUDA       matrix multiply: 0.055871
Pthread Test PASSED
OpenMP  Test PASSED
CUDA    Test PASSED

void openMPMultiply(Matrix* a, Matrix* b, Matrix* p)
{
  //int i,j,k;
  memset(*p, 0, sizeof(Matrix));
  int   tid, nthreads, i, j, k, chunk;
  #pragma omp parallel shared(a,b,p,nthreads,chunk) private(tid,i,j,k)
  {
        tid = omp_get_thread_num();
        if (tid == 0)
        {
          nthreads = omp_get_num_threads();
        }
        chunk = 20;
        //   #pragma omp parallel for private(i, j, k)
        #pragma omp for schedule (static, chunk)
        for(i = 0; i < HEIGHT; i++)
        {
          //printf("Thread=%d did row=%d\n",tid,i);
                for(j = 0; j < WIDTH; j++)
                {
                        //#pragma omp parallel for
                        for(k = 0; k < KHEIGHT ; k++)
                                (*p)[i][j] += (*a)[i][k] * (*b)[k][j];
                }
        }
  }
}

Thanks for any help.

How about the results if you apply optimization to this function?
– kamae, Commented Jul 21, 2011 at 17:32
Results with -O2 optimization:
Sequential matrix multiply: 0.787335
Pthread matrix multiply: 0.524749
OpenMP matrix multiply: 1.055698
– paTest, Commented Jul 21, 2011 at 17:38
It might be getting killed by having so many shared variables (depending on gcc's implementation of OpenMP). It might be good to see your pthreads version, as it could be that it overlooks something that allows it to run faster.
– Necrolis, Commented Jul 21, 2011 at 21:05

minjang · Accepted Answer · 2011-07-23 01:03:18Z

Since matrix multiplication is embarrassingly parallel, its speedup should be near 2 on a dual core. Matrix multiplication can even show a superlinear speedup (greater than 2 on a dual core) thanks to fewer cache misses. I don't see any obvious mistakes in your code, but something is wrong. Here are my suggestions:

1. Double-check the number of worker threads. In your case, 2 threads should be created. Alternatively, set the count explicitly by calling omp_set_num_threads. Also check whether both cores are fully utilized (i.e., 100% CPU utilization on Windows, 200% on Linux).
2. Clean up your code by removing the unnecessary nthreads and chunk. These can be set up outside of the parallel region. Even left as they are, though, they shouldn't hurt speedup.
3. Are the matrices square (i.e., HEIGHT == WIDTH == KHEIGHT)? If not, there could be a workload imbalance that hurts speedup. But given the pthread speedup (about 1.7, which is also odd to me), I don't think there's much workload imbalance.
4. Try the default static scheduling: don't specify chunk, just write #pragma omp for.
5. My best guess is that the structure of Matrix could be problematic. What exactly does Matrix look like? In the worst case, false sharing could significantly hurt performance. But in such a simple matrix multiplication, false sharing shouldn't be a big problem. (If you aren't familiar with it, I can explain in more detail.)
6. Although you commented it out, never put #pragma omp parallel for on the k loop, which creates a nested parallel loop. In matrix multiplication that is pure waste, since the outermost loop is already parallelizable.
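
The cleanup suggested above (dropping nthreads and chunk from the parallel region, leaving out the schedule clause, and checking the team size) might look like the sketch below. The question never shows the Matrix typedef, so the square float array and the value 64 for HEIGHT, WIDTH, and KHEIGHT are assumptions made only so the code compiles. Returning the thread count and accumulating into a local sum are also additions for illustration, not part of the original function.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Assumed sizes and layout: the question does not show the real Matrix typedef. */
#define HEIGHT  64
#define WIDTH   64
#define KHEIGHT 64
typedef float Matrix[HEIGHT][WIDTH];

/* Computes p = a * b and returns the number of threads in the team
   (1 when built without -fopenmp, since the pragmas are then ignored). */
int openMPMultiply(Matrix *a, Matrix *b, Matrix *p)
{
    int nthreads = 1;

    #pragma omp parallel
    {
#ifdef _OPENMP
        /* One thread records the team size; the others wait at the implicit barrier. */
        #pragma omp single
        nthreads = omp_get_num_threads();
#endif
        /* No schedule clause: the implementation default applies (typically static).
           i, j, and k are declared inside the loops, so each thread has its own copy. */
        #pragma omp for
        for (int i = 0; i < HEIGHT; i++)
            for (int j = 0; j < WIDTH; j++) {
                float sum = 0.0f;   /* local accumulator instead of repeated writes to *p */
                for (int k = 0; k < KHEIGHT; k++)
                    sum += (*a)[i][k] * (*b)[k][j];
                (*p)[i][j] = sum;
            }
    }
    return nthreads;
}
```

If the returned count is 1 on a dual-core machine, the environment (for example the OMP_NUM_THREADS variable) is worth checking before looking at anything else.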

Finally, try running the following very simple OpenMP matrix multiplication code and see the speedup:

double A[N][N], B[N][N], C[N][N];
#pragma omp parallel for
for (int row = 0; row < N; ++row)
  for (int col = 0; col < N; ++col)
    for (int k = 0; k < N; ++k)
      C[row][col] += A[row][k]*B[k][col];
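
To actually measure the speedup, the fragment above needs a size, initialized inputs, and a timer. The following is one possible way to wrap it, not a definitive benchmark: the value of N, the helper names now and timed_multiply, and the fill values are all assumptions. It uses omp_get_wtime for wall-clock time and falls back to clock() when compiled without -fopenmp.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 256   /* assumed size; increase it until the run time is measurable */

/* File-scope arrays live in static storage, so they are not placed on the stack. */
static double A[N][N], B[N][N], C[N][N];

/* Wall-clock seconds with OpenMP; CPU seconds in a serial build. */
static double now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Computes C = A * B and returns the elapsed seconds.
   The if clause keeps the loop on a single thread when use_omp is 0. */
double timed_multiply(int use_omp)
{
    (void)use_omp;  /* only read by the pragma; avoids a warning in serial builds */

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
            C[i][j] = 0.0;
        }

    double start = now();
    #pragma omp parallel for if(use_omp)
    for (int row = 0; row < N; ++row)
        for (int col = 0; col < N; ++col)
            for (int k = 0; k < N; ++k)
                C[row][col] += A[row][k] * B[k][col];
    return now() - start;
}
```

Calling timed_multiply(0) and then timed_multiply(1) from main and dividing the first time by the second gives the speedup on your machine.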
