
I'm having trouble using the #pragma omp parallel for directive.

Basically I have several hundred DNA sequences that I want to run against an algorithm called NNLS.

I figured that doing it in parallel would give me a pretty good speedup, so I applied the #pragma directives.

When I run it sequentially there is no issue and the results are fine, but when I run it with #pragma omp parallel for I get a segfault within the algorithm (sometimes at different points).

#pragma omp parallel for
for(int i = 0; i < dir_count; i++ ) {

  int z = 0;
  int w = 0;
  struct dirent *directory_entry;
  char filename[256];

  directory_entry = readdir(input_directory_dh);

  if(strcmp(directory_entry->d_name, "..") == 0 || strcmp(directory_entry->d_name, ".") == 0) {
    continue;
  }

  sprintf(filename, "%s/%s", input_fasta_directory, directory_entry->d_name);

  double *count_matrix = load_count_matrix(filename, width, kmer);

  //normalize_matrix(count_matrix, 1, width)
  for(z = 0; z < width; z++) 
    count_matrix[z] = count_matrix[z] * lambda;

  // output our matrices if we are in debug mode
  printf("running NNLS on %s, %d, %d\n", filename, i, z);
  double *trained_matrix_copy = malloc(sizeof(double) * sequences * width);
  for(w = 0; w < sequences; w++) {
    for(z = 0; z < width; z++) {
      trained_matrix_copy[w*width + z] = trained_matrix[w*width + z];
    }
  } 

  double *solution = nnls(trained_matrix_copy, count_matrix, sequences, width, i);


  normalize_matrix(solution, 1, sequences);
  for(z = 0; z < sequences; z++ )  {
    solutions(i, z) = solution[z]; 
  }

  printf("finished NNLS on %s\n", filename);

  free(solution);
  free(trained_matrix_copy);
}

gdb always exits at a different point in my thread, so I can't figure out what is going wrong.

What I have tried:

  • allocating a copy of each matrix, so that they would not be writing on top of each other
  • using a mixture of private/shared clauses on the #pragma directive
  • using different input sequences
  • writing out my trained_matrix and count_matrix prior to calling NNLS, ensuring that they look OK. (they do!)

I'm sort of out of ideas. Does anyone have some advice?

4 Answers


Solution: make sure not to use static variables in your functions when multithreading (damned f2c translator).
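To illustrate why this matters, here is a minimal sketch with a hypothetical helper (not the actual NNLS code): a function-scope static is a single object shared by every thread, so concurrent calls race on it, while an automatic variable gives each call its own copy. f2c-translated routines are full of statics like this.

/* Hypothetical helper: the static buffer is one object shared by
   every thread, so concurrent calls race on it. */
double unsafe_scale(double x) {
    static double workspace;  /* one copy for the whole program */
    workspace = x * 2.0;      /* thread A writes here ... */
    return workspace;         /* ... thread B may have overwritten it */
}

/* Thread-safe version: automatic storage gives each call, and
   therefore each thread, its own copy. */
double safe_scale(double x) {
    double workspace = x * 2.0;
    return workspace;
}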




Defining "#pragma omp parallel for" is not going to give you what you want. Based on the algorithm you have, you must have a solid plan on which variables are going to shared and which ones going to private among the processors.

Looking at this link should give you a quick start on how to correctly share the work among the threads.

Based on your statement "I get a segfault within the algorithm (sometimes at different points)", I would think there is a race condition between the threads or improper initialization of variables.
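As an example, here is a minimal sketch of explicit data-sharing clauses (the variable names are illustrative, not from the asker's code). Using default(none) forces every outside variable to be listed explicitly, which is a good way to flush out accidental sharing.

#include <stdio.h>

int main(void) {
    int n = 8;
    double scale = 2.0;  /* read-only input: safe to share */
    double out[8];       /* each iteration writes only its own slot */

    /* default(none) makes the compiler reject any variable whose
       sharing you haven't declared explicitly. */
    #pragma omp parallel for default(none) shared(n, scale, out)
    for (int i = 0; i < n; i++) {
        double tmp = i * scale;  /* declared inside the loop: private */
        out[i] = tmp;
    }

    for (int i = 0; i < n; i++)
        printf("out[%d] = %f\n", i, out[i]);
    return 0;
}

The loop variable i and anything declared inside the loop body are automatically private, which matches the comment below.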

1 Comment

From what I understand, variables declared locally are automatically private. Even adding shared(trained_matrix) doesn't solve the problem. Thank you for the quick-sheet, it's really awesome!

The function readdir is not thread-safe. To quote the Linux man page for readdir(3):

The data returned by readdir() may be overwritten by subsequent calls to readdir() for the same directory stream.

Consider putting the calls to readdir inside a critical section. Before leaving the critical section, copy the filename returned from readdir() to a local temporary variable, since the next thread to enter the critical section may overwrite it.

Also consider protecting your output operations with a critical section too, otherwise the output from different threads might be jumbled together.
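A sketch of what that could look like, reusing the asker's dir_count and directory-handle names (the surrounding function is hypothetical):

#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around the asker's loop: readdir() is called
   by only one thread at a time, and the name is copied into a
   thread-private buffer before the critical section is left. */
void process_entries(DIR *input_directory_dh, int dir_count) {
    #pragma omp parallel for
    for (int i = 0; i < dir_count; i++) {
        char name[256] = "";

        #pragma omp critical(readdir_lock)
        {
            struct dirent *entry = readdir(input_directory_dh);
            if (entry)
                snprintf(name, sizeof(name), "%s", entry->d_name);
        }

        if (name[0] == '\0' || strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
            continue;

        printf("processing %s\n", name);  /* stand-in for the NNLS work */
    }
}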

1 Comment

gdb does not indicate that the error is within readdir, and the files are getting read properly; the error is actually within the algorithm call.

A very possible reason is the stack limit. As MutantTurkey mentioned, if you have a lot of static variables (like a huge array defined in a subroutine), they may use up your stack.

To solve this, first run ulimit -s to check the stack limit for the process. You can use ulimit -s unlimited to remove it. If the program still crashes, try increasing the stack for the OpenMP threads by setting the OMP_STACKSIZE environment variable to a large value, such as 100MB.

Intel has a discussion at https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors with more information on stack and heap memory.
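A minimal sketch that reproduces this failure mode (the array size is arbitrary; pick one above your thread stack limit): the array lives on each thread's stack, so the parallel run can segfault under the default OMP_STACKSIZE even when a serial run, whose stack is governed by ulimit -s, is fine.

#include <omp.h>
#include <stdio.h>

#define N (4 * 1024 * 1024)  /* 4M doubles = 32 MB per thread */

int main(void) {
    #pragma omp parallel
    {
        /* Automatic storage: allocated on each thread's own stack.
           If the thread stack is smaller than 32 MB, this segfaults. */
        double scratch[N];
        scratch[0] = scratch[N - 1] = omp_get_thread_num();
        printf("thread %d survived\n", (int)scratch[0]);
    }
    return 0;
}

Running it with OMP_STACKSIZE=100M (and a raised ulimit -s for the master thread) should make it pass.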

