
I am trying to use OpenMP to parallelize a for loop that inserts/updates a hash map (std::unordered_map).

The hash map and keys are members of a class, so I assigned pointers to their addresses. The key is a hash value returned by a global function.

The following seems like the easiest way to do this, but the hash map is not updated correctly. Something is wrong, but I am not sure how to fix it. Thanks in advance.

void MyClass::ProcessBuffer(void)
{
    omp_set_num_threads(4);
    std::unordered_map<unsigned long long,unsigned int>* hashptr=&m_sequencehash;
    std::vector<std::string>* bufferptr=&m_buffer;
    unsigned int sizevec=m_kmer_size;
    size_t i;
    #pragma omp parallel for
    for (i=0; i<READSTR_BUF_SIZE;++i)
    {
        ++(*hashptr)[_hash((*bufferptr)[i],sizevec)];
    }

}
  • std::unordered_map is not thread safe, and so your code has a race condition, resulting in undefined behavior. Commented May 1, 2017 at 20:51
  • Maybe see Adding data to a hashmap in parallel. Commented May 1, 2017 at 21:00
  • Just curious about the performance gain you will get from parallelizing an operation that will eventually require locking to be thread safe?! Commented May 1, 2017 at 21:06
  • You might consider using a TBB concurrent hash map: threadingbuildingblocks.org/docs/help/tbb_userguide/… You don't need to use the rest of TBB, though that might also make sense! Commented May 3, 2017 at 8:41
  • A concurrent map is overkill for this purpose, since concurrent access is only needed during the initialization step. However, an in-memory map-reduce library will fit this scenario perfectly. Commented May 4, 2017 at 16:35

1 Answer


The easiest way to solve this is to create a new map for each thread and then reduce them into a single map sequentially. This is a classic map-reduce scenario.

int s = omp_get_max_threads(); // omp_get_num_threads() returns 1 outside a parallel region
std::vector<std::unordered_map<unsigned long long,unsigned int>> res(s);

// Map step
#pragma omp parallel for
for (size_t i = 0; i < READSTR_BUF_SIZE; ++i)
{
    int t = omp_get_thread_num();
    res[t][_hash((*bufferptr)[i],sizevec)]++;
}

// Reduce step
for (int i=0; i < s; i++) {
    for (const auto& r : res[i]) {
        (*hashptr)[r.first] += r.second;
    }
}

Performing the reduction concurrently might be dangerous because you will still have to access the same map concurrently. If you don't know the implementation of the map, you can't know for sure that this is safe.

Alternatively, you can partition the hash values between different maps by assigning different hash intervals to different buckets. This prevents different threads from accessing the same hash value in the reduce step. However, on a small dataset it is hard to find a good partition function with a small number of buckets, and using too many buckets can carry significant overhead compared to the serial approach.


6 Comments

The answer is on the right track, but the definition of res / access to r in the parallel for is wrong. You should also use a proper reduction instead of the manual one. See also this answer.
Thanks, I fixed the access to res.
Using proper reduction is dangerous because you will still have to access the same map concurrently. I don't know the implementation of the map so I can't know for sure that this is safe.
You still have only one shared map in your code. While I can't quite find something in the standard, I'm fairly sure it is guaranteed that reduction combiners that work on the same data are executed mutually exclusively and with proper memory isolation. Otherwise all implicitly and user-declared combiners I have ever seen would be wrong.
No, it does not need to allocate a new map: one of the inputs is also the output for the combiner. Also, don't assume log(nthreads) operations (related answer).
