A merge of sorted arrays can be split into n independent, parallelizable tasks. The general idea is that you pick values from the input data, and each thread merges only the subarray elements that fall between two consecutive picked values. The steps are:
- Pick n - 1 values from the unsorted input data
- Merge sort normally until the subarrays are big enough that it's worth merging them in multiple threads.
- For each subarray you're merging and for each value you picked, find the index of the last occurrence of that value if it exists, or the index of the first value greater than it if it does not.
- Sort each list of indexes, then prepend the beginning of the subarray (index 0) and append its end (the subarray's length).
- You can now merge, for each i, the slice of each subarray from index ind[i], inclusive, up to index ind[i+1], exclusive, and write that result to disk independently of the rest of the merging (a sketch of the index computation follows).
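Here's a minimal C++ sketch of the index computation in the last two steps, for a single subarray. The `split_points` name is mine, and I use `std::lower_bound` (first element >= the pick) rather than an explicit last-occurrence search; elements equal to a pick may then land on the other side of a cut, but since they're equal the final result is still sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: for one sorted subarray, turn the picked values
// into slice boundaries. Sorting the picks first guarantees the resulting
// index list is already sorted, since std::lower_bound is monotone.
template <typename T>
std::vector<std::size_t> split_points(const std::vector<T>& v,
                                      std::vector<T> picks) {
    std::sort(picks.begin(), picks.end());
    std::vector<std::size_t> ind;
    ind.push_back(0);                        // beginning of the subarray
    for (const T& p : picks)
        ind.push_back(std::lower_bound(v.begin(), v.end(), p) - v.begin());
    ind.push_back(v.size());                 // end of the subarray
    return ind;
}
```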
Here's a concrete example:
unsorted input: [a,w,d,y,u,p,l,s,c,e,g,h]
pick values: g and p
subarrays in last step of mergesort: a = [a,d,p,u,w,y] and b = [c,e,g,h,l,s]
indexes of picked values (or next highest) in a: a_ind=[2, 2]
indexes of picked values (or next highest) in b: b_ind=[2, 5]
add ends of array to indexes: a_ind=[0,2,2,6], b_ind=[0,2,5,6]
merge each of these in separate threads:
[a,d] with [c,e]
[] with [g,h,l]
[p,u,w,y] with [s]
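And here's a hedged sketch of the per-thread merging for this example, with the slice boundaries hard-coded to the a_ind/b_ind values above (they're what `split_points` from the earlier sketch would compute for picks {g, p}); an in-memory `out` buffer stands in for the output file:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

int main() {
    std::vector<char> a{'a','d','p','u','w','y'};
    std::vector<char> b{'c','e','g','h','l','s'};

    // Slice boundaries from the example above.
    std::vector<std::size_t> ai{0, 2, 2, 6};
    std::vector<std::size_t> bi{0, 2, 5, 6};

    std::vector<char> out(a.size() + b.size());
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i + 1 < ai.size(); ++i) {
        pool.emplace_back([&, i] {
            // Slice i of the output starts at ai[i] + bi[i], so every
            // task writes to a disjoint region and needs no locking.
            std::merge(a.begin() + ai[i], a.begin() + ai[i + 1],
                       b.begin() + bi[i], b.begin() + bi[i + 1],
                       out.begin() + ai[i] + bi[i]);
        });
    }
    for (auto& t : pool) t.join();
    assert(std::is_sorted(out.begin(), out.end()));
}
```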
This example is far too small to benefit, but in my experience, if the subarrays are large and you choose n carefully, this approach can provide good parallelization and faster sorts.
You can write the result from a single thread to the output file before the other threads finish, because you know that slice i of the merged output starts at offset a_ind[i] + b_ind[i] and you know how many values that thread is merging.
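For instance, on POSIX systems `pwrite` lets each thread write its finished slice at that offset without coordinating seeks with the others. `write_slice` is a hypothetical helper, and this assumes fixed-size, trivially copyable elements:

```cpp
#include <unistd.h>   // pwrite (POSIX)
#include <cstddef>
#include <vector>

// Hypothetical helper: flush one finished slice without waiting for the
// others. Slice i starts at element offset ai[i] + bi[i] and holds
// (ai[i+1] - ai[i]) + (bi[i+1] - bi[i]) elements, so every thread writes
// to a disjoint byte range of the file.
template <typename T>  // T must be trivially copyable
void write_slice(int fd, const std::vector<T>& merged_slice,
                 std::size_t ai_start, std::size_t bi_start) {
    pwrite(fd, merged_slice.data(),
           merged_slice.size() * sizeof(T),
           static_cast<off_t>((ai_start + bi_start) * sizeof(T)));
}
```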
One challenge is dividing the work evenly. I picked the values randomly, and picked quite a few more of them than I had cores; that worked well for distributing the work evenly across threads.
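A sampling step along those lines might look like the following; `pick_values` is a hypothetical name, and duplicate picks are harmless since they just produce empty slices:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: sample k split values uniformly at random from
// the unsorted input. Choosing k to be several times the core count
// tends to even out slice sizes across threads.
template <typename T>
std::vector<T> pick_values(const std::vector<T>& input, std::size_t k) {
    std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<std::size_t> dist(0, input.size() - 1);
    std::vector<T> picks;
    picks.reserve(k);
    for (std::size_t i = 0; i < k; ++i)
        picks.push_back(input[dist(gen)]);
    return picks;
}
```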
This is really just a generalization of @chqrlie's answer, but I hope it might help someone.