I'm trying to implement the Merge Sort algorithm on CUDA which is mentioned on the Designing Efficient Sorting Algorithms for Manycore GPUs.
For this as in the intermediate level, It has suggested to assign a thread block to merge two sorted arrays (say A and B) to a single array. And the idea is to assign a thread to data - a_i on the array A and find its location on B array using Binary Search. and if the position of a_i on B is j the a_i's position on new array is (i+j)
But when I implemented this (or tried to), I found a scenario that above method doesn't seems to work. For example if the two arrays that need to be merged are as follows,
and
So that the 53 on the first array (shaded in gray one) will find that its respective position on the second array is 11, So its position on the final array is (11 + 11 = 22). And likewise position of 53 on the first array which is shaded in blue is (11 + 12 = 23).
If we take the 53 of the second array its final position is also given as (11 + 11 = 22). Which conflicts with the grey color 53 on the first array.
So according to my understanding a simple binary search algorithm cannot be used to merge these two arrays. Is there a known or easier method to resolve this conflict?

