odd-even sort in OpenCL

Question

I have the following task: write an OpenCL program that sorts an array using the odd-even sort method. I wrote it, but I'm running into problems and don't know how to solve them because I'm new to OpenCL.

Here is my kernel code:

const char* kernelSource =
"__kernel void odd_even_sort(__global ulong * arr, const unsigned int n) {\n"
"    unsigned int id = get_global_id(0);\n"
"    for (unsigned int phase = 0; phase < n; phase++) {\n"
"        unsigned int swap_idx;\n"
"        if (phase % 2 == 0) {\n"
"            swap_idx = id * 2;\n"
"        }\n"
"        else {\n"
"            swap_idx = id * 2 + 1;\n"
"        }\n"
"        if (swap_idx + 1 < n) {\n"
"            if (arr[swap_idx] > arr[swap_idx + 1]) {\n"
"                ulong temp = arr[swap_idx];\n"
"                arr[swap_idx] = arr[swap_idx + 1];\n"
"                arr[swap_idx + 1] = temp;\n"
"            }\n"
"        }\n"
"        barrier(CLK_GLOBAL_MEM_FENCE);\n"
"    }\n"
"}\n";

And here is how I run it:

clSetKernelArg(kernel, 0, sizeof(cl_mem), (cl_ulong*)&A_mem);
clSetKernelArg(kernel, 1, sizeof(unsigned int), &nums);
size_t global_work_size = nums / 2;
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, NULL);

nums is the length of the array; it is a power of 2.

I don't know why, but when I print the sorted array, some numbers from the unsorted array are missing. In their place there can be duplicates of other numbers. Here is an example:

3669687509
3670134728
3672631890
3674672863
3674672863
3674672863
3674672863
3674672863
3678491299
3678977840
3682679700

The unsorted array is guaranteed to contain no duplicate numbers.

Another problem is that a number can end up out of place (example with another array):

4011496893
4013708165
4016925868
4116700792
4016953184
4020082302

I hope someone can help 🙏

Try posting a minimal reproducible example of your OpenCL program, including the code that handles the array. You know that you have to read the buffer back to access it on the host after the kernel has completed, right?
– Simon Goater, Commented Mar 7 at 22:10
@SimonGoater Sure, I know. The rest of the code is correct; the problem lies either in the kernel or in the choice of the global_work_size and local_work_size parameters.
– kylo_gg, Commented Mar 9 at 14:56
Sometimes drivers don't work correctly either, so if you're using old hardware, check that other OpenCL programs work correctly. I take it this is just an exercise? This algorithm is O(n^2) in the general case.
– Simon Goater, Commented Mar 9 at 15:13
@SimonGoater It is an exercise for my university. My drivers are up to date. My teacher tried to run this code on his NVIDIA GPU (I work with Intel), and it doesn't work there either. I can't understand why duplicate numbers appear. Can it be a problem with barriers? How can I fix it?
– kylo_gg, Commented Mar 9 at 15:19
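
On the barrier question raised in the last comment: in OpenCL, barrier() synchronizes only the work-items of one work-group, not the whole NDRange. Nothing in this thread confirms that this is what happens on the asker's hardware, so treat it as one possible explanation only. The plain-Python sketch below (not OpenCL; the function name interleaved_swaps and the sample values are made up for illustration) replays one hypothetical interleaving in which a work-item still in an even phase and a work-item already in an odd phase touch overlapping pairs. Both read before either writes, so one value is lost and another is duplicated.

```python
def interleaved_swaps(arr, pair_a, pair_b):
    """Replay two compare-and-swap steps whose reads both happen before any write.

    pair_a and pair_b are (i, i + 1) index pairs that share one index, as can
    happen if two work-items are in different phases at the same time.
    This is an illustrative model, not a trace of a real device.
    """
    arr = list(arr)
    # Step 1: both work-items read their pair from the same, unmodified array.
    a_lo, a_hi = arr[pair_a[0]], arr[pair_a[1]]
    b_lo, b_hi = arr[pair_b[0]], arr[pair_b[1]]
    # Step 2: each work-item writes back its pair, swapped if out of order.
    if a_lo > a_hi:
        arr[pair_a[0]], arr[pair_a[1]] = a_hi, a_lo
    if b_lo > b_hi:
        arr[pair_b[0]], arr[pair_b[1]] = b_hi, b_lo
    return arr


before = [30, 20, 10]
# Work-item A handles the even-phase pair (0, 1); work-item B is already in
# the odd phase and handles pair (1, 2). Index 1 is shared between them.
after = interleaved_swaps(before, (0, 1), (1, 2))
print(before, "->", after)
```

If this is what is happening, two common remedies are to run all work-items in a single work-group (only possible while nums / 2 fits within the device's maximum work-group size) or to have the host enqueue one kernel launch per phase.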

Edward Murphy · Accepted Answer · 2025-03-28 15:57:02Z

I tried the provided kernel code with this Python script:

#!/usr/bin/env python

import numpy as np
from numpy.random import default_rng
import pyopencl as cl

rng = default_rng()

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags

prg = cl.Program(ctx, """
__kernel void odd_even_sort(__global ulong * arr, const unsigned int n) {
    unsigned int id = get_global_id(0);
    for (unsigned int phase = 0; phase < n; phase++) {
        unsigned int swap_idx;
        if (phase % 2 == 0) {
            swap_idx = id * 2;
        }
        else {
            swap_idx = id * 2 + 1;
        }
        if (swap_idx + 1 < n) {
            if (arr[swap_idx] > arr[swap_idx + 1]) {
                ulong temp = arr[swap_idx];
                arr[swap_idx] = arr[swap_idx + 1];
                arr[swap_idx + 1] = temp;
            }
        }
        barrier(CLK_GLOBAL_MEM_FENCE);
    }
};

""").build()

nums = 10000
np_arr = rng.choice(1844674407370955161, size=nums, replace=False)
cl_arr = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=np_arr)
knl = prg.odd_even_sort
knl(queue, (int(nums/2),), None, cl_arr, cl.cltypes.uint(nums))
res_arr = np.empty_like(np_arr)
cl.enqueue_copy(queue, res_arr, cl_arr)
print(res_arr)
# for i in range(nums):
#     for j in range(nums):
#         if res_arr[i]==res_arr[j] and i!=j:
#             print("res: ",res_arr[i])
cl_arr = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=np_arr)
knl.set_arg(0, cl_arr)
knl(queue, (int(nums/2),), None, None, cl.cltypes.uint(nums))
res_arr = np.empty_like(np_arr)
cl.enqueue_copy(queue, res_arr, cl_arr)
print(res_arr)
# for i in range(nums):
#     for j in range(nums):
#         if res_arr[i]==res_arr[j] and i!=j:
#             print("res: ",res_arr[i])

The first kernel call works fine, with no duplicate values. The second call (where the only change is that cl_arr is set as an argument before execution) fails with a segfault at this line: if (arr[swap_idx] > arr[swap_idx + 1]). Printing swap_idx in gdb returns $1 = <optimized out>, which means the optimizer did not keep swap_idx in a location the debugger can read, so its value cannot be inspected at that point. This looks like an odd compiler issue, although note that the second call passes None in place of the buffer, which PyOpenCL treats as a null pointer for __global arguments and which may override the earlier set_arg. See this answer. To fix it, try passing the arguments to the kernel directly instead of setting them beforehand. Another possible fix is to remove any compiler optimization flags you might be using.
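
The commented-out nested loops in the script check for duplicates in O(n^2) time. As a rough alternative, the sketch below checks in one pass whether a result is sorted and whether it contains exactly the input values. The helper name check_sorted_permutation and the sample data are hypothetical and not part of the original script.

```python
from collections import Counter


def check_sorted_permutation(original, result):
    """Return (is_sorted, missing, extra) for a sort result.

    missing: input values that are absent (or occur too rarely) in the result.
    extra:   values that occur more often in the result than in the input.
    """
    is_sorted = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    want, got = Counter(original), Counter(result)
    missing = sorted((want - got).elements())
    extra = sorted((got - want).elements())
    return is_sorted, missing, extra


# Hypothetical data shaped like the symptom in the question:
# 5 has disappeared and 3 appears twice.
original = [5, 3, 9, 1]
broken = [1, 3, 3, 9]
print(check_sorted_permutation(original, broken))
```

With the script above it could be called as check_sorted_permutation(np_arr.tolist(), res_arr.tolist()).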

A note: I am using POCL as my OpenCL platform, which may introduce its own set of issues.
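
To separate platform issues from kernel logic, it can help to compare the device output against a sequential reference. The sketch below is a plain-Python version of the same odd-even phases the kernel uses (even phases compare pairs starting at index 0, odd phases at index 1, n phases in total). The function name odd_even_sort_reference is made up for illustration; it runs each phase to completion before starting the next, which is the ordering the kernel relies on its barrier to provide.

```python
def odd_even_sort_reference(values):
    """Sequential odd-even transposition sort, one complete phase at a time."""
    arr = list(values)
    n = len(arr)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        start = phase % 2
        # i + 1 must stay inside the array, mirroring `swap_idx + 1 < n`.
        for i in range(start, n - 1, 2):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
    return arr


print(odd_even_sort_reference([4, 3, 2, 1]))
```

If the device result differs from odd_even_sort_reference(np_arr.tolist()) on the same input, the difference comes from how the phases are executed on the device and not from the compare-and-swap logic itself.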
