
I have the following task: write an OpenCL program that sorts an array using the odd-even sort method. I wrote it, but I ran into some problems and I don't know how to solve them because I'm new to OpenCL.

Here is my kernel code:

const char* kernelSource =
"__kernel void odd_even_sort(__global ulong * arr, const unsigned int n) {\n"
"    unsigned int id = get_global_id(0);\n"
"    for (unsigned int phase = 0; phase < n; phase++) {\n"
"        unsigned int swap_idx;\n"
"        if (phase % 2 == 0) {\n"
"            swap_idx = id * 2;\n"
"        }\n"
"        else {\n"
"            swap_idx = id * 2 + 1;\n"
"        }\n"
"        if (swap_idx + 1 < n) {\n"
"            if (arr[swap_idx] > arr[swap_idx + 1]) {\n"
"                ulong temp = arr[swap_idx];\n"
"                arr[swap_idx] = arr[swap_idx + 1];\n"
"                arr[swap_idx + 1] = temp;\n"
"            }\n"
"        }\n"
"        barrier(CLK_GLOBAL_MEM_FENCE);\n"
"    }\n"
"}\n";

And here is how I run it:

clSetKernelArg(kernel, 0, sizeof(cl_mem), (cl_ulong*)&A_mem);
clSetKernelArg(kernel, 1, sizeof(unsigned int), &nums);
size_t global_work_size = nums / 2;
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, NULL);

nums is the length of the array; it is a power of 2.

I don't know why, but when I print the sorted array, some numbers from the unsorted array are missing. In their place there can be duplicates of other numbers. Here is an example:

  1. 3669687509
  2. 3670134728
  3. 3672631890
  4. 3674672863
  5. 3674672863
  6. 3674672863
  7. 3674672863
  8. 3674672863
  9. 3678491299
  10. 3678977840
  11. 3682679700

The unsorted array is guaranteed to contain no duplicate numbers.

Another problem is that a number can end up out of place (example with another array):

  1. 4011496893
  2. 4013708165
  3. 4016925868
  4. 4116700792
  5. 4016953184
  6. 4020082302

I hope someone can help 🙏

  • Try posting a minimal reproducible example of your OpenCL program, including the code which handles the array. You know you have to read the buffer to access it on the host after the kernel has completed right? Commented Mar 7 at 22:10
  • @SimonGoater sure I know, the rest of the code is correct, the problem lies either in the kernel or in the selection of the global_work_size and local_work_size parameters Commented Mar 9 at 14:56
  • Sometimes drivers don't work correctly as well, so if you're using old hardware, check other OpenCL programs work correctly. I take it this is just an exercise? This algorithm is O(n^2) in the general case. Commented Mar 9 at 15:13
  • @SimonGoater it is an exercise for my university. My drivers are up to date. My teacher tried to run this code on his NVIDIA GPU (I work with Intel), and it doesn't work there either. I can't understand why duplicates of numbers appear. Could it be a problem with the barriers? How can I fix it? Commented Mar 9 at 15:19

1 Answer


I tried the provided kernel code with this python script:

#!/usr/bin/env python

import numpy as np
import random
from numpy.random import default_rng
import pyopencl as cl

rng = default_rng()

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags

prg = cl.Program(ctx, """
__kernel void odd_even_sort(__global ulong * arr, const unsigned int n) {
    unsigned int id = get_global_id(0);
    for (unsigned int phase = 0; phase < n; phase++) {
        unsigned int swap_idx;
        if (phase % 2 == 0) {
            swap_idx = id * 2;
        }
        else {
            swap_idx = id * 2 + 1;
        }
        if (swap_idx + 1 < n) {
            if (arr[swap_idx] > arr[swap_idx + 1]) {
                ulong temp = arr[swap_idx];
                arr[swap_idx] = arr[swap_idx + 1];
                arr[swap_idx + 1] = temp;
            }
        }
        barrier(CLK_GLOBAL_MEM_FENCE);
    }
};

""").build()

nums = 10000
np_arr = rng.choice(1844674407370955161, size=nums, replace=False)
cl_arr = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=np_arr)
knl = prg.odd_even_sort
knl(queue, (int(nums/2),), None, cl_arr, cl.cltypes.uint(nums))
res_arr = np.empty_like(np_arr)
cl.enqueue_copy(queue, res_arr, cl_arr)
print(res_arr)
# for i in range(nums):
#     for j in range(nums):
#         if res_arr[i]==res_arr[j] and i!=j:
#             print("res: ",res_arr[i])
cl_arr = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=np_arr)
knl.set_arg(0, cl_arr)
knl(queue, (int(nums/2),), None, None, cl.cltypes.uint(nums))
res_arr = np.empty_like(np_arr)
cl.enqueue_copy(queue, res_arr, cl_arr)
print(res_arr)
# for i in range(nums):
#     for j in range(nums):
#         if res_arr[i]==res_arr[j] and i!=j:
#             print("res: ",res_arr[i])
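
The commented-out duplicate check in the script compares every pair of elements, which gets slow for 10000 values. A minimal sketch of a faster check, assuming only the Python standard library; check_sorted_permutation is a hypothetical helper name, not part of PyOpenCL or of the script above:

```python
from collections import Counter


def check_sorted_permutation(original, result):
    """Return a list of problems; an empty list means the result looks valid."""
    problems = []
    original = list(original)
    result = list(result)

    # Values present in the input but absent (or under-represented) in the output.
    missing = Counter(original) - Counter(result)
    # Values that appear more often in the output than in the input.
    extra = Counter(result) - Counter(original)

    if missing:
        problems.append(f"missing values: {sorted(missing.elements())}")
    if extra:
        problems.append(f"unexpected or duplicated values: {sorted(extra.elements())}")

    # Indices i where result[i] > result[i + 1], i.e. the order is broken.
    out_of_order = [i for i in range(len(result) - 1) if result[i] > result[i + 1]]
    if out_of_order:
        problems.append(f"out of order at indices: {out_of_order}")
    return problems
```

With the script above it could be called as check_sorted_permutation(np_arr.tolist(), res_arr.tolist()) after each enqueue_copy.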

The first kernel call works fine, with no duplicate values. The second call, where the only change is that cl_arr is set as an argument with set_arg before execution and None is passed in its place, fails with a segmentation fault at this line: if (arr[swap_idx] > arr[swap_idx + 1]). Printing swap_idx in gdb gives $1 = <optimized out>, which only means the compiler did not keep the variable in a form the debugger can read, so its value cannot be inspected there. This looks like an odd compiler issue, although the None in the buffer position may also matter, since PyOpenCL documents None as a way to pass a null pointer for a __global argument. See this answer. To fix it, try passing the arguments to the kernel directly instead of setting them beforehand. Another possible fix is to remove any optimization flags you pass to the compiler.
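
A different possible cause for the duplicates in the question, not confirmed in this thread: barrier() only synchronizes the work-items inside one work-group. With a global size of nums / 2 and no local size, the runtime may split the work into several work-groups, and then work-items from different groups can be in different phases at the same time and touch overlapping pairs. The sketch below is a plain Python model, not OpenCL, and compare_swap_steps and run_interleaved are hypothetical names. It interleaves the memory accesses of two such work-items by hand to show how one value can get lost and another duplicated:

```python
def compare_swap_steps(arr, i):
    """One compare-and-swap on (i, i + 1), pausing after each memory access."""
    should_swap = arr[i] > arr[i + 1]
    yield
    if should_swap:
        temp = arr[i]
        yield
        arr[i] = arr[i + 1]
        yield
        arr[i + 1] = temp
        yield


def run_interleaved(values, schedule):
    """Advance work-items A and B one step at a time, in the given order."""
    arr = list(values)
    items = {
        "A": compare_swap_steps(arr, 0),  # still in an even phase: pair (0, 1)
        "B": compare_swap_steps(arr, 1),  # already in an odd phase: pair (1, 2)
    }
    for name in schedule:
        next(items[name], None)
    return arr


# A finishes before B starts: the result is still a permutation of the input.
ordered = run_interleaved([3, 2, 1], "AAAABBBB")    # [2, 1, 3]
# A and B overlap on index 1: the value 1 is lost and 2 appears twice.
racy = run_interleaved([3, 2, 1], "AABBABAB")       # [2, 3, 2]
```

The second schedule ends with the same kind of symptom as in the question: a value from the input disappears and another one is repeated.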

A note: I am using POCL as my OpenCL platform, which may introduce its own set of issues.
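
If the missing synchronization between work-groups is indeed the problem (an assumption, since barrier() does not synchronize across work-groups), one common way around it is to move the phase loop out of the kernel and launch the kernel once per phase, passing the phase number as an extra argument. On an in-order command queue each launch finishes before the next one starts. The sketch below is only a plain Python model of that structure, with hypothetical function names; each call to odd_even_phase stands for one kernel launch, and each loop iteration inside it for one work-item:

```python
def odd_even_phase(arr, phase):
    """One phase of odd-even transposition sort, in place.

    All pairs handled in a single phase are disjoint, so the
    compare-and-swap operations do not interfere with each other.
    """
    start = phase % 2  # even phases start at index 0, odd phases at index 1
    for i in range(start, len(arr) - 1, 2):
        if arr[i] > arr[i + 1]:
            arr[i], arr[i + 1] = arr[i + 1], arr[i]


def odd_even_sort(values):
    """Run len(values) phases, each one fully completed before the next."""
    arr = list(values)
    for phase in range(len(arr)):
        odd_even_phase(arr, phase)
    return arr
```

Another option is to run the original kernel with a single work-group, but that only works while nums / 2 does not exceed the device's maximum work-group size.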
