5,782 questions
-3
votes
0
answers
64
views
Intel ARC GPU hangs when performing an untyped surface read [closed]
I am currently writing a driver for the Intel ARC GPU series (specifically I use the A750 for testing purposes) for my own operating system.
I am already able to execute compute kernels that use ...
1
vote
0
answers
51
views
OnnxRuntime with ACL Execution Provider on RK3588 (Mali-G610): Nodes assigned to ACL but GPU load remains 0%
[Goal & Problem]
I am trying to accelerate ONNX model inference on an RK3588 (Orange Pi 5) board using the Mali-G610 GPU. I have built OnnxRuntime (ORT) with the ACL (Compute Library) Execution ...
3
votes
2
answers
100
views
OpenCL Kernel slow and doesn't utilise CPU fully
I tried to do an old advent of code problem in OpenCL, but it's very slow.
const char *KernelSource_part_b = "\n" \
"typedef unsigned long uint64_t; ...
0
votes
0
answers
61
views
How to Use OpenCL in Exynos2400 in termux?
I want to compile and run OpenCL programs to do some parallel computing on my mobile device, an S24 FE (Exynos 2400e). I tried to compile clinfo, but it always reports 0 devices.
I tried various ...
0
votes
1
answer
58
views
What's the OpenCL idiom for elementwise array-lookup / gather operation with vectorized types?
Consider the following OpenCL code in which each element in a vector-type variable gets its value via array lookup:
float* tbl = get_data();
int4 offsets = get_offsets();
float4 my_elements = {
...
1
vote
1
answer
104
views
Local atomics causes GPU to crash
I am writing an OpenCL kernel that uses atomics. As I only need to synchronize groups of 192 threads, I figured using local atomics would be ideal. However, the change from global to local atomics ...
1
vote
0
answers
46
views
Unresolved extern function '__write_pipe_2' when building an OpenCL program
I'm using the OpenCL clBuildProgram() API function on a program created from a source string. The source is:
kernel void foo(int val, write_only pipe int outPipe)
{
write_pipe(outPipe, &val);
}...
0
votes
0
answers
17
views
Can clEnqueueSVMMap be used with a sub-region of an SVM memory region?
Suppose I've allocated a region of memory with clSVMAlloc(). Looking at the clEnqueueSVMMap() function, we are told that it will "allow the host to update a region of a SVM buffer".
Does ...
0
votes
0
answers
26
views
When should I use clEnqueueSVMMemcpy?
OpenCL has the mechanism of "shared virtual memory" (SVM), where the same memory region is available both in OpenCL kernel code and in host-side code - and updates on one side affect the ...
0
votes
0
answers
16
views
How can I determine why clSVMAlloc failed?
Most OpenCL API calls return a status/error value, either directly or via an out-parameter (example: clCreateBuffer()). While that is not as informative as a long-form string description, it can tell ...
0
votes
1
answer
33
views
How should I perform an elementwise cast of an OpenCL C vector value?
OpenCL C supports "vector data types" - a fixed number of scalar types which may be operated on together, as though they were a single scalar, mostly: we can apply arithmetic and logic ...
0
votes
1
answer
37
views
Why is clEnqueueWaitForEvents deprecated? It seems indispensable
I'm looking at the clEnqueueWaitForEvents() OpenCL API function.
As I see it, this is a real boon. You see, almost all clEnqueueXXX functions take an array-of-events, and the size of that array, to ...
0
votes
1
answer
46
views
What's the right way to determine which kind of cl_program I have?
The OpenCL API has one object which is sort of a "kitchen sink" for a lot of stuff: The program (with handle type cl_program). It can hold:
A textual program source ( ...
1
vote
1
answer
46
views
Why can't I create a kernel (CL_INVALID_PROGRAM_EXECUTABLE) after successfully compiling an OpenCL program?
In the following program, I compile a kernel for the first device on the first platform:
const char* kernel_source_code = R"(
__kernel void vectorAdd(
__global float * __restrict C,
...
0
votes
1
answer
99
views
OpenCL createProgramWithSource doesn't work with a c-string declared in either global or function scope
I'm trying to run a basic kernel in OpenCL. See the attached snippet:
const char kernel_source[] = "__kernel void matmul(__global float* A, __global float* B, __global float* C) { int row = ...
0
votes
0
answers
46
views
SIGV on clGetPlatformIDs
There is a SIGV while calling clGetPlatformIDs, which is sometimes a fatal SIGSEGV.
The minimal example I found that reproduces it:
#include <stdio.h>
#include <stdlib.h>
#include "CL/cl.h"...
1
vote
0
answers
59
views
How can I optimize the SPH part (OpenCL) of my N-Body-Simulation
I implemented an N-body simulation that combines gravity and SPH (smoothed particle hydrodynamics). I want to optimize the SPH part. I use spatial hashing for the neighborhood search. On the host side (...
0
votes
1
answer
49
views
What happens when you set the same OpenCL callback more than once on the same object?
OpenCL has several API functions to set callback functions - for events, for buffers/memory objects, for contexts and maybe more.
What happens if you invoke one of these functions, more than once, on ...
1
vote
1
answer
77
views
Do I need to set a CL_MEM_READ_WRITE when creating a buffer?
When creating a buffer in OpenCL, one passes a flags bitfield. One of these possible flags is CL_MEM_READ_WRITE, being the lowest bit in the field (value 1 << 0). Its documentation says that ...
0
votes
0
answers
67
views
Why is my OpenCL optimized convolution kernel slower than the naive version at higher workgroup sizes?
I'm working on a GPU-accelerated 2D convolution in OpenCL for a 2048x2048 image using a 3x3 Sobel filter. I implemented two versions of the kernel:
A naive version that uses only global memory.
An ...
1
vote
0
answers
50
views
Are OpenCL kernel NDarrays necessarily limited to 3 dimensions?
I'm looking at the OpenCL specification for clGetKernelWorkGroupInfo(), and am noticing that the returned types for CL_KERNEL_GLOBAL_WORK_SIZE and CL_KERNEL_COMPILE_WORK_GROUP_SIZE are both size_t[3]. ...
0
votes
1
answer
44
views
Does pyopencl transfer arrays to host memory implicitly?
I have an AMD GPU. I'm using pyopencl. I have a context and a queue. Then I created an array:
import pyopencl
import pyopencl.array
ctx = pyopencl.create_some_context(interactive=False)
queue = pyopencl....
0
votes
2
answers
127
views
How to invert the colors of an image?
I'm working on a project where I need to perform image inversion using GPU with OpenCL, but it's not working as expected. The goal is to invert the colors of an image using a kernel and then retrieve ...
0
votes
1
answer
53
views
OpenCL 2.0 full profile, without atomic_store & atomic_load? Is this possible?
I use the OpenCL.NET C# wrapper for OpenCL.
According to GPU-Z, my GPU is an AMD Radeon Barcelo; the OpenCL-specific details are:
Platform Version: OpenCL 2.1 AMD-APP (3570.0)
Device Name: gfx90c
Device Profile: ...
-1
votes
1
answer
72
views
Odd-even sort in OpenCL
I have the following task: write an OpenCL program that sorts an array using the odd-even sort method. I wrote it, but I have run into some problems and don't know how to solve them, because I'm new to OpenCL.
Here is my ...
2
votes
1
answer
74
views
OpenCL segfault on clBuildProgram
I'm building my first OpenCL program in C in order to quickly compute the Mandelbrot Set, and I'm getting a segfault on clBuildProgram. Below is the relevant code:
The kernel code (though the program ...
0
votes
1
answer
44
views
What are the OpenCL devices "associated with a program"?
The clCompileProgram() function of OpenCL takes, among other parameters, a cl_program handle named program, and a list of device handles: const cl_device_id* device_list. The documentation for this function ...
0
votes
1
answer
36
views
Why does clLinkProgram take a context handle?
In OpenCL, the clLinkProgram() function takes (among other things)
A cl_context context handle;
An array of cl_program handles of program objects.
Now, a cl_program is always created in a context; ...
0
votes
1
answer
27
views
Reduction/conjunction/disjunction functions for OpenCL vector types?
OpenCL offers built-in/intrinsic "vector types" (see table 3 at the link), such as int4 or float2. It also defines binary and unary elementwise operators which accept these types, e.g. ...
1
vote
0
answers
27
views
Is MPI necessary for invoking OpenCL devices across multiple compute nodes?
What is the typical way of invoking multiple OpenCL devices across multiple compute nodes that use job schedulers such as SLURM or PBS? Let's say I requested 64 GPUs in total where each compute node is ...
1
vote
0
answers
71
views
clBuildProgram vs clCompileProgram - when should I call each of these?
In regular software development parlance, we begin with program sources; we then compile them into binary objects; and finally link the objects into an executable object. And the entire process ...
0
votes
0
answers
42
views
What is the guaranteed relation of the per-binary and overall return status of clCreateProgramWithBinary?
The OpenCL API has the following function:
cl_program clCreateProgramWithBinary(
cl_context context,
cl_uint num_devices,
const cl_device_id* device_list,
const size_t* lengths,
...
0
votes
1
answer
39
views
Can OpenCL platforms change over the course of execution?
While executing a process, can the results of subsequent calls to clGetPlatformIDs() change? i.e. can platforms disappear, appear, change order, or change handles ("ids")?
I'm asking about ...
0
votes
4
answers
1k
views
How do I use OpenCL in a docker container
I have successfully used OpenCL on my local Windows PC, and I would now like to get my program working in a container.
First attempt
FROM ubuntu:latest
RUN apt-get update
#Done to make install non-...
0
votes
0
answers
46
views
What's the deal with the cl_mem_flags used to create an OpenCL pipe?
The API call for creating a pipe in OpenCL 2.0 and later is:
cl_mem clCreatePipe(
cl_context context,
cl_mem_flags flags,
cl_uint pipe_packet_size,
cl_uint pipe_max_packets,
const ...
0
votes
0
answers
11
views
Why does clCompileProgram take "headers" wrapped in cl_program's?
In OpenCL, when you want to compile (not link) a kernel for some target devices, you call:
cl_int clCompileProgram(
cl_program program,
cl_uint num_devices,
const cl_device_id* device_list,...
0
votes
0
answers
17
views
Why does OpenCL not have a clGetKernelArg function?
In OpenCL, before launching a kernel, we set its arguments using the clSetKernelArg() API function; and the cl_kernel handle must be backed by some kind of storage for the value of those arguments we ...
2
votes
1
answer
64
views
What happens to set kernel arguments after launch? Must I reset them?
In CUDA, launching a kernel means specifying its arguments, marshaled via an array of pointers:
CUresult cuLaunchKernel (
CUfunction f,
/* launch config stuff */,
void** kernelParams,
...
1
vote
0
answers
22
views
Why do OpenCL enqueue API functions take event lists, when we can enqueue a barrier?
OpenCL command queues' enqueue API functions typically take a sequence of dependency events. For example:
cl_int clEnqueueCopyBuffer(
cl_command_queue command_queue,
cl_mem src_buffer,
...
0
votes
0
answers
20
views
In OpenCL, do contexts keep subdevices alive?
In OpenCL (let's say v3.0), I know one can create contexts using sub-devices. But - what happens if you release all references to a sub-device while the context is not released (i.e. has positive ...
0
votes
1
answer
150
views
Cannot detect OpenCL 3.0 for NVIDIA GPU
I am facing a strange issue with CMake's detection of OpenCL.
When I use the following CMakeLists.txt:
cmake_minimum_required(VERSION 3.10)
# Uncomment to make it working
# include(CheckSymbolExists)
# ...
1
vote
0
answers
14
views
In OpenCL, can we obtain the default queue for a device (in a context)?
The OpenCL API defines such a thing as the "default queue", for a given context and device in that context. Indeed, when we clCreateCommandQueueWithProperties, one of the properties we ...
1
vote
1
answer
78
views
In OpenCL, what's the difference between "host" and "device" command-queues?
In OpenCL, when creating a command queue, we can set the options to indicate that this will be a "device" command queue; otherwise, it's a "host" queue. The C++ bindings have ...
0
votes
0
answers
71
views
Can I write OpenCL kernels using some kind of C++, to run on NVIDIA GPUs, in 2024?
OpenCL has had a bumpy ride over the years w.r.t. the prospects of using C++ to write kernels. First there was "OpenCL C++ kernel language", standardized with OpenCL v2.1 - but that did ...
0
votes
0
answers
44
views
How can a GPU directly access memory data created by the CPU while maintaining real-time synchronization?
I hope to use the GPU on a mobile phone to directly and quickly access and synchronize memory data, which the CPU will also read and modify in real-time.
On a mobile phone, this is a simple example, ...
1
vote
0
answers
38
views
OpenCL: Kernel only reading first pixel
Grayscale kernel only reading first pixel
The following is my grayscale.cl kernel implementation. The problem that I am facing is that the kernel seems to perform the grayscale calculation only on the ...
1
vote
1
answer
95
views
Non-blocking copy from Host to Buffer with enqueue_copy
I am writing code with PyOpenCL to offload heavy computations to the GPU. To optimize the algorithm, I would like to overlap some of the memory transfer operations with further calculations. However, I ...
0
votes
1
answer
90
views
Effect of distance between CUDA threads in block?
I have a naive question about GPU programming. (ChatGPT and Claude didn't really give me a convincing answer. Maybe I'm prompting badly.)
GPU programming languages like CUDA and OpenCL organise ...
0
votes
2
answers
274
views
How to enable OpenCL syntax highlighting and syntax checking of OpenCL C files in Visual Studio 2019?
How to enable syntax highlighting and syntax checking of CL kernels written in OpenCL C (in *.cl files) in Visual Studio 2019 IDE?
See the example below:
The *.cl files use the OpenCL C syntax that ...
1
vote
1
answer
130
views
Ubuntu OpenCL can't find Intel GPU on double GPU device
I'm trying to code an OpenCL C++ application on an old Ubuntu laptop. It has two GPUs, which are shown when I run lspci | grep VGA:
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core ...