Newest 'opencl' Questions

-3 votes

0 answers

64 views

Intel ARC GPU hangs when performing an untyped surface read [closed]

I am currently writing a driver for the Intel ARC GPU series (specifically I use the A750 for testing purposes) for my own operating system. I am already able to execute compute kernels that use ...

Joel Marker

40

asked Nov 26 at 0:54

1 vote

0 answers

51 views

OnnxRuntime with ACL Execution Provider on RK3588 (Mali-G610): Nodes assigned to ACL but GPU load remains 0%

[Goal & Problem] I am trying to accelerate ONNX model inference on an RK3588 (Orange Pi 5) board using the Mali-G610 GPU. I have built OnnxRuntime (ORT) with the ACL (Compute Library) Execution ...

이호연

11

asked Nov 18 at 9:10

3 votes

2 answers

100 views

OpenCL Kernel slow and doesn't utilise CPU fully

I tried to do an old advent of code problem in OpenCL, but it's very slow. const char *KernelSource_part_b = "\n" \ "typedef unsigned long uint64_t; ...

Richard Clubb

63

asked Nov 1 at 9:40

0 votes

0 answers

61 views

How to Use OpenCL in Exynos2400 in termux?

I want to compile and run OpenCL programs to do some parallel computing on my mobile device, an S24 FE (Exynos 2400e). I tried to compile clinfo, but it always reports 0 devices. I tried various ...

Lakshit Karsoliya

25

asked Oct 25 at 3:24

0 votes

1 answer

58 views

What's the OpenCL idiom for elementwise array-lookup / gather operation with vectorized types?

Consider the following OpenCL code in which each element in a vector-type variable gets its value via array lookup: float* tbl = get_data(); int4 offsets = get_offsets(); float4 my_elements = { ...

einpoklum

137k

asked Sep 16 at 12:22

1 vote

1 answer

104 views

Local atomics causes GPU to crash

I am writing an OpenCL kernel that uses atomics. As I only need to synchronize groups of 192 threads, I figured using local atomics would be ideal. However, the change from global to local atomics ...

Edward Murphy

69

asked Sep 16 at 2:05

1 vote

0 answers

46 views

Unresolved extern function '__write_pipe_2' when building an OpenCL program

I'm using the OpenCL clBuildProgram() API function on a program created from a source string. The source is: kernel void foo(int val, write_only pipe int outPipe) { write_pipe(outPipe, &val); }...

einpoklum

137k

asked Jul 27 at 19:20

0 votes

0 answers

17 views

Can clEnqueueSVMMap be used with a sub-region of an SVM memory region?

Suppose I've allocated a region of memory with clSVMAlloc(). Looking at the clEnqueueSVMMap() function, we are told that it will "allow the host to update a region of a SVM buffer". Does ...

einpoklum

137k

asked Jul 14 at 10:35

0 votes

0 answers

26 views

When should I use clEnqueueSVMMemcpy?

OpenCL has the mechanism of "shared virtual memory" (SVM), where the same memory region is available both in OpenCL kernel code and in host-side code - and updates on one side affect the ...

einpoklum

137k

asked Jul 11 at 13:31

0 votes

0 answers

16 views

How can I determine why clSVMAlloc failed?

Most OpenCL API calls return a status/error value, either directly or via an out-parameter (example: clCreateBuffer()). While that is not as informative as a long-form string description, it can tell ...

einpoklum

137k

asked Jun 30 at 19:02

0 votes

1 answer

33 views

How should I perform an elementwise cast of an OpenCL C vector value?

OpenCL C supports "vector data types" - a fixed number of scalar types which may be operated on together, as though they were a single scalar, mostly: we can apply arithmetic and logic ...

einpoklum

137k

asked Jun 27 at 19:20

0 votes

1 answer

37 views

Why is clEnqueueWaitForEvents deprecated? It seems indispensable

I'm looking at the clEnqueueWaitForEvents() OpenCL API function. As I see it, this is a real boon. You see, almost all clEnqueueXXX functions take an array-of-events, and the size of that array, to ...

einpoklum

137k

asked Jun 9 at 22:52

0 votes

1 answer

46 views

What's the right way to determine which kind of cl_program I have?

The OpenCL API has one object which is sort of a "kitchen sink" for a lot of stuff: The program (with handle type cl_program). It can hold: A textual program source ( ...

einpoklum

137k

asked Jun 3 at 21:03

1 vote

1 answer

46 views

Why can't I create a kernel (CL_INVALID_PROGRAM_EXECUTABLE) after successfully compiling an OpenCL program?

In the following program, I compile a kernel for the first device on the first platform: const char* kernel_source_code = R"( __kernel void vectorAdd( __global float * __restrict C, ...

einpoklum

137k

asked Jun 2 at 17:15

0 votes

1 answer

99 views

OpenCL createProgramWithSource doesn't work with a c-string declared in either global or function scope

I'm trying to run a basic kernel in OpenCL. See the snippet attached: const char kernel_source[] = "__kernel void matmul(__global float* A, __global float* B, __global float* C) { int row = ...

T3chstop

61

asked Jun 1 at 19:34

0 votes

0 answers

46 views

SIGV on clGetPlatformIDs

There is a SIGV while calling clGetPlatformIDs, which is sometimes a fatal SIGSEGV. Minimal example I found that produces it: #include <stdio.h> #include <stdlib.h> #include "CL/cl.h"...

LentilesGR

1

asked May 17 at 19:00

1 vote

0 answers

59 views

How can I optimize the SPH part (OpenCL) of my N-Body-Simulation

I implemented an N-body simulation that combines gravity and SPH (smoothed particle hydrodynamics). I want to optimize the SPH part. I use spatial hashing for the neighborhood search. On the host side (...

Paul Aner

543

asked May 14 at 16:21

0 votes

1 answer

49 views

What happens when you set the same OpenCL callback more than once on the same object?

OpenCL has several API functions to set callback functions - for events, for buffers/memory objects, for contexts and maybe more. What happens if you invoke one of these functions, more than once, on ...

einpoklum

137k

asked May 5 at 11:33

1 vote

1 answer

77 views

Do I need to set a CL_MEM_READ_WRITE when creating a buffer?

When creating a buffer in OpenCL, one passes a flags bitfield. One of these possible flags is CL_MEM_READ_WRITE, being the lowest bit in the field (value 1 << 0). Its documentation says that ...

einpoklum

137k

asked May 2 at 5:01
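A minimal C sketch of how the access bits in the flags bitfield can be interpreted. It assumes the behaviour described in the OpenCL specification for clCreateBuffer(): CL_MEM_READ_WRITE is the default when no access flag is given, and the three access flags are mutually exclusive. The constants are redefined locally (with the same values as in CL/cl.h) so the sketch builds without an OpenCL SDK; `mem_flags` and `effective_access()` are hypothetical names, not part of the OpenCL API.

```c
#include <stdint.h>

/* Local stand-ins so this compiles without CL/cl.h; the values mirror
 * CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY and CL_MEM_READ_ONLY. */
typedef uint64_t mem_flags;                  /* stands in for cl_mem_flags */
#define MEM_READ_WRITE ((mem_flags)1 << 0)
#define MEM_WRITE_ONLY ((mem_flags)1 << 1)
#define MEM_READ_ONLY  ((mem_flags)1 << 2)

/* Hypothetical helper: returns the access mode a buffer would get for the
 * given flags, or 0 if more than one access flag is set (a combination the
 * specification treats as invalid). */
static mem_flags effective_access(mem_flags flags)
{
    mem_flags access = flags & (MEM_READ_WRITE | MEM_WRITE_ONLY | MEM_READ_ONLY);

    if (access == 0)
        return MEM_READ_WRITE;   /* no access flag given: default applies */
    if (access & (access - 1))
        return 0;                /* more than one access bit: invalid */
    return access;
}
```

Under these assumptions, passing 0 for the access bits and passing CL_MEM_READ_WRITE explicitly describe the same buffer; whether a particular implementation behaves that way is worth checking against its documentation.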

0 votes

0 answers

67 views

Why is my OpenCL optimized convolution kernel slower than the naive version at higher workgroup sizes?

I'm working on a GPU-accelerated 2D convolution in OpenCL for a 2048x2048 image using a 3x3 Sobel filter. I implemented two versions of the kernel: A naive version that uses only global memory. An ...

Mxneeb

19

asked May 1 at 23:07

1 vote

0 answers

50 views

Are OpenCL kernel NDarrays necessarily limited to 3 dimensions?

I'm looking at the OpenCL specification for clGetKernelWorkGroupInfo(), and am noticing that the returned types for CL_KERNEL_GLOBAL_WORK_SIZE and CL_KERNEL_COMPILE_WORK_GROUP_SIZE are both size_t[3]. ...

einpoklum

137k

asked May 1 at 15:28

0 votes

1 answer

44 views

Does pyopencl transfer arrays to host memory implicitly?

I have AMD GPU. I'm using pyopencl. I have a context and a queue. Then I created an array: import pyopencl import pyopencl.array ctx = pyopencl.create_some_context(interactive=False) queue = pyopencl....

haael

1,069

asked Apr 9 at 18:21

0 votes

2 answers

127 views

How to invert the colors of an image?

I'm working on a project where I need to perform image inversion using GPU with OpenCL, but it's not working as expected. The goal is to invert the colors of an image using a kernel and then retrieve ...

SzPeter-9923

11

asked Mar 31 at 11:30
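For reference, the per-pixel operation behind colour inversion can be sketched on the host in plain C. This is not the asker's kernel (which is not shown in the excerpt); it assumes 8-bit channel values stored contiguously, and `invert_pixels()` is a hypothetical name. In an OpenCL kernel, the loop body would typically become the work done by one work-item.

```c
#include <stddef.h>
#include <stdint.h>

/* Invert `count` 8-bit channel values from `src` into `dst`.
 * Assumption: every byte is a colour channel; if the image has an alpha
 * channel that should be preserved, it would need to be skipped. */
static void invert_pixels(const uint8_t *src, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = (uint8_t)(255 - src[i]);
}
```

Comparing a GPU result against a host-side reference like this is one way to tell a wrong kernel apart from a wrong buffer transfer.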

0 votes

1 answer

53 views

OpenCL 2.0 full profile, without atomic_store & atomic_load? Is this possible?

I use the OpenCL.NET C# wrapper for OpenCL. My GPU from GPU-Z is AMD Radeon Barcelo, and specific for OpenCL: Platform Version: OpenCL 2.1 AMD-APP (3570.0) Device Name: gfx90c Device Profile: ...

Chameleon

2,239

asked Mar 10 at 19:37

-1 votes

1 answer

72 views

odd-even sort in OpenCL

I have the following task: write an OpenCL program that sorts an array using the odd-even sort method. I wrote it, but I have some trouble and I don't know how to solve it because I'm new to OpenCL. Here is my ...

kylo_gg

1

asked Mar 6 at 20:24
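As a point of comparison, odd-even (transposition) sort can be sketched sequentially in C. This is a host-side reference under stated assumptions, not the asker's code: `odd_even_phase()` and `odd_even_sort()` are hypothetical names, and the element type is assumed to be `int`. Within one phase the compared pairs are disjoint, which is why each pair could be handled by a separate work-item in an OpenCL version, with a synchronization point between phases.

```c
#include <stddef.h>

/* One phase: compare-exchange the disjoint neighbour pairs
 * (parity, parity+1), (parity+2, parity+3), ...
 * Returns the number of swaps performed. */
static size_t odd_even_phase(int *a, size_t n, size_t parity)
{
    size_t swaps = 0;
    for (size_t i = parity; i + 1 < n; i += 2) {
        if (a[i] > a[i + 1]) {
            int tmp = a[i];
            a[i] = a[i + 1];
            a[i + 1] = tmp;
            ++swaps;
        }
    }
    return swaps;
}

/* Full sort: alternating even and odd phases; n phases are enough
 * to sort n elements with this scheme. */
static void odd_even_sort(int *a, size_t n)
{
    for (size_t phase = 0; phase < n; ++phase)
        odd_even_phase(a, n, phase % 2);
}
```

A reference like this makes it easier to check a parallel version phase by phase on small inputs.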

2 votes

1 answer

74 views

OpenCL segfault on clBuildProgram

I'm building my first OpenCL program in C in order to quickly compute the Mandelbrot Set, and I'm getting a segfault on clBuildProgram. Below is the relevant code: The kernel code (though the program ...

Lemma

143

asked Mar 5 at 20:19

0 votes

1 answer

44 views

What are the OpenCL devices "associated with a program"?

The clCompileProgram() function of OpenCL takes, among other parameters, a cl_program handle named program, and a list of device handles: cl_device_id* device_list. The documentation for this function ...

einpoklum

137k

asked Feb 28 at 23:07

0 votes

1 answer

36 views

Why does clLinkProgram take a context handle?

In OpenCL, the clLinkProgram() function takes (among other things) A cl_context context handle; An array of cl_program handles of program objects. Now, a cl_program is always created in a context; ...

einpoklum

137k

asked Feb 24 at 22:24

0 votes

1 answer

27 views

reduction/conjunction/disjunction functions for OpenCL vector types?

OpenCL offers built-in/intrinsic "vector types" (see table 3 at the link), such as int4 or float2. It also defines binary and unary elementwise operators which accept these types, e.g. ...

einpoklum

137k

asked Feb 23 at 17:12

1 vote

0 answers

27 views

Is MPI necessary for invoking OpenCL devices across multiple compute nodes?

What is the typical way of invoking multiple OpenCL devices across multiple compute nodes on a cluster that uses a job scheduler such as SLURM or PBS? Let's say I requested 64 GPUs in total, where each compute node is ...

Redshoe

301

asked Feb 15 at 19:01

1 vote

0 answers

71 views

clBuildProgram vs clCompileProgram - when should I call each of these?

In regular software development parlance, we begin with program sources; we then compile them into binary objects; and finally link the objects into an executable object. And the entire process ...

einpoklum

137k

asked Feb 15 at 17:08

0 votes

0 answers

42 views

What is the guaranteed relation of the per-binary and overall return status of clCreateProgramWithBinary?

The OpenCL API has the following function: cl_program clCreateProgramWithBinary( cl_context context, cl_uint num_devices, const cl_device_id* device_list, const size_t* lengths, ...

einpoklum

137k

asked Feb 15 at 10:47

0 votes

1 answer

39 views

Can OpenCL platforms change over the course of execution?

While executing a process, can the results of subsequent calls to clGetPlatformIDs() change? i.e. can platforms disappear, appear, change order, or change handles ("ids")? I'm asking about ...

einpoklum

137k

asked Feb 10 at 12:37

0 votes

4 answers

1k views

How do I use OpenCL in a docker container

I have successfully used OpenCL on my local Windows PC and I would now like to get my program working in a container. First attempt: FROM ubuntu:latest RUN apt-get update #Done to make install non-...

sav

2,170

asked Feb 6 at 6:58

0 votes

0 answers

46 views

What's the deal with the cl_mem_flags used to create an OpenCL pipe?

The API call for creating a pipe in OpenCL 2.0 and later is: cl_mem clCreatePipe( cl_context context, cl_mem_flags flags, cl_uint pipe_packet_size, cl_uint pipe_max_packets, const ...

einpoklum

137k

asked Jan 22 at 11:53

0 votes

0 answers

11 views

Why does clCompileProgram take "headers" wrapped in cl_program's?

In OpenCL, when you want to compile (not link) a kernel for some target devices, you call: cl_int clCompileProgram( cl_program program, cl_uint num_devices, const cl_device_id* device_list,...

einpoklum

137k

asked Jan 20 at 15:25

0 votes

0 answers

17 views

Why does OpenCL not have a clGetKernelArg function?

In OpenCL, before launching a kernel, we set its arguments using the clSetKernelArg() API function; and the cl_kernel handle must be backed by some kind of storage for the value of those arguments we ...

einpoklum

137k

asked Jan 18 at 21:08

2 votes

1 answer

64 views

What happens to set kernel arguments after launch? Must I reset them?

In CUDA, launching a kernel means specifying its arguments, marshaled via an array of pointers: CUresult cuLaunchKernel ( CUfunction f, /* launch config stuff */, void** kernelParams, ...

einpoklum

137k

asked Jan 18 at 14:39

1 vote

0 answers

22 views

Why do OpenCL enqueue API functions take event lists, when we can enqueue a barrier?

OpenCL command queues' enqueue API functions typically take a sequence of dependency events. For example: cl_int clEnqueueCopyBuffer( cl_command_queue command_queue, cl_mem src_buffer, ...

einpoklum

137k

asked Jan 17 at 16:05

0 votes

0 answers

20 views

In OpenCL, do contexts keep subdevices alive?

In OpenCL (let's say v3.0), I know one can create contexts using sub-devices. But - what happens if you release all references to a sub-device while the context is not released (i.e. has positive ...

einpoklum

137k

asked Jan 16 at 16:02

0 votes

1 answer

150 views

Cannot detect OpenCL 3.0 for NVIDIA GPU

I'm facing a strange issue with CMake's detection of OpenCL. When I use the following CMakeLists.txt: cmake_minimum_required(VERSION 3.10) # Uncomment to make it working # include(CheckSymbolExists) # ...

Denis Kotov

892

asked Jan 10 at 18:10

1 vote

0 answers

14 views

In OpenCL, can we obtain the default queue for a device (in a context)?

The OpenCL API defines such a thing as the "default queue", for a given context and device in that context. Indeed, when we clCreateCommandQueueWithProperties, one of the properties we ...

einpoklum

137k

asked Jan 7 at 15:32

1 vote

1 answer

78 views

In OpenCL, what's the difference between "host" and "device" command-queues?

In OpenCL, when creating a command queue, we can set the options to indicate that this will be a "device" command queue; otherwise, it's a "host" queue. The C++ bindings have ...

einpoklum

137k

asked Dec 27, 2024 at 13:25

0 votes

0 answers

71 views

Can I write OpenCL kernels using some kind of C++, to run on NVIDIA GPUs, in 2024?

OpenCL has had a bumpy ride over the years w.r.t. the prospects of using C++ to write kernels. First there was "OpenCL C++ kernel language", standardized with OpenCL v2.1 - but that did ...

einpoklum

137k

asked Dec 26, 2024 at 11:57

0 votes

0 answers

44 views

How can a GPU directly access memory data created by the CPU while maintaining real-time synchronization?

I hope to use the GPU on a mobile phone to directly and quickly access and synchronize memory data, which the CPU will also read and modify in real-time. On a mobile phone, this is a simple example, ...

user28921138

1

asked Dec 24, 2024 at 20:11

1 vote

0 answers

38 views

OpenCL: Kernel only reading first pixel

Grayscale kernel only reading first pixel. The following is my grayscale.cl kernel implementation. The problem that I am facing is that the kernel seems to perform the grayscale calculation only on the ...

Arief Kurniawan

53

asked Dec 15, 2024 at 21:56

1 vote

1 answer

95 views

Non-blocking copy from Host to Buffer with enqueue_copy

I am writing code with PyOpenCL to offload heavy computations to the GPU. To optimize the algorithm, I would like to run some of the memory transfer operations in parallel with further calculations. However, I ...

MrCheatak

179

asked Dec 11, 2024 at 8:22

0 votes

1 answer

90 views

Effect of distance between CUDA threads in block?

I have a naive question about GPU programming. (ChatGPT and Claude didn't really give me a convincing answer. Maybe I'm prompting badly.) GPU programming languages like CUDA and OpenCL organise ...

Martin Berger

1,128

asked Dec 6, 2024 at 18:30

0 votes

2 answers

274 views

How to enable OpenCL syntax highlighting and syntax checking of OpenCL C files in Visual Studio 2019?

How to enable syntax highlighting and syntax checking of CL kernels written in OpenCL C (in *.cl files) in Visual Studio 2019 IDE? See the example below: The *.cl files use the OpenCL C syntax that ...

George Robinson

2,420

asked Nov 22, 2024 at 3:39

1 vote

1 answer

130 views

Ubuntu OpenCL can't find Intel GPU on double GPU device

I'm trying to code an OpenCL C++ application on an old Ubuntu laptop. It has two GPUs which are shown when I run lspci | grep VGA: 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core ...

Turgut

859

asked Oct 31, 2024 at 10:11
