-3 votes
0 answers
64 views

I am currently writing a driver for the Intel ARC GPU series (specifically I use the A750 for testing purposes) for my own operating system. I am already able to execute compute kernels that use ...
Joel Marker
1 vote
0 answers
51 views

[Goal & Problem] I am trying to accelerate ONNX model inference on an RK3588 (Orange Pi 5) board using the Mali-G610 GPU. I have built OnnxRuntime (ORT) with the ACL (Compute Library) Execution ...
이호연
3 votes
2 answers
100 views

I tried to solve an old Advent of Code problem in OpenCL, but it's very slow. const char *KernelSource_part_b = "\n" \ "typedef unsigned long uint64_t; ...
Richard Clubb
0 votes
0 answers
61 views

I want to compile and run OpenCL programs to do some parallel computing on my mobile device (S24 FE, Exynos 2400e). I tried to compile clinfo, but it always reports 0 as the number of devices. I tried various ...
Lakshit Karsoliya
0 votes
1 answer
58 views

Consider the following OpenCL code in which each element in a vector-type variable gets its value via array lookup: float* tbl = get_data(); int4 offsets = get_offsets(); float4 my_elements = { ...
einpoklum
  • 137k
1 vote
1 answer
104 views

I am writing an OpenCL kernel that uses atomics. As I only need to synchronize groups of 192 threads, I figured using local atomics would be ideal. However, the change from global to local atomics ...
Edward Murphy
1 vote
0 answers
46 views

I'm using the OpenCL clBuildProgram() API function on a program created from a source string. The source is: kernel void foo(int val, write_only pipe int outPipe) { write_pipe(outPipe, &val); }...
einpoklum
  • 137k
0 votes
0 answers
17 views

Suppose I've allocated a region of memory with clSVMAlloc(). Looking at the clEnqueueSVMMap() function, we are told that it will "allow the host to update a region of a SVM buffer". Does ...
einpoklum
  • 137k
0 votes
0 answers
26 views

OpenCL has the mechanism of "shared virtual memory" (SVM), where the same memory region is available both in OpenCL kernel code and in host-side code - and updates on one side affect the ...
einpoklum
  • 137k
0 votes
0 answers
16 views

Most OpenCL API calls return a status/error value, either directly or via an out-parameter (example: clCreateBuffer()). While that is not as informative as a long-form string description, it can tell ...
einpoklum
  • 137k
0 votes
1 answer
33 views

OpenCL C supports "vector data types" - a fixed number of scalar types which may be operated on together, as though they were a single scalar, mostly: we can apply arithmetic and logic ...
einpoklum
  • 137k
0 votes
1 answer
37 views

I'm looking at the clEnqueueWaitForEvents() OpenCL API function. As I see it, this is a real boon. You see, almost all clEnqueueXXX functions take an array-of-events, and the size of that array, to ...
einpoklum
  • 137k
0 votes
1 answer
46 views

The OpenCL API has one object which is sort of a "kitchen sink" for a lot of stuff: The program (with handle type cl_program). It can hold: A textual program source ( ...
einpoklum
  • 137k
1 vote
1 answer
46 views

In the following program, I compile a kernel for the first device on the first platform: const char* kernel_source_code = R"( __kernel void vectorAdd( __global float * __restrict C, ...
einpoklum
  • 137k
0 votes
1 answer
99 views

I'm trying to run a basic kernel in OpenCL. See the attached snippet: const char kernel_source[] = "__kernel void matmul(__global float* A, __global float* B, __global float* C) { int row = ...
T3chstop
0 votes
0 answers
46 views

There is a segmentation fault (SIGSEGV) while calling clGetPlatformIDs, which is sometimes fatal. The minimal example I found that produces it: #include <stdio.h> #include <stdlib.h> #include "CL/cl.h"...
LentilesGR
1 vote
0 answers
59 views

I implemented an N-body simulation that combines gravity and SPH (smoothed particle hydrodynamics). I want to optimize the SPH part. I use spatial hashing for the neighborhood search. On the host side (...
Paul Aner
  • 543
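The excerpt doesn't show the hashing scheme itself. As a hedged illustration only, one widely used approach (not necessarily the asker's) bins each particle into a uniform grid cell and hashes the integer cell coordinates by multiplying with large primes and XOR-ing; the type and function names below are hypothetical:

```c
#include <stdint.h>

/* Integer coordinates of a uniform-grid cell (hypothetical type for this sketch). */
typedef struct { int32_t x, y, z; } cell3;

/* floor(p / h) without libm: truncate toward zero, then step down for negative non-integers. */
int32_t floor_div(float p, float h) {
    float q = p / h;
    int32_t i = (int32_t)q;
    if ((float)i > q) i -= 1;
    return i;
}

/* Maps a particle position to its grid cell. The cell edge h is usually chosen
   >= the SPH kernel support radius, so all neighbours lie in the 27 surrounding cells. */
cell3 cell_of(float px, float py, float pz, float h) {
    cell3 c = { floor_div(px, h), floor_div(py, h), floor_div(pz, h) };
    return c;
}

/* Prime-multiply-and-XOR hash of a cell into [0, table_size). Unsigned arithmetic
   wraps modulo 2^32, so negative cell coordinates are handled without overflow UB. */
uint32_t cell_hash(cell3 c, uint32_t table_size) {
    uint32_t hx = (uint32_t)c.x * 73856093u;
    uint32_t hy = (uint32_t)c.y * 19349663u;
    uint32_t hz = (uint32_t)c.z * 83492791u;
    return (hx ^ hy ^ hz) % table_size;
}
```

Distinct cells can collide in the same bucket, so a neighbour search built on this still has to check the actual cell (or distance) of each candidate.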
0 votes
1 answer
49 views

OpenCL has several API functions to set callback functions - for events, for buffers/memory objects, for contexts and maybe more. What happens if you invoke one of these functions, more than once, on ...
einpoklum
  • 137k
1 vote
1 answer
77 views

When creating a buffer in OpenCL, one passes a flags bitfield. One of these possible flags is CL_MEM_READ_WRITE, being the lowest bit in the field (value 1 << 0). Its documentation says that "...
einpoklum
  • 137k
0 votes
0 answers
67 views

I'm working on a GPU-accelerated 2D convolution in OpenCL for a 2048x2048 image using a 3x3 Sobel filter. I implemented two versions of the kernel: A naive version that uses only global memory. An ...
Mxneeb
  • 19
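When comparing a naive kernel against an optimized one, a CPU reference helps confirm that both produce the same output. This is a sketch under assumptions the question does not state: a single-channel float image in row-major order, a zeroed one-pixel border, and the |Gx| + |Gy| magnitude approximation; the function name is hypothetical:

```c
/* CPU reference for a 3x3 Sobel filter on a row-major, single-channel float
   image of size w x h. Output is |Gx| + |Gy| (an approximation of the gradient
   magnitude that avoids sqrt); the one-pixel border is set to 0. */
void sobel_ref(const float *in, float *out, int w, int h) {
    static const int kx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
    static const int ky[3][3] = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                out[y * w + x] = 0.0f;   /* border: no full 3x3 neighbourhood */
                continue;
            }
            float gx = 0.0f, gy = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    float v = in[(y + dy) * w + (x + dx)];
                    gx += (float)kx[dy + 1][dx + 1] * v;
                    gy += (float)ky[dy + 1][dx + 1] * v;
                }
            }
            out[y * w + x] = (gx < 0.0f ? -gx : gx) + (gy < 0.0f ? -gy : gy);
        }
    }
}
```

If the kernels use a different border policy (clamping, for example), the reference has to match it before outputs can be compared pixel by pixel.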
1 vote
0 answers
50 views

I'm looking at the OpenCL specification for clGetKernelWorkGroupInfo(), and am noticing that the returned types for CL_KERNEL_GLOBAL_WORK_SIZE and CL_KERNEL_COMPILE_WORK_GROUP_SIZE are both size_t[3]. ...
einpoklum
  • 137k
0 votes
1 answer
44 views

I have an AMD GPU and I'm using pyopencl. I have a context and a queue. Then I created an array: import pyopencl import pyopencl.array ctx = pyopencl.create_some_context(interactive=False) queue = pyopencl....
haael
  • 1,069
0 votes
2 answers
127 views

I'm working on a project where I need to perform image inversion on the GPU with OpenCL, but it's not working as expected. The goal is to invert the colors of an image using a kernel and then retrieve ...
SzPeter-9923
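For checking the GPU result, a minimal CPU reference of color inversion might look like the sketch below. It assumes 8-bit interleaved RGBA pixels (the excerpt does not state the image format) and leaves alpha unchanged; the function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Inverts the RGB channels of an interleaved RGBA8 image in place:
   each color channel c becomes 255 - c, alpha (p[3]) is left as-is. */
void invert_rgba8(uint8_t *px, size_t n_pixels) {
    for (size_t i = 0; i < n_pixels; ++i) {
        uint8_t *p = px + 4 * i;
        p[0] = (uint8_t)(255 - p[0]);
        p[1] = (uint8_t)(255 - p[1]);
        p[2] = (uint8_t)(255 - p[2]);
    }
}
```

A kernel version would typically do the same work for the single pixel selected by its global ID.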
0 votes
1 answer
53 views

I use the OpenCL.NET C# wrapper for OpenCL. According to GPU-Z, my GPU is an AMD Radeon Barcelo, with these OpenCL-specific details: Platform Version: OpenCL 2.1 AMD-APP (3570.0) Device Name: gfx90c Device Profile: ...
Chameleon
  • 2,239
-1 votes
1 answer
72 views

I have the following task: write an OpenCL program that sorts an array using the odd-even sort method. I wrote it, but I have some problems and don't know how to solve them, because I'm new to OpenCL. Here is my ...
kylo_gg
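For reference, odd-even transposition sort alternates two kinds of phase: even phases compare-exchange the pairs (0,1), (2,3), ..., odd phases the pairs (1,2), (3,4), ..., and n phases are enough to sort n elements. A sequential C sketch (function name hypothetical, not the asker's code):

```c
#include <stddef.h>

/* Odd-even transposition sort, ascending. Phase p compare-exchanges the pairs
   starting at index p % 2. The pairs within one phase are disjoint, so on a GPU
   each compare-exchange could be one work-item. */
void odd_even_sort(int *a, size_t n) {
    for (size_t phase = 0; phase < n; ++phase) {
        for (size_t i = phase % 2; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {
                int t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
            }
        }
    }
}
```

Because barrier() only synchronizes work-items within one work-group, a version whose array spans several work-groups typically needs a separate kernel launch per phase (or must fit the whole array in one work-group).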
2 votes
1 answer
74 views

I'm building my first OpenCL program in C in order to quickly compute the Mandelbrot Set, and I'm getting a segfault on clBuildProgram. Below is the relevant code: The kernel code (though the program ...
Lemma
  • 143
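The segfault itself can't be diagnosed from the excerpt, but once the program builds, a host-side reference of the per-pixel escape-time loop is handy for validating the kernel's output. This is a sketch; the function name is hypothetical, and the escape test |z|^2 > 4 is the usual convention rather than something taken from the question:

```c
/* Escape-time iteration count for c = cr + ci*i: iterate z = z^2 + c from
   z = 0 until |z|^2 > 4 or max_iter iterations have been done. Points that
   never escape return max_iter. */
int mandel_iter(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;   /* real part of z^2 + c */
        zi = 2.0 * zr * zi + ci;             /* imaginary part of z^2 + c */
        zr = t;
        ++n;
    }
    return n;
}
```

Using double inside an OpenCL C kernel requires fp64 support on the device; a float kernel may therefore differ slightly from this reference near the set's boundary.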
0 votes
1 answer
44 views

The clCompileProgram() function of OpenCL takes, among other parameters, a cl_program handle named program, and a list of device handles: const cl_device_id* device_list. The documentation for this function ...
einpoklum
  • 137k
0 votes
1 answer
36 views

In OpenCL, the clLinkProgram() function takes (among other things) a cl_context context handle and an array of cl_program handles of program objects. Now, a cl_program is always created in a context; ...
einpoklum
  • 137k
0 votes
1 answer
27 views

OpenCL offers built-in/intrinsic "vector types" (see table 3 at the link), such as int4 or float2. It also defines binary and unary elementwise operators which accept these types, e.g. ...
einpoklum
  • 137k
1 vote
0 answers
27 views

What is the typical way of using multiple OpenCL devices across multiple compute nodes managed by job schedulers such as SLURM or PBS? Let's say I requested 64 GPUs in total, where each compute node is ...
Redshoe
  • 301
1 vote
0 answers
71 views

In regular software development parlance, we begin with program sources; we then compile them into binary objects; and finally we link the objects into an executable. And the entire process ...
einpoklum
  • 137k
0 votes
0 answers
42 views

The OpenCL API has the following function: cl_program clCreateProgramWithBinary( cl_context context, cl_uint num_devices, const cl_device_id* device_list, const size_t* lengths, ...
einpoklum
  • 137k
0 votes
1 answer
39 views

While executing a process, can the results of subsequent calls to clGetPlatformIDs() change? I.e., can platforms disappear, appear, change order, or change handles ("ids")? I'm asking about ...
einpoklum
  • 137k
0 votes
4 answers
1k views

I have successfully used OpenCL on my local Windows PC and I would now like to get my program working in a container. First attempt: FROM ubuntu:latest RUN apt-get update #Done to make install non-...
sav
  • 2,170
0 votes
0 answers
46 views

The API call for creating a pipe in OpenCL 2.0 and later is: cl_mem clCreatePipe( cl_context context, cl_mem_flags flags, cl_uint pipe_packet_size, cl_uint pipe_max_packets, const ...
einpoklum
  • 137k
0 votes
0 answers
11 views

In OpenCL, when you want to compile (not link) a kernel for some target devices, you call: cl_int clCompileProgram( cl_program program, cl_uint num_devices, const cl_device_id* device_list,...
einpoklum
  • 137k
0 votes
0 answers
17 views

In OpenCL, before launching a kernel, we set its arguments using the clSetKernelArg() API function; and the cl_kernel handle must be backed by some kind of storage for the value of those arguments we ...
einpoklum
  • 137k
2 votes
1 answer
64 views

In CUDA, launching a kernel means specifying its arguments, marshaled via an array of pointers: CUresult cuLaunchKernel ( CUfunction f, /* launch config stuff */, void** kernelParams, ...
einpoklum
  • 137k
1 vote
0 answers
22 views

OpenCL command queues' enqueue API functions typically take a sequence of dependency events. For example: cl_int clEnqueueCopyBuffer( cl_command_queue command_queue, cl_mem src_buffer, ...
einpoklum
  • 137k
0 votes
0 answers
20 views

In OpenCL (let's say v3.0), I know one can create contexts using sub-devices. But - what happens if you release all references to a sub-device while the context is not released (i.e. has positive ...
einpoklum
  • 137k
0 votes
1 answer
150 views

I've run into a strange issue with CMake's detection of OpenCL. When I use the following CMakeLists.txt: cmake_minimum_required(VERSION 3.10) # Uncomment to make it working # include(CheckSymbolExists) # ...
Denis Kotov
1 vote
0 answers
14 views

The OpenCL API defines such a thing as the "default queue", for a given context and device in that context. Indeed, when we call clCreateCommandQueueWithProperties(), one of the properties we ...
einpoklum
  • 137k
1 vote
1 answer
78 views

In OpenCL, when creating a command queue, we can set the options to indicate that this will be a "device" command queue; otherwise, it's a "host" queue. The C++ bindings have ...
einpoklum
  • 137k
0 votes
0 answers
71 views

OpenCL has had a bumpy ride over the years w.r.t. the prospects of using C++ to write kernels. First there was the "OpenCL C++ kernel language", standardized with OpenCL v2.1 - but that did ...
einpoklum
  • 137k
0 votes
0 answers
44 views

I want to use the GPU on a mobile phone to access and synchronize memory data directly and quickly, while the CPU also reads and modifies that data in real time. On a mobile phone, this is a simple example, ...
user28921138
1 vote
0 answers
38 views

Grayscale kernel only reading the first pixel. The following is my grayscale.cl kernel implementation. The problem I am facing is that the kernel seems to perform the grayscale calculation only on the ...
Arief Kurniawan
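A CPU reference of the intended conversion can make it easier to see which pixels the kernel actually touches. This sketch assumes interleaved 8-bit RGB input and the BT.601 luma weights (0.299, 0.587, 0.114), neither of which the excerpt specifies; the function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Converts interleaved RGB8 to 8-bit grayscale with BT.601 luma weights,
   using integer arithmetic with rounding (weights scaled by 1000).
   Output pixel i reads input bytes 3*i .. 3*i+2; in a kernel, i would
   typically come from get_global_id(0), so every work-item gets its own pixel. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t n_pixels) {
    for (size_t i = 0; i < n_pixels; ++i) {
        const uint8_t *p = rgb + 3 * i;
        unsigned y = (299u * p[0] + 587u * p[1] + 114u * p[2] + 500u) / 1000u;
        gray[i] = (uint8_t)y;   /* max is 255500 / 1000 = 255, so no overflow */
    }
}
```

If the kernel's output matches this only at index 0, the per-work-item index (or the stride of 3 bytes per pixel) is a natural first thing to check.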
1 vote
1 answer
95 views

I am writing code with PyOpenCL to offload heavy computations to the GPU. To optimize the algorithm, I would like to overlap some of the memory transfer operations with further calculations. However, I ...
MrCheatak
  • 179
0 votes
1 answer
90 views

I have a naive question about GPU programming. (ChatGPT and Claude didn't really give me a convincing answer. Maybe I'm prompting badly.) GPU programming languages like CUDA and OpenCL organise ...
Martin Berger
0 votes
2 answers
274 views

How to enable syntax highlighting and syntax checking of CL kernels written in OpenCL C (in *.cl files) in Visual Studio 2019 IDE? See the example below: The *.cl files use the OpenCL C syntax that ...
George Robinson
1 vote
1 answer
130 views

I'm trying to code an OpenCL C++ application on an old Ubuntu laptop. It has two GPUs, which are shown when I run lspci | grep VGA: 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core ...
Turgut
  • 859
