
I am currently working with OpenMP offloading using LLVM/clang-16 (built from the GitHub repository). Using clang's built-in profiling tools (environment variables such as LIBOMPTARGET_PROFILE=profile.json and LIBOMPTARGET_INFO), I was able to confirm that my code is executed on my GPU. However, when I try to profile the code with nvprof or ncu (from the NVIDIA Nsight tool suite), I get a warning stating that the profiler did not detect any kernel launches:

> ncu ./saxpy
Time of kernel: 0.000004
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.

This is my test code:

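#include <cstdio>   // printf
#include <cstdlib>  // malloc, calloc, free
#include <omp.h>    // omp_get_wtime

// Computes y = a * x + y on the offload device and prints the elapsed time.
void saxpy(float a, float* x, float* y, int sz) {
    // The measured interval also covers the data transfers of the map clauses.
    double tb = omp_get_wtime();
    // A combined loop construct must be followed directly by the for loop,
    // not by a braced block.
#pragma omp target teams distribute parallel for map(to:x[0:sz]) map(tofrom:y[0:sz])
    for (int i = 0; i < sz; i++) {
        y[i] = a * x[i] + y[i];
    }
    double te = omp_get_wtime();
    printf("Time of kernel: %lf\n", te - tb);
}

int main() {
    auto x = (float*) malloc(1000 * sizeof(float));
    auto y = (float*) calloc(1000, sizeof(float));

    for (int i = 0; i < 1000; i++) {
        x[i] = i;
    }

    saxpy(42, x, y, 1000);

    free(x);
    free(y);
    return 0;
}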

Compiled using the following command:

> clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda main.cpp -o saxpy --cuda-path=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/10.2 --offload-arch=sm_61 -fopenmp-offload-mandatory

What do I need to do to enable profiling? I have seen others use ncu on clang-compiled OpenMP offloading code without additional steps, so I may be missing something.


2 Answers


By inspecting the debug output generated when the program is executed with LIBOMPTARGET_DEBUG=1, and with help from other forums, I was able to fix this issue. Whenever the program is started through ncu (or nsys), it cannot find the files of the OpenMP CUDA runtime library that it needs.
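
The symptom described above can also be checked without reading the full debug log. The following is a minimal sketch, not part of the original fix: it asks the OpenMP runtime how many offload devices it sees and whether a target region really left the host. `OffloadReport`, `probe_offload` and `print_report` are hypothetical names introduced for illustration, and the `_OPENMP` guards are only there so that the snippet also builds without `-fopenmp`. It assumes the probe is built without `-fopenmp-offload-mandatory`, so that a host fallback can be observed instead of the program aborting.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type: what the OpenMP runtime reports at run time.
struct OffloadReport {
    int num_devices;   // non-host devices visible to the runtime
    bool ran_on_host;  // true if the target region executed on the host
};

// Query the runtime and run one tiny target region.
OffloadReport probe_offload() {
    OffloadReport r{0, true};
#ifdef _OPENMP
    r.num_devices = omp_get_num_devices();
    int on_host = 1;
    // omp_is_initial_device() returns nonzero when called on the host.
    #pragma omp target map(tofrom: on_host)
    {
        on_host = omp_is_initial_device();
    }
    r.ran_on_host = (on_host != 0);
#endif
    return r;
}

// Print the result in one line so two runs are easy to compare.
void print_report(const OffloadReport& r) {
    std::printf("devices visible: %d, target region ran on %s\n",
                r.num_devices, r.ran_on_host ? "host" : "device");
}
```

Calling `print_report(probe_offload())` from `main` and running the binary once directly and once through ncu should make the difference visible: if the second run reports zero devices or a host execution, the runtime library was probably not found in that environment.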

A workaround is to add the path to those libraries to the LD_LIBRARY_PATH environment variable (e.g. export LD_LIBRARY_PATH=/opt/llvm/lib:$LD_LIBRARY_PATH).

NVIDIA is now aware of this problem and is "looking into why that is the case".


With nsys profile, I find the following works:

nsys profile --stats=true --trace=openmp,cuda ./application [args]

This generates nsys-rep files that can be opened in the Nsight Systems profiler.

1 Comment

Answer needs more meat to explain why it is relevant.
