
I am trying to set up a project from GitHub. It is based on Python 3.8 and PyTorch 1.6. The GPU I am using is an NVIDIA GeForce RTX 3080 Ti with Driver Version 535.54.03 and CUDA Version 12.2 according to nvidia-smi. (Using torch 1.13.1, torch.version.cuda shows 11.7. I could run a newer project this way, so I think the server itself is fine.) find / -iname "libcudart*" shows (some garbage and) /usr/local/cuda-12.2.

When I try to run the project inside a python:3.8 Docker container I get the following warning:

NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.

and the error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
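The warning already contains the explanation. A CUDA binary can run on a GPU only if it ships precompiled machine code (a cubin) for that exact architecture, or PTX intermediate code the driver can JIT-compile forward for a newer one. A small illustrative sketch of that rule (my own simplification, not PyTorch's actual check):

```python
# Illustrative sketch (NOT PyTorch's real code) of why sm_86 fails here:
# a binary works on a device if it has a cubin for that exact compute
# capability, or embedded PTX for an equal-or-lower one to JIT from.
def can_run(device_cc, cubin_archs, ptx_archs=()):
    """device_cc: compute capability as (major, minor), e.g. (8, 6) = sm_86."""
    if device_cc in cubin_archs:
        return True  # exact binary match
    return any(ptx <= device_cc for ptx in ptx_archs)  # driver JITs the PTX

# Architectures the stock torch 1.6 wheel was built for, per the warning:
wheel_cubins = [(3, 7), (5, 0), (6, 0), (7, 0), (7, 5)]
print(can_run((8, 6), wheel_cubins))  # sm_86: no cubin, no PTX -> False
print(can_run((7, 5), wheel_cubins))  # sm_75 (Turing): exact match -> True
```

Since the distributed wheel tops out at sm_75 and ships no PTX, an sm_86 (Ampere) card finds no kernel image at all, which is exactly the RuntimeError above.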

How can I get this running? Or is it simply not possible to run an old torch version on new GPUs?

As I am using a shared server, I would prefer a solution that runs within Docker. Some sort of environment (conda or venv) would also be possible.


I have seen some discussion regarding torch and this GPU, but as far as I can tell they all suggest updating torch to a newer version. I don't want to do this, as the requirements file explicitly pins the old version, and I don't want to break the project.

I also read that one can use older CUDA versions by using a corresponding Docker image as a base.

I tried a Docker container with CUDA 10, as I suspected the CUDA version was too new, but that results in the same error. Dockerfile:

FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime

# setup python3.8
RUN apt-get update && \
    apt-get install -y software-properties-common curl wget && \
    apt-get clean
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && \
    apt-get install -y python3.8 python3.8-distutils python3.8-venv && \
    apt-get clean
RUN curl https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py && \
    python3.8 /tmp/get-pip.py && \
    rm /tmp/get-pip.py
RUN apt-get update && \
    apt-get install -y build-essential libffi-dev python3.8-dev && \
    apt-get clean
RUN python3.8 -m pip install --upgrade pip setuptools wheel
RUN python3.8 -m pip install jsonnet

# setup the project
RUN apt-get update && apt-get install git -y --no-install-recommends && apt-get clean && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/salesforce/QAFactEval
WORKDIR QAFactEval
RUN python3.8 -m pip install -e .
RUN python3.8 -m pip install gdown
RUN ./download_models.sh

COPY qafacteval_minimal.py qafacteval_minimal.py

CMD python3.8 qafacteval_minimal.py

Python file:

from qafacteval import QAFactEval

kwargs = {
    "cuda_device": 0,
    "use_lerc_quip": True,
    "verbose": True,
    "generation_batch_size": 32,
    "answering_batch_size": 32,
    "lerc_batch_size": 8,
}


model_folder = "models"     # path to models downloaded with download_models.sh
metric = QAFactEval(
    lerc_quip_path=f"{model_folder}/quip-512-mocha",
    generation_model_path=f"{model_folder}/generation/model.tar.gz",
    answering_model_dir=f"{model_folder}/answering",
    lerc_model_path=f"{model_folder}/lerc/model.tar.gz",
    lerc_pretrained_model_path=f"{model_folder}/lerc/pretraining.tar.gz",
    **kwargs
)

doc = "A man was walking down the road."
summ = "A man was walking."

metric.score_batch_qafacteval([doc], [[summ]], return_qa_pairs=False)[0][0]['qa-eval']['lerc_quip']

Execution commands:

docker build -t qafacteval:minimal .
docker container run --gpus all --rm qafacteval:minimal

I also checked whether I could install torch 1.6 using commands like:

pip install torch==1.6.0 torchvision==0.7.0 torchaudio==0.6.0 -f https://download.pytorch.org/whl/cu100/torch_stable.html
pip install torch==1.6.0+cu100 torchvision==0.7.0+cu100 torchaudio==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html

But that didn't seem to make a difference. I found no torch 1.6 install instructions for CUDA 12 on the previous PyTorch versions page. The conda instructions also result in torch.version.cuda reporting 10.2, so I guess that will also not be compatible.
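The version timeline makes the dead end concrete. As a quick sanity check (version data hand-collected from NVIDIA and PyTorch release notes, so treat it as an assumption): sm_86 support first appeared in CUDA 11.1, while torch 1.6.0 binaries were only ever published for CUDA 9.2, 10.1 and 10.2:

```python
# Hand-collected version data (assumption, from NVIDIA / PyTorch release
# notes): sm_86 (Ampere, RTX 3080 Ti) requires CUDA 11.1+, but torch 1.6.0
# wheels exist only for these CUDA toolkit versions.
SM86_MIN_CUDA = (11, 1)
TORCH_1_6_CUDA_BUILDS = [(9, 2), (10, 1), (10, 2)]

# CUDA versions that both have a torch 1.6 wheel and support sm_86:
compatible = [v for v in TORCH_1_6_CUDA_BUILDS if v >= SM86_MIN_CUDA]
print(compatible)  # [] -> no prebuilt wheel can work, whatever index URL is used
```

So no choice of `-f https://download.pytorch.org/whl/...` index can produce a working combination; the sets simply don't overlap.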

On an "NVIDIA GeForce GTX 1660" it runs, with torch.version.cuda showing 10.2, so I guess that GPU is compatible with CUDA 10.2. (But that GPU runs out of memory, so I need the other one.) (Driver Version: 535.54.03)

If you need any further information, let me know and I will try to provide it asap.


Edit: the comments hint that the GPU doesn't support older CUDA versions, so simply downgrading doesn't seem to be an option.

  • You could rebuild PyTorch 1.6 against CUDA 11.1 or newer. It might not be easy. – Commented Jul 19, 2023 at 16:22

1 Answer


The error: RuntimeError: CUDA error: no kernel image is available for execution on the device

How can i get this running?

Unfortunately, you can’t. CUDA has a mechanism (JIT compilation of embedded PTX) for allowing older code to be recompiled at runtime to execute on newer hardware, but the PyTorch developers have to enable it and build the code they distribute with the extra output required for it to work. Notionally for space reasons, the PyTorch packages are not built this way.

Or is it not possible to run old torch version on new gpus?

This is the case, unfortunately. If there is no binary payload for your GPU and no intermediate code that can be runtime-compiled to run on your GPU, the package can’t work.

Your only choice would be to build your own custom PyTorch package from source against CUDA 11. That isn’t a simple process.
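To make that last suggestion concrete, here is a rough sketch of what such a source build could look like. Assumptions on my part: the v1.6.0 tag of the pytorch repository, a CUDA 11.1 toolkit installed, and TORCH_CUDA_ARCH_LIST as the variable selecting target architectures. Whether 1.6 actually compiles cleanly against CUDA 11.1 is uncertain, as the comment on the question warns.

```shell
# Sketch only, not a verified recipe: torch 1.6 predates CUDA 11, so the
# build may well need patches. TORCH_CUDA_ARCH_LIST="8.6" asks the build
# to emit sm_86 kernels for the RTX 3080 Ti.
export TORCH_CUDA_ARCH_LIST="8.6"
export USE_CUDA=1
echo "building torch for arch(es): ${TORCH_CUDA_ARCH_LIST}"

# The actual build steps (commented out here; they take hours and need a
# full CUDA 11.1 toolkit plus build tools in the environment):
# git clone --branch v1.6.0 --recursive https://github.com/pytorch/pytorch
# cd pytorch
# pip install -r requirements.txt
# python setup.py bdist_wheel   # resulting wheel lands in dist/
```

The resulting wheel could then be installed in the Docker image instead of the stock one, keeping the rest of the requirements file untouched.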
