I am trying to set up a project from GitHub. It is based on Python 3.8 and PyTorch 1.6. The GPU I am using is an NVIDIA GeForce RTX 3080 Ti with driver version 535.54.03 and CUDA version 12.2 according to nvidia-smi. (Using torch 1.13.1, torch.version.cuda shows 11.7. I could run a newer project this way, so I think the server itself is fine.) find / -iname "libcudart*" shows (some garbage and) /usr/local/cuda-12.2.
When I try to run the project within a python:3.8 Docker container, I get the following warning:
NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
and the error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
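If I understand the warning correctly, the error boils down to the device's compute capability (sm_86) not being among the architectures the torch wheel was compiled for. A toy sketch of that check in plain Python (the function and names are mine, not torch's API):

```python
# Toy sketch (pure Python, names are mine) of the compatibility check behind
# the "no kernel image" error: the device's compute capability must appear
# in the list of architectures the wheel was compiled for.

def has_kernel_image(device_capability, compiled_archs):
    """device_capability: (major, minor) tuple, e.g. (8, 6) for sm_86."""
    major, minor = device_capability
    return f"sm_{major}{minor}" in compiled_archs

# Architectures listed in the warning for the torch 1.6 wheel:
torch16_archs = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]

print(has_kernel_image((8, 6), torch16_archs))  # RTX 3080 Ti -> False
print(has_kernel_image((7, 5), torch16_archs))  # GTX 1660    -> True
```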
How can I get this running? Or is it not possible to run old torch versions on new GPUs?
As I am using a shared server, I would prefer a solution that runs within Docker. Some sort of environment (conda or venv) would also be possible.
I have seen some discussion regarding torch and this GPU, but as far as I saw, the suggestion is to update torch to a newer version. I don't want to do this, as the requirements file explicitly requests the old version, and I don't want to break the project.
I also read that one can use older CUDA versions by using a corresponding Docker image as a base.
I tried a Docker container with CUDA 10, as I suspected that the CUDA version was too new, but that results in the same error. Dockerfile:
FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
# setup python3.8
RUN apt-get update && \
    apt-get install -y software-properties-common curl wget && \
    apt-get clean
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && \
    apt-get install -y python3.8 python3.8-distutils python3.8-venv && \
    apt-get clean
RUN curl https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py && \
    python3.8 /tmp/get-pip.py && \
    rm /tmp/get-pip.py
RUN apt-get update && \
    apt-get install -y build-essential libffi-dev python3.8-dev && \
    apt-get clean
RUN python3.8 -m pip install --upgrade pip setuptools wheel
RUN python3.8 -m pip install jsonnet
# setup the project
RUN apt-get update && apt-get install git -y --no-install-recommends && apt-get clean && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/salesforce/QAFactEval
WORKDIR QAFactEval
RUN python3.8 -m pip install -e .
RUN python3.8 -m pip install gdown
RUN ./download_models.sh
COPY qafacteval_minimal.py qafacteval_minimal.py
CMD python3.8 qafacteval_minimal.py
Python file:
from qafacteval import QAFactEval
kwargs = {"cuda_device": 0, "use_lerc_quip": True,
          "verbose": True, "generation_batch_size": 32,
          "answering_batch_size": 32, "lerc_batch_size": 8}
model_folder = "models" # path to models downloaded with download_models.sh
metric = QAFactEval(
lerc_quip_path=f"{model_folder}/quip-512-mocha",
generation_model_path=f"{model_folder}/generation/model.tar.gz",
answering_model_dir=f"{model_folder}/answering",
lerc_model_path=f"{model_folder}/lerc/model.tar.gz",
lerc_pretrained_model_path=f"{model_folder}/lerc/pretraining.tar.gz",
**kwargs
)
doc = "A man was walking down the road."
summ = "A man was walking."
score = metric.score_batch_qafacteval([doc], [[summ]], return_qa_pairs=False)[0][0]['qa-eval']['lerc_quip']
print(score)
Execution commands:
docker build -t qafacteval:minimal .
docker container run --gpus all --rm qafacteval:minimal
I also checked whether I could install torch 1.6 using commands like:
pip install torch==1.6.0 torchvision==0.7.0 torchaudio==0.6.0 -f https://download.pytorch.org/whl/cu100/torch_stable.html
pip install torch==1.6.0+cu100 torchvision==0.7.0+cu100 torchaudio==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
But that didn't seem to make a difference.
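To rule out a silently different build, this is roughly how one could check which torch build actually ended up installed (guarded so it also runs where torch is missing; the attributes are standard torch ones as far as I know):

```python
# Quick check of which torch build is actually installed
# (run inside the container; guarded so it also runs without torch).
try:
    import torch
    build_info = {
        "torch": torch.__version__,        # e.g. "1.6.0+cu101"
        "cuda": torch.version.cuda,        # CUDA the wheel was built against
        "gpu_available": torch.cuda.is_available(),
    }
except ImportError:
    build_info = None  # torch not installed in this environment

print(build_info)
```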
I found no torch 1.6 install instructions for CUDA 12 on the previous PyTorch versions page.
The conda instructions also result in torch.version.cuda showing 10.2, so I guess this will also not be compatible.
On an "NVIDIA GeForce GTX 1660" it runs, with torch.version.cuda showing 10.2, so I guess that GPU is compatible with CUDA 10.2. (But that GPU runs out of memory, so I need the other one.) (Driver version: 535.54.03)
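My current understanding of why the two GPUs behave differently: each compute capability has a first CUDA toolkit version that supports it. A rough lookup (my own summary of NVIDIA's release notes; treat the exact versions as an assumption):

```python
# Rough lookup of the first CUDA toolkit supporting each architecture
# (my own summary of NVIDIA's release notes -- treat as an assumption).
FIRST_CUDA_FOR_ARCH = {
    "sm_75": (10, 0),  # Turing, e.g. GTX 1660
    "sm_80": (11, 0),  # Ampere A100
    "sm_86": (11, 1),  # Ampere GeForce, e.g. RTX 3080 Ti
}

def runs_on_toolkit(arch, toolkit):
    """toolkit: (major, minor) tuple, e.g. (10, 2) for CUDA 10.2."""
    return toolkit >= FIRST_CUDA_FOR_ARCH[arch]

print(runs_on_toolkit("sm_75", (10, 2)))  # GTX 1660 on cu102    -> True
print(runs_on_toolkit("sm_86", (10, 2)))  # RTX 3080 Ti on cu102 -> False
```

If those numbers are right, no CUDA 10.x build (and hence no official torch 1.6 wheel) can include sm_86 kernels, which would match the error above.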
If you need any further information, let me know and I will try to provide it asap.
Edit: The hints so far suggest that the GPU doesn't support older CUDA versions, so simply downgrading doesn't seem to be an option.