I have a node pool of n1-highmem-4 machines, each with one NVIDIA Tesla T4 attached, running the COS_CONTAINERD image. A pod on this pool runs a transformer model in Python and is supposed to execute it on the GPU. I get a segmentation fault whenever I try to move the model to the GPU.

Pod Image:

FROM nvidia/cuda:12.2.0-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=off PIP_DISABLE_PIP_VERSION_CHECK=1

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3-pip python3-dev build-essential \
    && rm -rf /var/lib/apt/lists/*

RUN ln -sf /usr/bin/python3 /usr/local/bin/python

RUN pip install --upgrade pip \
    && pip install --no-cache-dir \
    --extra-index-url https://download.pytorch.org/whl/cu121 \
    torch==2.1.2

WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt

My requirements file lists the basic Python modules I need, including transformers==4.37.1. It does not include torch, and it has no NVIDIA/CUDA-specific modules (I'm assuming the base image covers any drivers required). In the pod I can see the following:

:/app# nvidia-smi
Wed May  7 16:13:16 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |


:/app# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

The 12.2 reported by nvidia-smi and the 12.1 reported by nvcc seem compatible to me. However, checking whether torch can see CUDA crashes with a segmentation fault:

>>> import torch
>>> print(torch.__version__, torch.version.cuda)
2.1.2+cu121 12.1
>>> torch.cuda.is_available()
Segmentation fault (core dumped)
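
To narrow down where the crash happens, the check above could be run in a separate interpreter with Python's built-in faulthandler enabled, so that a traceback is written to stderr before the process dies. This is only a minimal sketch, not something from my actual setup: `run_probe` and `CUDA_PROBE` are hypothetical names introduced for illustration, and it assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys

# Hypothetical probe: the same call that crashes in the interactive session.
CUDA_PROBE = "import torch; print(torch.cuda.is_available())"


def run_probe(snippet: str, timeout: float = 120.0) -> dict:
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    A crash in the child cannot take down the calling process, and
    faulthandler dumps the Python traceback to stderr on a fatal signal.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code -N means "terminated by signal N".
    sig_name = None
    if proc.returncode < 0:
        try:
            sig_name = signal.Signals(-proc.returncode).name
        except ValueError:
            sig_name = f"signal {-proc.returncode}"
    return {
        "returncode": proc.returncode,
        "crashed": proc.returncode < 0,
        "signal": sig_name,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Calling `run_probe(CUDA_PROBE)` inside the pod and printing the `signal` and `stderr` fields should show which Python frame was active when the fault occurred. The faulting native library itself would still need a tool such as gdb to identify.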

I've tried switching base images and torch versions, but nothing seems to work. Thanks in advance to anyone who can help.
