
I am running an LLM and want to use quantization to speed up inference. I am using an NVIDIA Jetson AGX Orin GPU, which has an ARM-based architecture. I use this code:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "tiiuae/Falcon3-10B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True  # Load weights in 8-bit precision (int8)
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"  # 8-bit models must be placed via device_map, not .to('cuda')
)
model.eval()

but I got this error

RuntimeError: 
🚨 Forgot to compile the bitsandbytes library? 🚨
1. You're not using the package but checked-out the source code
2. You MUST compile from source

Attempted to use bitsandbytes native library functionality but it's not available.

This typically happens when:
1. bitsandbytes doesn't ship with a pre-compiled binary for your CUDA version
2. The library wasn't compiled properly during installation from source

To make bitsandbytes work, the compiled library version MUST exactly match the linked CUDA version.
If your CUDA version doesn't have a pre-compiled binary, you MUST compile from source.

You have two options:
1. COMPILE FROM SOURCE (required if no binary exists):
   https://huggingface.co/docs/bitsandbytes/main/en/installation#cuda-compile
2. Use BNB_CUDA_VERSION to specify a DIFFERENT CUDA version from the detected one, which is installed on your machine and matching an available pre-compiled version listed above

Original error: Configured CUDA binary not found at /mnt/storage/hjaiji/conda/envs/hjaiji/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so

🔍 Run this command for detailed diagnostics:
python -m bitsandbytes

If you've tried everything and still have issues:
1. Include ALL version info (operating system, bitsandbytes, pytorch, cuda, python)
2. Describe what you've tried in detail
3. Open an issue with this information:
   https://github.com/bitsandbytes-foundation/bitsandbytes/issues

Native code method attempted to call: lib.cint8_vector_quant()

The error probably comes from the GPU architecture. How can I install the bitsandbytes library for this specific GPU?
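For reference, the compile-from-source route that the error message links to looks roughly like this on the Jetson (a sketch, not a verified recipe: it assumes the CUDA toolkit, cmake, and a C++ compiler are already installed under JetPack, and the COMPUTE_CAPABILITY value 87 is an assumption based on the Orin's SM 8.7 that should be checked for your board):

# Sketch of compiling bitsandbytes from source for an ARM + CUDA target.
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes
# COMPUTE_CAPABILITY=87 targets the Orin's GPU (assumption; verify for your device)
cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=87 -S .
make -j"$(nproc)"
pip install -e .
# Diagnostics: confirm the native library now loads
python -m bitsandbytes

After this, the `libbitsandbytes_cuda*.so` binary should exist in the installed package directory and the quantized model load should no longer raise the RuntimeError.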
