I am running an LLM and want to use quantization to speed up inference. My GPU is an NVIDIA Jetson AGX Orin, which is an ARM-based (aarch64) architecture. I use this code:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "tiiuae/Falcon3-10B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True  # Load in 8-bit precision (int8)
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"  # quantized models must be placed via device_map, not .to('cuda')
)
model.eval()
but I got this error:
RuntimeError:
🚨 Forgot to compile the bitsandbytes library? 🚨
1. You're not using the package but checked-out the source code
2. You MUST compile from source
Attempted to use bitsandbytes native library functionality but it's not available.
This typically happens when:
1. bitsandbytes doesn't ship with a pre-compiled binary for your CUDA version
2. The library wasn't compiled properly during installation from source
To make bitsandbytes work, the compiled library version MUST exactly match the linked CUDA version.
If your CUDA version doesn't have a pre-compiled binary, you MUST compile from source.
You have two options:
1. COMPILE FROM SOURCE (required if no binary exists):
https://huggingface.co/docs/bitsandbytes/main/en/installation#cuda-compile
2. Use BNB_CUDA_VERSION to specify a DIFFERENT CUDA version from the detected one, which is installed on your machine and matching an available pre-compiled version listed above
Original error: Configured CUDA binary not found at /mnt/storage/hjaiji/conda/envs/hjaiji/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so
🔍 Run this command for detailed diagnostics:
python -m bitsandbytes
If you've tried everything and still have issues:
1. Include ALL version info (operating system, bitsandbytes, pytorch, cuda, python)
2. Describe what you've tried in detail
3. Open an issue with this information:
https://github.com/bitsandbytes-foundation/bitsandbytes/issues
Native code method attempted to call: lib.cint8_vector_quant()
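As I understand it, the message offers two routes. A rough sketch of both is below; the build commands follow the bitsandbytes installation guide, and the BNB_CUDA_VERSION value is only an example I have not verified on this machine:

```shell
# Option 1: compile bitsandbytes from source
# (required if no pre-compiled binary exists for this platform)
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes
cmake -DCOMPUTE_BACKEND=cuda -S .
make
pip install -e .

# Option 2: point bitsandbytes at a different pre-compiled CUDA binary
# that is installed on the machine (example value; it must match one of
# the binaries the package actually ships)
export BNB_CUDA_VERSION=124
```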
The error probably comes from the GPU architecture. How can I install the bitsandbytes library for this specific GPU?
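For completeness, this is how I confirmed the architecture from Python (the comment about wheel availability is my assumption based on the error above, not something I have verified against PyPI):

```python
import platform

# On the Jetson AGX Orin this prints 'aarch64' (on a typical desktop it
# would be 'x86_64'). My assumption: the bitsandbytes wheel installed here
# does not include a pre-compiled CUDA binary for this platform, which is
# why it asks me to compile from source.
print(platform.machine())
```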