
I am running an LLM and want to use quantization to speed up inference. I am using an NVIDIA Jetson AGX Orin GPU, which has an ARM-based architecture. I use this code:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "tiiuae/Falcon3-10B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True  # Load weights in 8-bit precision (int8)
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"  # 8-bit models must be placed via device_map, not .to('cuda')
)
model.eval()

but I got this error

RuntimeError: 
🚨 Forgot to compile the bitsandbytes library? 🚨
1. You're not using the package but checked-out the source code
2. You MUST compile from source

Attempted to use bitsandbytes native library functionality but it's not available.

This typically happens when:
1. bitsandbytes doesn't ship with a pre-compiled binary for your CUDA version
2. The library wasn't compiled properly during installation from source

To make bitsandbytes work, the compiled library version MUST exactly match the linked CUDA version.
If your CUDA version doesn't have a pre-compiled binary, you MUST compile from source.

You have two options:
1. COMPILE FROM SOURCE (required if no binary exists):
   https://huggingface.co/docs/bitsandbytes/main/en/installation#cuda-compile
2. Use BNB_CUDA_VERSION to specify a DIFFERENT CUDA version from the detected one, which is installed on your machine and matching an available pre-compiled version listed above

Original error: Configured CUDA binary not found at /mnt/storage/hjaiji/conda/envs/hjaiji/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so

🔍 Run this command for detailed diagnostics:
python -m bitsandbytes

If you've tried everything and still have issues:
1. Include ALL version info (operating system, bitsandbytes, pytorch, cuda, python)
2. Describe what you've tried in detail
3. Open an issue with this information:
   https://github.com/bitsandbytes-foundation/bitsandbytes/issues

Native code method attempted to call: lib.cint8_vector_quant()

The error probably comes from the GPU architecture. How can I install the bitsandbytes library for this specific GPU?
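For reference, the compile-from-source route that the error message links to looks roughly like this on the Jetson (a sketch, not a verified recipe: it assumes the CUDA toolkit, cmake, and a C++ compiler are already installed under JetPack, and the COMPUTE_CAPABILITY value 87 is an assumption based on the Orin's SM 8.7 that should be checked for your board):

# Sketch of compiling bitsandbytes from source for an ARM + CUDA target.
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes
# COMPUTE_CAPABILITY=87 targets the Orin's GPU (assumption; verify for your device)
cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=87 -S .
make -j"$(nproc)"
pip install -e .
# Diagnostics: confirm the native library now loads
python -m bitsandbytes

After this, the `libbitsandbytes_cuda*.so` binary should exist in the installed package directory and the quantized model load should no longer raise the RuntimeError.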
