I wanted to use the 'Salesforce/SFR-Embedding-Mistral' embedding model, but it is too large for the GPU partition I have access to. Therefore, I considered quantizing the model, but I couldn't find a pre-quantized version available.

When I attempted to quantize it using bitsandbytes, it tried to load the entire model onto the GPU, which resulted in the same out-of-memory error.

import torch
from transformers import AutoModel, BitsAndBytesConfig

model = AutoModel.from_pretrained(
    'Salesforce/SFR-Embedding-Mistral',
    trust_remote_code=True,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
)

Then I tried loading the model onto the CPU first, quantizing it there, and only afterwards moving the quantized model to the GPU:

model.to('cpu')       # keep everything on the CPU while quantizing
if torch.cuda.is_available():
    model.to('cuda')  # then move the quantized model to the GPU

However, bitsandbytes does not support changing devices for quantized models:

ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and cast to the correct `dtype`.

The solutions I found, such as this GitHub issue and this blog post, were either unhelpful or outdated.

1 Answer

In this case, you only need to load the model, without moving it to CUDA yourself. As the error message explains, when device_map='auto' is set together with a 4-bit or 8-bit BitsAndBytesConfig, the model is automatically placed on the available devices while it loads. It therefore cannot be moved a second time, which is why that error is raised.

So just use this code and it will work:

import torch
from transformers import AutoModel, BitsAndBytesConfig

model = AutoModel.from_pretrained(
    'Salesforce/SFR-Embedding-Mistral',
    trust_remote_code=True,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
)
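
Once that finishes, just use the model as-is; there is no need to call .to() at any point. Below is a minimal sketch of how you might compute an embedding with the quantized model. It assumes the loading code above has already run, that the hub tokenizer supplies a pad token, and that inputs are right-padded; the last-token pooling mirrors what the model card describes, and the example text is just a placeholder:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer

# Assumes `model` was loaded with the quantized config above.
tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-Mistral')

texts = ['How do I quantize an embedding model?']  # placeholder input
batch = tokenizer(texts, padding=True, truncation=True,
                  return_tensors='pt').to(model.device)

with torch.no_grad():
    outputs = model(**batch)

# Last-token pooling: take the hidden state of the final non-padding token
# (assumes right padding; see the model card for a padding-agnostic version).
hidden = outputs.last_hidden_state                 # (batch, seq_len, dim)
last_idx = batch['attention_mask'].sum(dim=1) - 1  # index of last real token
embeddings = hidden[torch.arange(hidden.size(0)), last_idx]
embeddings = F.normalize(embeddings, p=2, dim=1)   # unit-length vectors
print(embeddings.shape)

You can also print model.hf_device_map after loading to see where accelerate placed each module, and model.get_memory_footprint() to confirm that the 4-bit weights fit within your GPU's memory.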