I'm trying to run the Qwen2.5-Coder-3B model locally with 8-bit quantization using BitsAndBytes.
While loading the model, I noticed that some examples also specify torch_dtype=torch.float16.
From my understanding, torch_dtype mainly affects the dtype of activations and of the non-quantized modules, not the quantized weights themselves.
However, I'm not completely sure whether setting torch_dtype=torch.float16 actually overrides the quantization, or whether the two can safely coexist.
With torch_dtype=torch.float16:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-3B")
model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-Coder-3B",
quantization_config=bnb_config,
torch_dtype=torch.float16, # <-- does this override quantization?
device_map="auto"
)
Without specifying torch_dtype:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-Coder-3B",
quantization_config=bnb_config,
device_map="auto"
)
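One way I thought of to check this empirically (a minimal sketch, assuming torch is installed and one of the models above has been loaded as model) is to count the dtypes of the model's parameters. My expectation is that with load_in_8bit=True the quantized Linear weights show up as torch.int8 either way, while the remaining parameters (e.g. norm layers) follow torch_dtype:

```python
from collections import Counter

import torch


def dtype_histogram(model: torch.nn.Module) -> Counter:
    # Count how many parameter tensors are stored in each dtype.
    return Counter(str(p.dtype) for p in model.parameters())


# With one of the models loaded above:
# print(dtype_histogram(model))
# If quantization is in effect, torch.int8 should appear for the
# Linear weights regardless of the torch_dtype setting.
```

Is comparing the two histograms (with and without torch_dtype=torch.float16) a valid way to confirm whether the settings interfere?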
What is the difference between these two setups in terms of:
- How model weights are stored and loaded (INT8 vs FP16)
- The dtype used for activations and outputs during inference
- Whether setting torch_dtype=torch.float16 can override or interfere with the 8-bit quantization applied by BitsAndBytes