I'm trying to load gpt-oss-20b locally with Hugging Face transformers on CPU only. Minimal code:
from transformers import pipeline
model_path = "/mnt/d/Projects/models/gpt-oss-20b"
pipe = pipeline("text-generation", model=model_path, torch_dtype="auto", device_map="auto")
pipe("Hello", max_new_tokens=20)
I get:
KeyError: 'model.layers.5.mlp.experts.gate_up_proj'
More details from the console output and traceback:
Using MXFP4 quantized models requires a GPU, we will default to dequantizing the model to bf16
Loading checkpoint shards: 100%
Some parameters are on the meta device because they were offloaded to the cpu and disk.
Device set to use cpu
Traceback (most recent call last):
File "/home/dev/projects/wolf-in-ai-clothing/convo_test.py", line 19, in invoke
response = model(user_message, max_new_tokens=20, num_return_sequences=1)
File ".../transformers/pipelines/text_generation.py", line 419, in _forward
output = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
File ".../transformers/models/gpt_oss/modeling_gpt_oss.py", line 375, in forward
hidden_states, _ = self.mlp(hidden_states) # diff with llama: router scores
File ".../transformers/models/gpt_oss/modeling_gpt_oss.py", line 159, in forward
routed_out = self.experts(hidden_states, router_indices=router_indices, routing_weights=router_scores)
File ".../accelerate/utils/offload.py", line 118, in __getitem__
return self.dataset[f"{self.prefix}{key}"]
File ".../accelerate/utils/offload.py", line 165, in __getitem__
weight_info = self.index[key]
KeyError: 'model.layers.5.mlp.experts.gate_up_proj'
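My reading of the last two frames: accelerate resolves each offloaded weight through an index dict keyed by parameter name, and the fused MoE tensor name is simply not in that index. A simplified sketch of that failure mode (the index contents below are my guess for illustration, not taken from my machine):

```python
# Simplified model of the offload lookup (my sketch, not accelerate's real code):
# if the offload index only knows per-expert tensors but the model requests a
# fused "experts.gate_up_proj" tensor, the lookup raises exactly this KeyError.
offload_index = {
    # hypothetical per-expert entries; note there is no fused key
    "model.layers.5.mlp.experts.0.gate_up_proj": "weights_0.dat",
    "model.layers.5.mlp.experts.1.gate_up_proj": "weights_1.dat",
}

key = "model.layers.5.mlp.experts.gate_up_proj"
try:
    info = offload_index[key]
except KeyError as err:
    print("KeyError:", err)  # mirrors the failure above
```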
I verified that the directory exists and contains the model files. A similar problem appears in the Hugging Face discussions, and I followed the steps suggested there by @noobaymax:
pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
pip install git+https://github.com/huggingface/transformers.git
pip install kernels
but the error remains the same.
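To rule out the reinstalls silently not taking effect, I can at least confirm the packages are importable (stdlib-only check; `triton_kernels` as the module name is my assumption based on the pip subdirectory path):

```python
import importlib.util

# Report which of the freshly installed packages Python can actually find.
for mod in ("transformers", "kernels", "triton_kernels"):
    spec = importlib.util.find_spec(mod)
    print(f"{mod}: {'found' if spec else 'MISSING'}")
```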
Environment:
Python 3.12.3
transformers 4.56.0.dev0 (also tried 4.55.1)
torch 2.8.0
accelerate 1.10.0
Ubuntu 22.04 on WSL2, no GPU, 32GB RAM
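For scale, my back-of-envelope on why the bf16 fallback can't stay in RAM on this box (the ~21B parameter count is approximate):

```python
# Rough memory estimate for gpt-oss-20b dequantized to bf16 (~21B params, 2 bytes each).
params = 21e9
bytes_per_param = 2
gib_needed = params * bytes_per_param / 2**30
print(f"~{gib_needed:.0f} GiB")  # well above my 32 GB RAM, hence the disk offload
```

So the "offloaded to the cpu and disk" message in the log is expected on 32 GB; the question is why the offload then breaks on the expert weights.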
How can I load this model correctly on CPU?