936 questions
0
votes
0
answers
42
views
Optimization Challenge in Hugging Face: Efficiently Serving Multiple, Differently Sized LLMs on a Single GPU with PyTorch [closed]
I am currently working on a Python-based Gen AI project that requires the efficient deployment and serving of multiple LLMs, specifically models with different parameter counts (Llama-2 7B and Mistral ...
0
votes
0
answers
30
views
When Running a GGUF Model from Hugging Face Using Ollama, How Will the Modelfile Be Selected?
Background Knowledge
According to the Hugging Face documentation, it is now possible to run a GGUF model directly in Ollama with ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF, something like ...
0
votes
1
answer
143
views
Error in pyannote.audio Pipeline. Python. HuggingFace
I'm getting an error with the use_auth_token keyword while implementing a pipeline from pyannote.audio.
I already tried:
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", ...
-1
votes
0
answers
24
views
How to use models from Hugging Face on a local machine server
I am trying to use the Emotion Llama model and to understand how to download the models from Hugging Face and place them in the right directory. It actually suggests downloading three models in ...
1
vote
0
answers
159
views
Transformers 'could not import module pipeline' to jupyter notebook
I need to run a series of pre-trained fine-tuned models from Hugging Face in a Jupyter notebook. I have updated to the latest version of both PyTorch and Transformers, but when I run the code
from ...
1
vote
1
answer
78
views
Xcode Can't Find swift-transformers Package
I'm trying to implement Speech-to-Text transcription in my Swift app using Hugging Face's swift-transformers package to run Whisper models locally.
I've added the package to my Xcode project, but when ...
1
vote
0
answers
68
views
How to pass P_map: dict[str, torch.Tensor] to PEFT (LoRA)?
My proxy goal is to change LoRA from h = (W + BA)x to h = (W + BAP)x. Preliminary code is attached for your reference.
My actual goal is to train a model with the following loss: \tilde{\Theta} = \arg\min_{\hat{\Delta}} \| f_{(...
-1
votes
2
answers
95
views
LangChain HuggingFace ChatHuggingFace raises StopIteration with any model
I’m trying to use LangChain’s Hugging Face integration to chat with the model TinyLlama/TinyLlama-1.1B-Chat-v1.0 for the very first time, but I’m getting a StopIteration error when calling .invoke().
...
0
votes
0
answers
61
views
ONNX Runtime Helsinki-NLP in Java
Has anyone managed to translate something using Helsinki-NLP and ONNX Runtime in Java? Using a Python script, I generated these files:
├── encoder_model.onnx
├── decoder_model.onnx
├── ...
2
votes
1
answer
80
views
Why does my system message content contain "image": None when mapping conversation dataset?
I'm creating a conversation dataset for an image classification task where the system message should contain only text, and the user message contains both text and an image. However, after mapping my ...
0
votes
0
answers
96
views
tokenizer error: RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0
When fine-tuning a model using the Hugging Face inference hub, I encountered the error below:
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The ...
0
votes
0
answers
89
views
How to solve device mismatch issue when using offloading with QwenImageEditPlus pipeline and GGUF weights
After failing to make the QwenImageEditPlus run (https://huggingface.co/spaces/discord-community/README/discussions/9#68d260e32053323e6bfab30c), I tried a different approach (thanks to all the example ...
1
vote
0
answers
61
views
Why does the Hugging Face Trainer still see my encoder and classifier head on different devices even after I manually map them to the same device?
I encountered this error while trying to run the Hugging Face Trainer on a multi-GPU setup.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
I use a ...
0
votes
1
answer
379
views
Hugging Face API is not returning any response and I am receiving errors
I have been stepping into GenAI and am currently working with Hugging Face's open-source models. However, I am not able to receive any response from the API. I have created an access token on Hugging ...
0
votes
0
answers
61
views
How do I compute validation loss for a fine-tuned Qwen model in Hugging Face Transformers during evaluation?
I trained a Qwen model on my own dataset. Now I need to evaluate my trained model using the loss function, but I don’t know how to do it. I saw examples for other metrics such as accuracy and ...
0
votes
0
answers
25
views
Commit unable to auto-activate while using Gradio on Hugging Face, but adding a blank line and committing it from the website works
I was trying to use Gradio in Hugging Face Spaces. I added an app.py file in VS Code, and VS Code reported that the push succeeded. However, Hugging Face Spaces declared "No application file...
1
vote
0
answers
37
views
Using HuggingFace API using Azure ML Studio
I have an Azure ML studio notebook. I want to use the HuggingFace "cross-encoder-nli-deberta-v3-base" model to do zero-shot classification.
This code instantiates the endpoint without error:
...
0
votes
0
answers
218
views
"Model is not supported for task text generation, supported task: conversational" on LangChain HuggingFaceInference JS
I'm trying to use a text-generation LLM from a Hugging Face Inference Provider with LangChain in Node.js. I chose the model Qwen/Qwen2.5-1.5B-Instruct; trying out other models did not seem to work either. I couldn't ...
0
votes
0
answers
57
views
Smolagents CodeAgent gets error from correct code
The Smolagents CodeAgent is given a task to convert a string into markdown table format. It successfully captures the related part of the string and writes the code for markdown table formatting. ...
0
votes
1
answer
117
views
How to load dataset from huggingface to google colab?
I am trying to load a training dataset in my Google Colab notebook but keep getting an error.
Here is the code snippet which returns the error:
from datasets import load_dataset
ds = load_dataset("...
1
vote
2
answers
171
views
How to interpret cosine similarity using EmbeddingSimilarityEvaluator
I am reading about text embeddings in LLMs in the book Hands-On Large Language Models. It mentions the following:
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
from ...
1
vote
0
answers
799
views
KeyError when loading GPT-OSS-20B locally with transformers on CPU
I’m trying to load gpt-oss-20b locally using Hugging Face transformers with CPU only. Minimal code:
from transformers import pipeline
model_path = "/mnt/d/Projects/models/gpt-oss-20b"
pipe = ...
0
votes
1
answer
161
views
Is HuggingFace Accelerate's init_empty_weights Context Manager (Properly) Implemented for a Diffuser?
Discussion
HuggingFace accelerate's init_empty_weights() properly loads all text encoders I tested to the PyTorch meta device and consumes no apparent memory or disk space while loaded.
However, it ...
0
votes
0
answers
232
views
TypeError: PPOTrainer.__init__() got an unexpected keyword argument 'config'
I am trying to initialize a PPO_trainer but have issues.
from trl import PPOTrainer, PPOConfig
ppo_config = PPOConfig(
batch_size=4,
learning_rate=1e-5,
mini_batch_size=2,
use_cpu=...
1
vote
0
answers
53
views
BLIP Fine-Tuning: Special Token Always Biased to One Class in Generated Caption
I'm trying to fine-tune Hugging Face BLIP (Bootstrapped Language-Image Pretraining) to classify pizza boxes as either recyclable (clean) or non-recyclable (contaminated) by generating captions that ...
0
votes
0
answers
56
views
Why is LeRobot’s policy ignoring additional camera streams despite custom `input_features`?
I'm using LeRobot to train a SO101 arm policy with 3 video streams (front, above, gripper) and a state vector. The dataset can be found at this link.
I created a custom JSON config (the train_config....
0
votes
0
answers
47
views
TypeError: 'NoneType' object is not iterable when using ChatHuggingFace with TinyLlama/TinyLlama-1.1B-Chat-v1.0 in LangChain
I'm trying to use the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model from Hugging Face with LangChain using the langchain_huggingface integration. My goal is to get a simple response from the model using ...
1
vote
1
answer
322
views
Getting StopIteration when using HuggingFaceEndpoint with LangChain and flan-t5-large
I'm trying to use the langchain_huggingface.HuggingFaceEndpoint integration to call the "google/flan-t5-large" model from Hugging Face in a LangChain pipeline. Here's my code:
from langchain....
0
votes
1
answer
239
views
RuntimeError: CUDA error: named symbol not found when using TorchAoConfig with Qwen2.5-VL-7B-Instruct model
I'm trying to load the Qwen2.5-VL-7B-Instruct model from Hugging Face with 4-bit weight-only quantization using TorchAoConfig (similar to how it's mentioned in the documentation here), but I'm getting ...
0
votes
0
answers
63
views
Applying a transformation to nested Hugging Face datasets without loading them into memory
I am trying to apply the transformation below to prepare my datasets for fine-tuning with Unsloth and Hugging Face. It requires the dataset to be in the following format.
def convert_to_conversation(sample):
...
0
votes
0
answers
92
views
Python Flask App with LlamaIndex + Ollama application significantly slower in offline Docker container vs online version with identical setup
Problem
I have two nearly identical Python applications using LlamaIndex + Ollama for document Q&A:
Online version: ~5 seconds response time
Offline version: ~18 seconds response time
FYI I am ...
0
votes
1
answer
240
views
Language Model Evaluation with Custom Task - Hugging Face Lighteval
I am creating a benchmark to evaluate a language model. First, I generated the dataset that I'm gonna prompt the language model with. Subsequently, I tried to evaluate any tiny language model just to ...
1
vote
1
answer
677
views
SFTTrainer: The specified `eos_token` ('<EOS_TOKEN>') is not found in the vocabulary of the given `processing_class` (Qwen2TokenizerFast)
I upgraded my Python trl package to version 0.18.1. I use the SFTTrainer of the package to finetune a Qwen2.5 LLM neural net. Previously, I used the TrainingArgument class to set additional params. I ...
0
votes
1
answer
142
views
How to get the code of the Hugging Face models?
There is a simple way to download a model from Hugging Face,
# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("sentence-...
3
votes
0
answers
208
views
Cannot run inference with images on llama-cpp-python
I am new to this. I have been trying but could not make the model answer questions about images.
from llama_cpp import Llama
import torch
from PIL import Image
import base64
llm = Llama(
model_path='Holo1-...
0
votes
0
answers
44
views
Translation returns <unk> token
I'm having relatively good results with Helsinki-NLP models for translation, except for one thing: some special characters are omitted from the translation. If I decode without skipping the special ...
1
vote
0
answers
94
views
Sentence similarity pipeline with @huggingface/transformers
I wanted to use the pipeline API from @huggingface/transformers (JS) for sentence similarity, but I do not see a specific pipeline for it.
The closest things are text classification and feature extraction ...
0
votes
0
answers
56
views
Using llama-index with the deployed LLM
I wanted to make a web app that uses llama-index to answer queries using RAG from specific documents. I have set up the Llama3.2-1B-instruct LLM locally and am using it to create indexes of the ...
2
votes
1
answer
210
views
JSONDecodeError while using HuggingFace Inference API with LangChain for Embeddings
I’m trying to generate embeddings using the Hugging Face Inference API with LangChain in Python, but I’m running into issues. My goal is to use the API (not local models) to generate embeddings for ...
0
votes
0
answers
42
views
I am getting a .NET HuggingFace 403 or 404 error
In my .NET project, I am configuring the Huggingface library as follows:
builder.Services
.AddKernel()
.AddHuggingFaceChatCompletion(
model: "deepseek-ai/DeepSeek-R1",
...
0
votes
0
answers
30
views
Webpack error: "Module parse failed: Unexpected character '�' (1:0)" when using @xenova/transformers
I'm trying to run a sentiment analysis function using the @xenova/transformers package in a NextJS project with Webpack, but I'm encountering the following error:
Module parse failed: Unexpected ...
1
vote
1
answer
87
views
HfHubHTTPError when calling DoclingLoader with a pdf file
I have installed docling successfully, but when doing the following:
from langchain_docling import DoclingLoader
source_path = "shared\abc.pdf"
loader = DoclingLoader(file_path=source_path)
...
0
votes
1
answer
227
views
How to download Hugging Face model files while filtering out unwanted files
A Hugging Face model such as Qwen32B-GGUF contains several large quantization-related files. Often only one of these files is needed and the rest go unused.
With huggingface-cli, it ...
0
votes
0
answers
288
views
Getting StopIteration error in HuggingFace
I am using Colab and HuggingFace Token is added in Colab secrets.
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from dotenv import load_dotenv
from google.colab import ...
1
vote
2
answers
675
views
Facing issue using a model hosted on HuggingFace Server and talking to it using API_KEY
I am trying to create a simple LangChain text-generation app that uses the API to communicate with models on Hugging Face servers.
I created a “.env” file and stored my key in the variable: “...
0
votes
0
answers
48
views
What is the proper way to fill a batch all the way to the end when training an LM, e.g., how do I correct my tokenize_and_group_texts_via_blocks?
I’m preparing a text dataset for next-token language-model pre-training. Using HF datasets with batched=True, I wrote a helper that:
prepends a BOS token (if the tokenizer has one),
appends an EOS ...
0
votes
0
answers
109
views
How can I properly load a LoRA weight into a pretrained Stable Diffusion model on TorchServe and enable parallel inference?
I'm attempting to serve a pretrained Stable Diffusion model with LoRA weights applied using TorchServe. However, the LoRA weights don't seem to load properly, and I'm not sure why. Could anyone help ...
0
votes
1
answer
583
views
ollama.generate raises model not found error: "hf.co/mradermacher/Llama-3.2-3B-Instruct-uncensored-GGUF"
I'm trying to run a Python script that uses the ollama library to generate responses from a custom LLM model. My code attempts to call ollama.generate() using the following model name:
chosen_model = '...
0
votes
0
answers
137
views
Hugging Face Sentence Transformer API returning 400 error for embeddings with incorrect format
import { DataAPIClient } from "@datastax/astra-db-ts";
import { PuppeteerWebBaseLoader } from "langchain/document_loaders/web/puppeteer";
import axios from "axios";
...
0
votes
1
answer
107
views
Unable to connect to hugging face model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
sentences = [
"The weather is lovely today.",
"It's so ...