Newest 'language-model' Questions

0 votes

1 answer

583 views

ollama.generate raises model not found error: "hf.co/mradermacher/Llama-3.2-3B-Instruct-uncensored-GGUF"

I'm trying to run a Python script that uses the ollama library to generate responses from a custom LLM model. My code attempts to call ollama.generate() using the following model name: chosen_model = '...

JaS

45

asked May 4 at 11:32

0 votes

1 answer

94 views

Unable to export custom language model data (Speech framework)

I am trying to customise a language model but get an error when exporting. I created a project and copied example code from Apple: import Speech class Data { func export() async throws { ...

Goran

1

asked Feb 26 at 11:56

2 votes

1 answer

259 views

Sample weights for loss computing by huggingface transformer model

I'm training a GPT2LMHeadModel in Python using huggingface's transformers library. The task is next token prediction. If I understand correctly, if this object is provided a labels argument, it should ...

user12138762

91

asked Aug 14, 2024 at 19:14

0 votes

1 answer

986 views

DSPy: How to get the number of tokens available for the input fields?

This is a cross-post from Issue #1245 of the DSPy GitHub repo. There were no responses in the past week, and I am working on a project with a tight schedule. When running a DSPy module with a given ...

Tom Lin

110

asked Jul 13, 2024 at 8:25

0 votes

1 answer

529 views

'SymbolicTensor' object cannot be interpreted as an integer

I have been trying to implement a Peephole LSTM using TensorFlow, and I am getting the error below. Below is my model, and I am not sure why I can't get the input layer in my model summary. Model and ...

Ramin sh

1

asked May 15, 2024 at 17:01

1 vote

0 answers

420 views

Using Language Model Phi-3-Mini quantized version in Jupyter Notebook

I am trying to use a small language model in my Jupyter notebook and am not able to find a working solution. I want to use the quantized version of Phi-3-mini as that is small enough to fit on my GPU ...

Christoph

13

asked May 15, 2024 at 13:31

0 votes

2 answers

424 views

Issues with Generating Text from Fine-Tuned Mistral 7B Model on Georgian Dataset

I've fine-tuned the Mistral 7B model using a Georgian dataset with approximately 100,000 articles, including custom tokenizer fine-tuning. The fine-tuning process took about 9 hours. However, when I ...

SabaKhupenia

1

asked Apr 2, 2024 at 13:46

0 votes

1 answer

754 views

What are the differences between 'fairseq' and 'fairseq2'?

What are the differences between fairseq and fairseq2? Quotes from the GitHub pages are not very clear: Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train ...

Long

1,845

asked Jan 4, 2024 at 9:46

2 votes

1 answer

700 views

specify task_type for embeddings in Vertex AI

Has anyone tried the latest update of GCP TextEmbeddingInput that allows you to specify the task_type of your application? Theoretically it should allow you to use different fine-tuned models to generate ...

Asia Salpa

21

asked Dec 6, 2023 at 16:01

1 vote

1 answer

806 views

Why do we add |V| in the denominator in the Add-One smoothing for n-gram language models?

In NLP, when we use the Laplace (add-one) smoothing technique, we assume that every word is seen one more time than its actual count, and the formula is like this, where V is the size of the vocabulary. ...

hxdshell

27

asked Aug 11, 2023 at 10:21
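
Note on the add-one smoothing question above: the formula the excerpt refers to does not survive in this listing. A minimal sketch, assuming the standard bigram form of add-one smoothing, P(w | prev) = (count(prev, w) + 1) / (count(prev) + |V|). The toy corpus and the function name add_one_bigram_prob are illustrative and not taken from the question. Adding |V| to the denominator accounts for the one extra count given to each of the |V| possible next words, so the smoothed probabilities still sum to 1.

```python
from collections import Counter


def add_one_bigram_prob(prev, word, bigram_counts, history_counts, vocab):
    """Return P(word | prev) under add-one (Laplace) smoothing."""
    # Every possible next word gets one extra count, so the denominator
    # grows by the vocabulary size |V|.
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))


# Toy corpus (an assumption, for illustration only).
tokens = "the cat sat on the mat".split()
vocab = set(tokens)

# Count how often each word appears as a bigram history, and each bigram.
history_counts = Counter(tokens[:-1])
bigram_counts = Counter(zip(tokens, tokens[1:]))

# Sum the smoothed distribution over the whole vocabulary for one history.
total = sum(
    add_one_bigram_prob("the", w, bigram_counts, history_counts, vocab)
    for w in vocab
)
print(total)
```

Without the |V| term the same sum would exceed 1, which is why the numerator and denominator are adjusted together.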

0 votes

0 answers

303 views

How to vectorize text data in Pandas.DataFrame and then one_hot encode it "inside" the model

I am trying to implement a sequence model (trained to predict the next word) built on one-hot encoded vector sequences. My custom one-hot encoder works well. But just as an exercise I want to do all things with ...

x3mEr

33

asked Aug 3, 2023 at 11:15

0 votes

1 answer

631 views

With a HuggingFace trainer, how do I show the training loss versus the eval data set?

I'm running: #original training script trainer = transformers.Trainer( model=model, train_dataset=train_dataset, eval_dataset=test_dataset, #turn on the eval dataset for comparisons ...

Ronan McGovern

71

asked Jul 25, 2023 at 12:10

2 votes

1 answer

643 views

GPT4All Metal Library Conflict during Embedding on M1 Mac

I am trying to run GPT4All's embedding model on my M1 Macbook with the following code: import json import numpy as np from gpt4all import GPT4All, Embed4All # Load the cleaned JSON data with open('...

user20140267

33

asked Jul 17, 2023 at 21:42

1 vote

1 answer

563 views

Python-based way to extract text from scientific/academic paper for a language model

I am looking for a method to extract only the core text of a scientific paper. The paper is structured in paragraphs and I only want to cover the text without any email addresses, websites, tables or ...

Enes Kayacan

11

asked Jul 13, 2023 at 7:58

0 votes

1 answer

2k views

How to get the embedding of any vocabulary token in GPT?

I have a GPT model model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device) When I send my batch to it I can get the logits and the hidden states: out = model(batch["...

Penguin

2,651

asked Jul 12, 2023 at 14:09

0 votes

1 answer

822 views

How to get the vector embedding of a token in GPT?

I have a GPT model model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device) When I send my batch to it I can get the logits and the hidden states: out = model(batch["...

Penguin

2,651

asked Jul 10, 2023 at 16:06

0 votes

1 answer

1k views

How to use a biomedical model from Huggingface to get text embeddings?

I have biomedical text that I'm trying to get the embeddings for using a biomedical transformer: my_text = ["Chocolate has a history of human consumption tracing back to 400 AD and is rich in ...

Penguin

2,651

asked Jul 1, 2023 at 17:51

1 vote

1 answer

337 views

Error while installing lmql[hf] using pip: "No matching distribution found for lmql[hf]"

I am trying to install lmql[hf] using the pip package manager in order to set up a local LMQL playground. Following the documentation, I ran the command pip install lmql[hf]. However, I encountered ...

Pavel

11

asked Jun 25, 2023 at 23:25

4 votes

1 answer

5k views

OpenAI Fine-tuning API: Why would I use LlamaIndex or LangChain instead of fine-tuning a model?

I'm just getting started with LLMs, particularly OpenAI's and other OSS models. There are a lot of guides on using LlamaIndex to create a store of all your documents and then query on them....

Curunir The Colorful

71

asked Jun 24, 2023 at 12:12

3 votes

2 answers

4k views

ArrowInvalid: Column 4 named input_ids expected length 1000 but got length 328

# Formatting block_size = 128 # or any number suitable to your context def group_texts(examples): # Concatenate all 'input_ids' concatenated_examples = sum(examples["input_ids"], [])...

Nischal

31

asked Jun 19, 2023 at 19:27

28 votes

4 answers

28k views

Difference between Instruction Tuning vs Non Instruction Tuning Large Language Models

What is the difference between instruction tuning and normal fine-tuning for large language models? Also the instruction-tuning I'm referring to isn't the in-context/prompt one. All the recent papers ...

Flo

361

asked Jun 11, 2023 at 15:37

2 votes

1 answer

1k views

How to structure data for question-answering task to fine-tune a model with Huggingface run_qa.py example?

import sagemaker import boto3 from sagemaker.huggingface import HuggingFace try: role = sagemaker.get_execution_role() except ValueError: iam = boto3.client('iam') role = iam.get_role(...

Tom Bomer

113

asked Jun 6, 2023 at 16:30

7 votes

1 answer

744 views

Starcoder finetuning - How to select the GPU and how to estimate the time it will take to finetune

I'd like to finetune Starcoder (https://huggingface.co/bigcode/starcoder) on my dataset and on a GCP VM instance. It says in the documentation that for training the model, they used 512 Tesla A100 ...

Aadesh Kulkarni

577

asked Jun 1, 2023 at 17:22

3 votes

1 answer

9k views

Fine-tuning a pre-trained LLM for question-answering

Objective My goal is to fine-tune a pre-trained LLM on a dataset about Manchester United's (MU's) 2021/22 season (they had a poor season). I want to be able to prompt the fine-tuned model with ...

Tom Bomer

113

asked May 31, 2023 at 11:55

1 vote

2 answers

9k views

How can I speed up a QA Langchain using load_qa_with_sources_chain?

I am currently running a QA model using load_qa_with_sources_chain(). However, when I run it with three chunks of up to 10,000 tokens each, it takes about 35s to return an answer. I would like to ...

derlunter

11

asked May 18, 2023 at 23:37

1 vote

1 answer

845 views

Why is perplexity calculation giving different results for the same input?

I'm following Huggingface doc on calculating the perplexity of fixed-length models. I'm trying to verify that the formula works for various strings and I'm getting odd behavior. In particular, they ...

Penguin

2,651

asked May 6, 2023 at 2:41
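
Note on the perplexity question above: a minimal sketch, assuming the usual definition of perplexity as the exponential of the average negative log-likelihood per token. The function name perplexity and the toy log-probabilities are hypothetical, and this is not the Hugging Face implementation the question refers to.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities."""
    # Average negative log-likelihood over the scored tokens.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input (an assumption): a model that gives every token probability 0.25.
uniform_log_probs = [math.log(0.25)] * 8
ppl = perplexity(uniform_log_probs)
print(ppl)
```

Under this definition the result depends on which tokens are scored and on how much context each one sees, so different windowing or striding choices can give different values for the same string.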

0 votes

1 answer

299 views

How to denoise text using T5?

I'm trying to denoise text using a T5 model following the Huggingface doc: from transformers import T5Tokenizer, T5ForConditionalGeneration tokenizer = T5Tokenizer.from_pretrained("t5-small")...

Penguin

2,651

asked May 5, 2023 at 21:24

5 votes

1 answer

3k views

How is scaled_dot_product_attention meant to be used with cached keys/values in causal LM?

I'm implementing a transformer and I have everything working, including attention using the new scaled_dot_product_attention from PyTorch 2.0. I'll only be doing causal attention, however, so it seems ...

turboderp

51

asked May 4, 2023 at 20:39

0 votes

0 answers

129 views

Endless loop in a text generation script

I am trying to make a simple text generator using the Bulgarian language but my code is stuck in an endless loop. Here is the code: from tokenization import tokenize_bulgarian_text from nltk import ...

mark-de

1

asked Apr 21, 2023 at 17:57

1 vote

0 answers

486 views

Not able to resolve TypeError: Transformer.forward() got an unexpected keyword argument 'labels'

I am trying to implement chapter 10 of the book NLP with Transformers by Lewis Tunstall. I am facing an error in this particular cell: from transformers.optimization import get_scheduler ...

Bhupinder singh

11

asked Apr 21, 2023 at 5:10

3 votes

2 answers

3k views

Finetuning a LM vs prompt-engineering an LLM

Is it possible to finetune a much smaller language model like Roberta on say, a customer service dataset and get results as good as one might get with prompting GPT-4 with parts of the dataset? Can a ...

Tolu

1,167

asked Apr 18, 2023 at 20:15

3 votes

0 answers

2k views

Langchain Chatbot with Memory + Vector Database

In Langchain, what is the suggested way to build a chatbot with memory and retrieval from a vector embedding database at the same time? The examples in the docs add memory modules to chains that do ...

Rexcirus

3,007

asked Mar 31, 2023 at 9:59

1 vote

1 answer

469 views

Cannot allocate memory Failed to allocate when using KenLM build_binary

I have an ARPA file which I created with the following command: ./lmplz -o 4 -S 1G <tmp_100M.txt >100m.arpa Now I want to convert this ARPA file to a binary file: ./build_binary 100m.arpa 100m.bin ...

user3668129

4,880

asked Mar 20, 2023 at 14:35

0 votes

0 answers

1k views

Inferring a large language model on a GPU with not enough video RAM

I'm trying some experiments running downloaded language models on a desktop machine. Specifically so far Bloom 3B and 7B on a machine with 32GB RAM, a 2-core CPU and no GPU. (Throughout this question, ...

rwallace

34.1k

asked Mar 15, 2023 at 3:14

2 votes

0 answers

2k views

forward() got an unexpected keyword argument 'labels'

I am trying to fine-tune TransformerXL for language modeling. from transformers import TransfoXLTokenizer, TransfoXLModel tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103...

elenata24

21

asked Mar 11, 2023 at 22:37

-1 votes

1 answer

903 views

I want to make an AI text classifier using OpenAI API, based on GPT2 but I cannot find the API documentation for GPT2

I wanted to create an AI text classifier project for my college. I wanted to use the GPT2 API for this, as it is more reliable at catching content generated by GPT 3.5, so how can I use GPT2 ...

MinionMatrix

9

asked Mar 7, 2023 at 15:33

0 votes

0 answers

161 views

Supervised fine tuning in pre-trained language model

Supervised fine-tuning adds an extra output layer to the pre-trained model. Does this extra layer alter the probability of words that are not related to the fine-tuning data?

Chen APD

1

asked Feb 23, 2023 at 17:26

0 votes

1 answer

458 views

How to use language model for speech recognition

I am working with an end-to-end speech recognition system. I have a language model for a language with a .lm extension, and other inference and pronunciation models. I want it to make predictions from that ...

Voleti Nagendra kumar

31

asked Feb 22, 2023 at 6:05

1 vote

1 answer

333 views

When using OPT-2.7B or any other natural language model, is there a way to trick it into having a conversation/ give it a pre prompt in the code

Using this code, or a variant of it, is there anything that can be added to "trick" OPT into conversing as another user in a style more similar to a chatbot? As of now it will either start ...

Delta Adams

11

asked Dec 20, 2022 at 21:30

3 votes

1 answer

3k views

Forcing transformer models to generate only some tokens from a vocab

I trained a language model (encoder-decoder) to generate text. I want to restrict the generation vocab of this model to a specific vocab. How can I do that? I found in generate (model.generate) ...

Minions

5,537

asked Sep 30, 2022 at 22:21

0 votes

1 answer

111 views

How can the BERT [CLS] token collect the relevant information from the rest of the hidden states?

How can the BERT [CLS] token collect the relevant information from the rest of the hidden states? Does [CLS] have MLM information? If I train my BERT using only MLM, does [CLS] still work?

kowser66

185

asked Sep 29, 2022 at 6:22

-1 votes

1 answer

329 views

Clustering Lists of Words (Python)

I have 54 lists consisting of words of varying lengths. For example: 1 = ["fly", "robot", "ketchup"]. 2 = ["rain", "fly", "top", "...

Jule

1

asked Sep 7, 2022 at 15:00

1 vote

0 answers

420 views

How to understand the bias term in language model head (when we tie the word embeddings)?

I was learning the masked language modeling codebase in Huggingface Transformers. Just a question to understand the language model head. Here at the final linear layer where we project hidden size to ...

Allan-J

375

asked Aug 29, 2022 at 6:33

1 vote

0 answers

489 views

NaN values appear when including a new padding token in my tokenizer

I'm trying to fine-tune a DialoGPT model on a new dataset. I already processed my data correctly, and adding a new padding token to the tokenizer didn't seem to cause any issue: #my dataset : print(...

Tessan

129

asked Aug 12, 2022 at 14:05

-1 votes

2 answers

760 views

How to get token or code embedding using Codex API?

For a given code snippet, how do I get an embedding using the Codex API? import os import openai import config openai.api_key = config.OPENAI_API_KEY def runSomeCode(): response = openai.Completion....

Exploring

3,683

asked Jul 14, 2022 at 21:16

1 vote

0 answers

102 views

Arguments of OpenIE for extracting fewer event triples

I'm new to NLP and I'm trying to use OpenIE to extract event triples from texts. I looked at its documentation but don't quite understand its arguments. For example, max_entailments_per_clause ...

R__

87

asked Jul 4, 2022 at 4:36

0 votes

1 answer

6k views

How does the BERT loss function work?

I'm confused about how cross-entropy works in the BERT LM. To calculate the loss function we need the truth labels of the masks. But we don't have the vector representation of the truth labels and the predictions ...

kowser66

185

asked Jun 16, 2022 at 5:36

0 votes

0 answers

191 views

Pre-trained Language Models: Parameters, data, method?

I am doing research on pre-trained LMs, specifically the following: BERT, ALBERT, RoBERTa, XLNet, DistilBERT, BigBird, ConvBERT. I am looking for information to compare these LMs, like: number of ...

Othman El houfi

53

asked Jun 9, 2022 at 15:24

1 vote

0 answers

531 views

How to force GPT2 to generate specific tokens in each sentence?

My input is a string and the outputs are vector representations (corresponding to the generated tokens). I'm trying to force the outputs to have specific tokens (e.g., 4 commas/2 of the word "to...

Penguin

2,651

asked Jun 2, 2022 at 16:07

0 votes

1 answer

425 views

OOM while fine-tuning medium sized model with DialoGPT on colab

I am trying to fine-tune DialoGPT with a medium-sized model, but I am getting a CUDA error during the training phase. I reduced the batch size from 4, but the error still persists. My parameters are #...

Sap BH

91

asked Jun 1, 2022 at 20:20
