0 votes
1 answer
583 views

I'm trying to run a Python script that uses the ollama library to generate responses from a custom LLM model. My code attempts to call ollama.generate() using the following model name: chosen_model = '...
JaS's user avatar
  • 45
0 votes
1 answer
94 views

I am trying to customise a language model but get an error when exporting. I created a project and copied the example code from Apple: import Speech class Data { func export() async throws { ...
Goran's user avatar
  • 1
2 votes
1 answer
259 views

I'm training a GPT2LMHeadModel in Python using huggingface's transformers library. The task is next token prediction. If I understand correctly, if this object is provided a labels argument, it should ...
user12138762's user avatar
0 votes
1 answer
986 views

This is a cross-post from Issue #1245 of DSPy GitHub Repo. There were no responses in the past week, and I am working on a project with a tight schedule. When running a DSPy module with a given ...
Tom Lin's user avatar
  • 110
0 votes
1 answer
529 views

I have been trying to implement a Peephole LSTM using TensorFlow, and I am getting the error below. Below is my model; I am not sure why I can't get the input layer in my model summary. Model and ...
Ramin sh's user avatar
1 vote
0 answers
420 views

I am trying to use a small language model in my jupyter notebook and am not able to find a working solution. I want to use the quantized version of Phi-3-mini as that is small enough to fit on my GPU ...
Christoph's user avatar
0 votes
2 answers
424 views

I've fine-tuned the Mistral 7B model using a Georgian dataset with approximately 100,000 articles, including custom tokenizer fine-tuning. The fine-tuning process took about 9 hours. However, when I ...
SabaKhupenia's user avatar
0 votes
1 answer
754 views

What are the differences between fairseq and fairseq2? The quotes from the GitHub pages are not very clear: Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train ...
Long's user avatar
  • 1,845
2 votes
1 answer
700 views

Has anyone tried the latest update of GCP TextEmbeddingInput, which lets you specify the task_type of your application? In theory it should allow you to use different fine-tuned models to generate ...
Asia Salpa's user avatar
1 vote
1 answer
806 views

In NLP, when we use the Laplace (add-one) smoothing technique, we assume that every word is seen one more time than its actual count, and the formula is like this, where V is the size of the vocabulary. ...
hxdshell's user avatar
0 votes
0 answers
303 views

I am trying to implement a sequence model (trained to predict the next word) built on one-hot encoded vector sequences. My custom one-hot encoder works well, but just as an exercise I want to do everything with ...
x3mEr's user avatar
  • 33
0 votes
1 answer
631 views

I'm running: #original training script trainer = transformers.Trainer( model=model, train_dataset=train_dataset, eval_dataset=test_dataset, #turn on the eval dataset for comparisons ...
Ronan McGovern's user avatar
2 votes
1 answer
643 views

I am trying to run GPT4All's embedding model on my M1 Macbook with the following code: import json import numpy as np from gpt4all import GPT4All, Embed4All # Load the cleaned JSON data with open('...
user20140267's user avatar
1 vote
1 answer
563 views

I am looking for a method to extract only the core text of a scientific paper. The paper is structured in paragraphs and I only want to cover the text without any email addresses, websites, tables or ...
Enes Kayacan's user avatar
0 votes
1 answer
2k views

I have a GPT model model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device) When I send my batch to it I can get the logits and the hidden states: out = model(batch["...
Penguin's user avatar
  • 2,651
0 votes
1 answer
822 views

I have a GPT model model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device) When I send my batch to it I can get the logits and the hidden states: out = model(batch["...
Penguin's user avatar
  • 2,651
0 votes
1 answer
1k views

I have biomedical text that I'm trying to get the embeddings for using a biomedical transformer: my_text = ["Chocolate has a history of human consumption tracing back to 400 AD and is rich in ...
Penguin's user avatar
  • 2,651
1 vote
1 answer
337 views

I am trying to install lmql[hf] using the pip package manager in order to set up a local LMQL playground. Following the documentation, I ran the command pip install lmql[hf]. However, I encountered ...
Pavel's user avatar
  • 11
4 votes
1 answer
5k views

I'm just getting started with LLMs, particularly OpenAI's and other OSS models. There are a lot of guides on using LlamaIndex to create a store of all your documents and then query on them....
Curunir The Colorful's user avatar
3 votes
2 answers
4k views

# Formatting block_size = 128 # or any number suitable to your context def group_texts(examples): # Concatenate all 'input_ids' concatenated_examples = sum(examples["input_ids"], [])...
Nischal 's user avatar
28 votes
4 answers
28k views

What is the difference between instruction tuning and normal fine-tuning for large language models? Also the instruction-tuning I'm referring to isn't the in-context/prompt one. All the recent papers ...
Flo's user avatar
  • 361
2 votes
1 answer
1k views

import sagemaker import boto3 from sagemaker.huggingface import HuggingFace try: role = sagemaker.get_execution_role() except ValueError: iam = boto3.client('iam') role = iam.get_role(...
Tom Bomer's user avatar
  • 113
7 votes
1 answer
744 views

I'd like to fine-tune Starcoder (https://huggingface.co/bigcode/starcoder) on my dataset and on a GCP VM instance. It says in the documentation that for training the model, they used 512 Tesla A100 ...
Aadesh Kulkarni's user avatar
3 votes
1 answer
9k views

Objective My goal is to fine-tune a pre-trained LLM on a dataset about Manchester United's (MU's) 2021/22 season (they had a poor season). I want to be able to prompt the fine-tuned model with ...
Tom Bomer's user avatar
  • 113
1 vote
2 answers
9k views

I am currently running a QA model using load_qa_with_sources_chain(). However, when I run it with three chunks of up to 10,000 tokens each, it takes about 35s to return an answer. I would like to ...
derlunter's user avatar
1 vote
1 answer
845 views

I'm following Huggingface doc on calculating the perplexity of fixed-length models. I'm trying to verify that the formula works for various strings and I'm getting odd behavior. In particular, they ...
Penguin's user avatar
  • 2,651
0 votes
1 answer
299 views

I'm trying to denoise text using a T5 model following the Huggingface doc: from transformers import T5Tokenizer, T5ForConditionalGeneration tokenizer = T5Tokenizer.from_pretrained("t5-small")...
Penguin's user avatar
  • 2,651
5 votes
1 answer
3k views

I'm implementing a transformer and I have everything working, including attention using the new scaled_dot_product_attention from PyTorch 2.0. I'll only be doing causal attention, however, so it seems ...
turboderp's user avatar
0 votes
0 answers
129 views

I am trying to make a simple text generator using the Bulgarian language but my code is stuck in an endless loop. Here is the code: from tokenization import tokenize_bulgarian_text from nltk import ...
mark-de's user avatar
1 vote
0 answers
486 views

I am trying to implement Chapter 10 of the book NLP with Transformers by Lewis Tunstall. I am facing an error in this particular cell: from transformers.optimization import get_scheduler ...
Bhupinder singh's user avatar
3 votes
2 answers
3k views

Is it possible to fine-tune a much smaller language model like RoBERTa on, say, a customer service dataset and get results as good as one might get by prompting GPT-4 with parts of the dataset? Can a ...
Tolu's user avatar
  • 1,167
3 votes
0 answers
2k views

In Langchain, what is the suggested way to build a chatbot with memory and retrieval from a vector embedding database at the same time? The examples in the docs add memory modules to chains that do ...
Rexcirus's user avatar
  • 3,007
1 vote
1 answer
469 views

I have an ARPA file, which I created with the following command: ./lmplz -o 4 -S 1G <tmp_100M.txt >100m.arpa Now I want to convert this ARPA file to a binary file: ./build_binary 100m.arpa 100m.bin ...
user3668129's user avatar
  • 4,880
0 votes
0 answers
1k views

I'm trying some experiments running downloaded language models on a desktop machine. Specifically so far Bloom 3B and 7B on a machine with 32GB RAM, a 2-core CPU and no GPU. (Throughout this question, ...
rwallace's user avatar
  • 34.1k
2 votes
0 answers
2k views

I am trying to fine-tune TransformerXL for language modeling. from transformers import TransfoXLTokenizer, TransfoXLModel tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103"...
elenata24's user avatar
-1 votes
1 answer
903 views

I want to create an AI text classifier project for my college. I want to use the GPT2 API for this, as it is more reliable at catching content generated by GPT 3.5, so how can I use GPT2 ...
MinionMatrix's user avatar
0 votes
0 answers
161 views

Supervised fine-tuning adds an extra output layer to the pre-trained model. Does this extra layer alter the probability of words that are not related to the fine-tuning data?
Chen APD's user avatar
0 votes
1 answer
458 views

I am working with an end-to-end speech recognition system. I have a language model for a language, with a .lm extension, and other inference and pronunciation models. I want it to make predictions from that ...
Voleti Nagendra kumar's user avatar
1 vote
1 answer
333 views

Using this code, or a variant of it, is there anything that can be added to "trick" OPT into conversing as another user, in a style more similar to a chatbot? As of now it will either start ...
Delta Adams's user avatar
3 votes
1 answer
3k views

I trained a language model (encoder-decoder) to generate text. I want to restrict the generation vocab of this model to a specific vocab. How can I do that? I found in generate (model.generate) ...
Minions's user avatar
  • 5,537
0 votes
1 answer
111 views

How can BERT's [CLS] token collect the relevant information from the rest of the hidden states? Does [CLS] carry MLM information? If I train my BERT using only MLM, does [CLS] still work?
kowser66's user avatar
  • 185
-1 votes
1 answer
329 views

I have 54 lists consisting of words of varying lengths. For example: 1 = ["fly", "robot", "ketchup"]. 2 = ["rain", "fly", "top", "...
Jule's user avatar
  • 1
1 vote
0 answers
420 views

I was learning the masked language modeling codebase in Huggingface Transformers. Just a question to understand the language model head. Here at the final linear layer where we project hidden size to ...
Allan-J's user avatar
  • 375
1 vote
0 answers
489 views

I'm trying to fine-tune a DialoGPT model on a new dataset. I already processed my data correctly, and adding a new padding token to the tokenizer didn't seem to cause any issues: #my dataset : print(...
Tessan's user avatar
  • 129
-1 votes
2 answers
760 views

For a given code snippet, how can I get an embedding using the Codex API? import os import openai import config openai.api_key = config.OPENAI_API_KEY def runSomeCode(): response = openai.Completion....
Exploring's user avatar
  • 3,683
1 vote
0 answers
102 views

I'm new to NLP and I'm trying to use OpenIE to extract event triples from texts. I looked into its documentation but don't quite understand its arguments. For example, max_entailments_per_clause ...
R__'s user avatar
  • 87
0 votes
1 answer
6k views

I'm confused about how cross-entropy works in the BERT LM. To calculate the loss function we need the ground-truth labels of the masks. But we don't have the vector representation of the truth labels and the predictions ...
kowser66's user avatar
  • 185
0 votes
0 answers
191 views

I am doing research on pre-trained LMs, specifically the following: BERT, ALBERT, RoBERTa, XLNet, DistilBERT, BigBird, ConvBERT. I am looking for information to compare these LMs, like: number of ...
Othman El houfi's user avatar
1 vote
0 answers
531 views

My input is a string and the outputs are vector representations (corresponding to the generated tokens). I'm trying to force the outputs to have specific tokens (e.g., 4 commas/2 of the word "to"...
Penguin's user avatar
  • 2,651
0 votes
1 answer
425 views

I am trying to fine-tune DialoGPT with the medium-sized model. I am getting a CUDA error during the training phase; I reduced the batch size from 4, but the error still persists. My parameters are #...
Sap BH's user avatar
  • 91
