I am running an inference model on an Ubuntu machine with only 8 GB of RAM, and I just realised the predictions (logits) are not generated in batches, so my process is getting killed due to OOM issues.
tokenized_test = tokenizer(dataset["test"]["text"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**tokenized_test).logits
This is where I run out of memory. What is the best way to do this in batches (or to parallelize / run it sequentially / otherwise solve the OOM issue)? I am ultimately looking for the solution that requires the fewest code changes.
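To clarify what I mean by "in batches", I imagine something like the sketch below, where only one slice of the inputs is fed through the model at a time and the per-batch logits are concatenated at the end. This is only an illustration of the loop shape: `DummyModel` is a placeholder for the real tokenizer + classifier, and `batch_size` would need tuning to fit in 8 GB.

```python
import torch

# Placeholder for the real sequence-classification model;
# the point here is the batching loop, not the model itself.
class DummyModel(torch.nn.Module):
    def __init__(self, hidden=4, num_labels=2):
        super().__init__()
        self.linear = torch.nn.Linear(hidden, num_labels)

    def forward(self, features):
        return self.linear(features)

model = DummyModel()
inputs = torch.randn(10, 4)   # stand-in for the tokenized test set
batch_size = 4                # tune so one batch fits in memory

all_logits = []
with torch.no_grad():         # no gradient buffers -> much less memory
    for start in range(0, inputs.size(0), batch_size):
        batch = inputs[start:start + batch_size]
        all_logits.append(model(batch))

# Same shape as the single one-shot forward pass would give.
logits = torch.cat(all_logits)
print(tuple(logits.shape))    # (10, 2)
```

Peak memory is then bounded by one batch instead of the whole test set, which is the behaviour I am after with as few code changes as possible.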
Source
I have built my code based on this tutorial:
https://huggingface.co/docs/transformers/tasks/sequence_classification
Increasing the dataset size will eventually make you go OOM too.