
I have a special kind of problem: I can run the code in a Jupyter notebook perfectly fine with no OOM error, but when I run the same code as a script on Linux it gives me the OOM error. Has anyone had the same issue? I tried gc.collect() and torch.cuda.empty_cache() inside the code and nothing helps.
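For reference, those two calls are usually placed between processing steps; a minimal sketch of that pattern:

import gc
import torch

# drop unreachable Python objects first, then return cached GPU blocks
# to the driver; note this only releases *cached, unused* memory, so
# tensors that are still referenced somewhere stay allocated
gc.collect()
torch.cuda.empty_cache()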

It always gives me this error: RuntimeError: CUDA out of memory. Tried to allocate 1.30 GiB (GPU 0; 7.79 GiB total capacity; 4.80 GiB already allocated; 922.69 MiB free; 6.12 GiB reserved in total by PyTorch)
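The three numbers in that message can be queried directly from PyTorch's caching allocator, which helps when comparing the notebook run against the script run. A minimal sketch, assuming a single GPU (device 0):

import torch

gib = 1024 ** 3
# "already allocated": bytes currently occupied by live tensors
print(torch.cuda.memory_allocated(0) / gib, "GiB allocated")
# "reserved in total by PyTorch": blocks held by the caching allocator
print(torch.cuda.memory_reserved(0) / gib, "GiB reserved")
# full per-device breakdown of the allocator's state
print(torch.cuda.memory_summary(device=0))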

The code:

import os
import pickle

import numpy as np
import pandas as pd
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from tqdm import tqdm
from transformers import pipeline

# VEC_PICKLE_LOC is assumed to be defined elsewhere at module level

def lemmatize(phrase):
    """Return lemmatized words"""
    spa = spacy.load("en_core_web_sm")
    return " ".join([word.lemma_ for word in spa(phrase)])

def reading_csv(path_to_csv):
    """Return text column in csv"""
    data = pd.read_csv(path_to_csv)
    ctx_paragraph = []
    for txt in data['text']:
        if not pd.isna(txt):
            ctx_paragraph.append(txt)
    return ctx_paragraph

def processing_question(ques, paragraphs, domain_lemma_cache, domain_pickle):
    """Return answer"""
    #Lemmatizing whole csv text column
    lemma_cache = domain_lemma_cache
    if not os.path.isfile(lemma_cache):
        lemmas = [lemmatize(par) for par in tqdm(paragraphs)]
        df = pd.DataFrame(data={'context': paragraphs, 'lemmas': lemmas})
        df.to_feather(lemma_cache)
    df = pd.read_feather(lemma_cache)
    paragraphs = df.context
    lemmas = df.lemmas
    # Vectorizer cache
    if not os.path.isfile(VEC_PICKLE_LOC):
        vectorizer = TfidfVectorizer(
            stop_words='english', min_df=5, max_df=.5, ngram_range=(1, 3))
        vectorizer.fit_transform(lemmas)
        pickle.dump(vectorizer, open(VEC_PICKLE_LOC, "wb"))
    vectorizer = pickle.load(open(VEC_PICKLE_LOC, "rb"))
    # Vectorized lemmas cache (the vectorizer must be loaded before this
    # branch, otherwise it is undefined when only the vectorizer pickle
    # already exists)
    if not os.path.isfile(domain_pickle):
        tfidf = vectorizer.transform(lemmas)
        pickle.dump(tfidf, open(domain_pickle, "wb"))
    tfidf = pickle.load(open(domain_pickle, "rb"))
    question = ques
    query = vectorizer.transform([lemmatize(question)])
    # notebook inspection leftover; it has no effect in a script:
    # (query > 0).sum(), vectorizer.inverse_transform(query)
    # rank paragraphs by TF-IDF similarity to the query
    scores = (tfidf * query.T).toarray()
    results = np.flip(np.argsort(scores, axis=0))
    # NOTE: this builds the pipeline (and moves the model onto the GPU)
    # on every call; see the setup sketch after this listing for a
    # one-time alternative
    qapipe = pipeline('question-answering',
                      model='distilbert-base-uncased-distilled-squad',
                      tokenizer='bert-base-uncased',
                      device=0)
    candidate_idxs = [(i, scores[i]) for i in results[0:10, 0]]
    contexts = [(paragraphs[i], s) for (i, s) in candidate_idxs if s > 0.01]
    question_df = pd.DataFrame.from_records([{
        'question': question,
        'context':  ctx
    } for (ctx, s) in contexts])
    preds = qapipe(question_df.to_dict(orient="records"))
    answer_df = pd.DataFrame.from_records(preds)
    answer_df["context"] = question_df["context"]
    answer_df = answer_df.sort_values(by="score", ascending=False)
    return answer_df
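A side note on the listing above: spacy.load in lemmatize and pipeline(...) in processing_question re-create their models on every call, which by itself can push an 8 GiB card toward OOM when the functions run more than once. A minimal sketch of a one-time setup, assuming the same model names (NLP, QA_PIPE, and answer are illustrative names, not from the original):

import spacy
import torch
from transformers import pipeline

NLP = spacy.load("en_core_web_sm")  # loaded once, reused on every call

QA_PIPE = pipeline('question-answering',
                   model='distilbert-base-uncased-distilled-squad',
                   tokenizer='bert-base-uncased',
                   device=0)  # the model moves to the GPU once, at import

def lemmatize(phrase):
    """Return lemmatized words using the module-level spaCy model."""
    return " ".join(word.lemma_ for word in NLP(phrase))

def answer(records):
    """Run the QA pipeline on a list of {'question', 'context'} dicts."""
    with torch.no_grad():  # inference only; skip autograd bookkeeping
        return QA_PIPE(records)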

Comments:
  • Did you shut down the Jupyter notebook? Jupyter hangs on to a lot of variables, and if they are tensors on the GPU, that memory won't be freed until you shut the kernel down. (May 21, 2020)
  • Yes, everything is shut down. I even tried restarting the whole computer and running the script straight away after it was up. (May 22, 2020)

1 Answer


I had a similar thing happen to me recently.

I would run my model in a Jupyter notebook on an AWS EC2 p2.xlarge instance, and the model would run correctly. Then I would ssh into the same instance, re-run a .py script of the same model, and receive the OOM errors you described.

All I had to do was reset the kernel of the Jupyter notebook to get the .py script to work.
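If resetting the kernel does not obviously help, you can confirm whether a stale kernel (or any other process) is still holding GPU memory before launching the script. A small sketch that shells out to nvidia-smi (assumes the NVIDIA driver utilities are on PATH):

import subprocess

# List every process currently holding GPU memory; a lingering Jupyter
# kernel shows up as a python process with a large "used_memory" value.
subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv"],
    check=True,
)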
