I wrote a Python script and I would like to turn it into a module, so that I can reuse all the text processing it does in other tasks.

The script I'm trying to turn into a module:

mymodule.py

import re
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 'Hello ONE /TeSt bar FOO bARR foo Bar'.split(),
                   'B': 'one one two three two two one three'.split()})

###Function 1####
def lower_text(token):
    token = token.lower()
    return token

df['A'] = df.A.apply(lambda x: lower_text(x))

###Function 2###
def punct(token):
    token = re.sub(r'[^\w\s]',' ', token) 
    return token

df['A'] = df.A.apply(lambda x: punct(x))

###Replace###
df["A"] = df["A"].replace('foo', 'fuzzy', regex=True)

###Function that must return the final data, with all functions applied###
def data_clean():
    return df

if __name__ == '__main__':
    data_clean()

I would like to use the data produced by this script in other tasks, so I thought about turning it into a module that I could import to get the processed data. But I don't know how to do it.

Example:

import mymodule

###Trying to print the preprocessed data###
data = data_clean()

###tasks like LDA, ngrams, visualization...###
...

Error:

NameError                                 Traceback (most recent call last)
<ipython-input-2-c046afd3b89d> in <module>
----> 1 data = data_clean()

NameError: name 'data_clean' is not defined
  • If you import with import mymodule, you have to call mymodule.data_clean(). Also, you shouldn't leave code outside of a function in your module. Putting everything in your data_clean function seems to be a cleaner way. Commented Sep 16, 2020 at 10:00
  • @Jao Great! It works! Thank you. Too bad I can't choose your answer. Commented Sep 16, 2020 at 10:08
  • I'll post it as an answer for future users. Commented Sep 16, 2020 at 12:42
  • Similar question: Python Setuptools: quick way to add scripts without "main" function as "console_scripts" entry points. Commented Dec 2, 2023 at 16:55

1 Answer

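If you import the module with import mymodule, you have to call the function as mymodule.data_clean(). A bare data_clean() raises the NameError because the import only binds the name mymodule in your namespace. Alternatively, use from mymodule import data_clean, which makes data_clean() available directly. Also, you shouldn't leave code outside of a function in your module: top-level code runs as a side effect the first time the module is imported. Putting everything inside your data_clean function is a cleaner approach.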
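As an illustration, here is a minimal sketch of how mymodule.py might look with every processing step moved inside data_clean. It reuses the sample data and function names from the question. Dropping the unused numpy import and passing the helper functions straight to apply are small tidy-ups assumed here, not something the question requires.

```python
# mymodule.py (sketch: same steps as the question, wrapped in functions)
import re

import pandas as pd


def lower_text(token):
    """Return the token in lowercase."""
    return token.lower()


def punct(token):
    """Replace each character that is neither a word character nor whitespace with a space."""
    return re.sub(r"[^\w\s]", " ", token)


def data_clean():
    """Build the sample DataFrame and apply every cleaning step to column A."""
    df = pd.DataFrame({
        "A": "Hello ONE /TeSt bar FOO bARR foo Bar".split(),
        "B": "one one two three two two one three".split(),
    })
    df["A"] = df["A"].apply(lower_text)
    df["A"] = df["A"].apply(punct)
    df["A"] = df["A"].replace("foo", "fuzzy", regex=True)
    return df


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(data_clean())
```

In the other script, either write import mymodule and then data = mymodule.data_clean(), or write from mymodule import data_clean and then data = data_clean(). Both assume mymodule.py sits in the same directory as the calling script or is otherwise on the import path.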
