I wrote a Python script and would like to turn it into a module, so that I can reuse the text preprocessing it performs in other tasks.
The script I'm trying to turn into a module:
mymodule.py
import re
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'Hello ONE /TeSt bar FOO bARR foo Bar'.split(),
                   'B': 'one one two three two two one three'.split()})
###Function 1####
def lower_text(token):
    token = token.lower()
    return token
df['A'] = df.A.apply(lambda x: lower_text(x))
###Function 2###
def punct(token):
    token = re.sub(r'[^\w\s]',' ', token)
    return token
df['A'] = df.A.apply(lambda x: punct(x))
###Replace###
df["A"] = df["A"].replace('foo', 'fuzzy', regex=True)
###Function that must return the final data, with all functions applied###
def data_clean():
    return df
if __name__ == '__main__':
    data_clean()
I would like to use the data produced by this script in other tasks, so I thought about turning the script into a module that I could import with the data already processed. But I don't know how to do it.
Example:
import mymodule
###Trying to print the preprocessed data###
data = data_clean()
###tasks like LDA, ngrams, visualization...###
...
Error:
NameError Traceback (most recent call last)
<ipython-input-2-c046afd3b89d> in <module>
----> 1 data = data_clean()
NameError: name 'data_clean' is not defined
After `import mymodule`, you have to call the function as `mymodule.data_clean()`. Also, you shouldn't leave code outside of a function in your module, since module-level code runs as soon as the module is imported. Putting everything in your `data_clean` function seems to be a cleaner way.
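As a rough sketch of that suggestion, the module could look something like the following. It keeps the question's function names and sample data. Building the sample DataFrame inside `data_clean` is an assumption made here for illustration; in real use you would probably pass your own data in as an argument.

```python
# mymodule.py: a sketch with all processing moved inside functions
import re

import pandas as pd


def lower_text(token):
    """Return the token in lowercase."""
    return token.lower()


def punct(token):
    """Replace each character that is neither a word character nor whitespace with a space."""
    return re.sub(r"[^\w\s]", " ", token)


def data_clean():
    """Build the sample DataFrame (assumed here) and apply every cleaning step to column A."""
    df = pd.DataFrame({
        "A": "Hello ONE /TeSt bar FOO bARR foo Bar".split(),
        "B": "one one two three two two one three".split(),
    })
    df["A"] = df["A"].apply(lower_text)
    df["A"] = df["A"].apply(punct)
    # Same substring replacement as in the question
    df["A"] = df["A"].replace("foo", "fuzzy", regex=True)
    return df


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported
    print(data_clean())
```

From another script or notebook in the same directory, either `import mymodule` followed by `data = mymodule.data_clean()`, or `from mymodule import data_clean` followed by `data = data_clean()`, should then work without the NameError.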