Data frame as Global Variable inside each function

Question

I have a dataframe named df. I want to split my processing steps into separate functions so that I can reuse those functions in future programs.

    # Check if dataframe has duplicates
    def duplicate_check ():
        global df
        df = df.drop_duplicates(['datetime', 'tagname'])
        df.drop(['tagname'], axis=1, inplace=True)
        return df

    df = duplicate_check()

    # Split my dataframe array column into individual columns
    def array_split():
        global df
        date = df['datetime']
        df = df['value'] \
            .str.split('\t', expand=True).fillna('0') \
            .replace(r'\s+|\\n', ' ', regex=True) \
            .apply(pd.to_numeric)
        df['datetime'] = date  # Join date back to dataframe
        return df

    df = array_split()

    # Split dataframe df into df and df_spec
    def remove_duplicate_spec():
        global df, df_spec
        df_spec = df.loc[df[123].isin([1])]
        df = df.loc[df[123].isin([0])]
        df_spec = df_spec.drop_duplicates(119)
        return df, df_spec


    df, df_spec = remove_duplicate_spec()

Question: Should I declare global df / df_spec inside each function? Is this best practice, or how can I optimize the code further?

René · Accepted Answer · 2022-07-20 04:37:12Z


The best way is to pass your dataframe as an argument to each function and return the result.

import pandas as pd

df = pd.DataFrame({'datetime':[0,0,1,1,2], 'tagname':[0,0,1,1,2], 'other':range(95,100)})

def duplicate_check(df):
    return df.drop_duplicates(['datetime', 'tagname'], keep='last').drop(['tagname'], axis=1)

duplicate_check(df)

DataFrame:

   datetime  tagname  other
0         0        0     95
1         0        0     96
2         1        1     97
3         1        1     98
4         2        2     99

Result of duplicate_check(df):

   datetime  other
1         0     96
3         1     98
4         2     99
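The same pattern can be applied to the other two functions from the question. The following is only a sketch under some assumptions: the parameters flag_col and key_col are hypothetical names introduced here (defaulting to the column labels 123 and 119 used in the question), and the sample data is made up to keep the example small. A function can return two dataframes as a tuple, and the date column can be carried over without a global by reading it from the argument.

```python
import pandas as pd


def duplicate_check(df):
    # Same as above: returns a new dataframe, the input is left untouched.
    return df.drop_duplicates(["datetime", "tagname"], keep="last").drop(["tagname"], axis=1)


def array_split(df):
    # Expand the tab-separated 'value' column into numeric columns,
    # then attach the datetime column again (aligned on the index).
    values = (
        df["value"]
        .str.split("\t", expand=True)
        .fillna("0")
        .replace(r"\s+|\\n", " ", regex=True)
        .apply(pd.to_numeric)
    )
    return values.assign(datetime=df["datetime"])


def remove_duplicate_spec(df, flag_col=123, key_col=119):
    # flag_col / key_col are hypothetical parameter names (assumption);
    # the defaults mirror the hard-coded labels in the question.
    df_spec = df.loc[df[flag_col] == 1].drop_duplicates(subset=[key_col])
    df_main = df.loc[df[flag_col] == 0]
    return df_main, df_spec


# Made-up sample data: three values per row, the last one acts as the flag.
raw = pd.DataFrame({
    "datetime": ["10:00", "10:00", "10:01", "10:02"],
    "tagname": ["a", "a", "a", "a"],
    "value": ["5\t7\t0", "5\t7\t0", "6\t7\t1", "8\t7\t1"],
})

cleaned = array_split(duplicate_check(raw))
df_main, df_spec = remove_duplicate_spec(cleaned, flag_col=2, key_col=1)
```

Because every function takes its input as an argument and returns a new object, the steps can be chained in any program that imports them, and raw is not modified along the way.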

edited Jul 20, 2022 at 4:37

answered Jul 19, 2022 at 10:05


7 Comments

user_v27 Over a year ago

File "<ipython-input-6-6e07979b163d>", line 10, in <cell line: 10> df = duplicate_check() TypeError: duplicate_check() missing 1 required positional argument: 'df'

user_v27 Over a year ago

If I pass df inside def duplicate_check(df):, then I get the error above.

René Over a year ago

I edited my answer, hope this works for you. Use: duplicate_check(df)

user_v27 Over a year ago

Thank you. How could we do this for the 3rd function, which returns two dataframes (df, df_spec) in one return?

user_v27 Over a year ago

Or for the 2nd function, where there is a date variable?
