The following script prints the input variable input_df twice, before and after df_lower is called:

import pandas as pd

def df_lower(df):
    cols = ['col_1']
    df[cols] = df[cols].applymap(lambda x: x.lower())
    return df

input_df = pd.DataFrame({
    'col_1': ['ABC'],
    'col_2': ['XYZ']
})

print(input_df)
processed_df = df_lower(input_df)
print(input_df)

The output shows that input_df changes:

  col_1 col_2
0   ABC   XYZ
  col_1 col_2
0   abc   XYZ

Why is input_df modified?

Why isn't it modified when the full input_df is processed, without column indexing?

def df_lower_no_indexing(df):
    df = df.applymap(lambda x: x.lower())
    return df

  • Because input_df is passed into the function, df and input_df point to the same object in memory, so df[cols] = ... modifies that shared object. (Commented Jul 29, 2019 at 15:01)

1 Answer

You are assigning to a column selection of the input DataFrame, which modifies it in place. In the no-indexing case, you are only assigning a new value to the local variable df:

df = df.applymap(lambda x: x.lower())

This rebinds the local name df to a new DataFrame and leaves the input as is.

Conversely, in the first case, you are assigning to a column selection of the input object, which modifies the input itself:

df[cols] = df[cols].applymap(lambda x: x.lower())
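
The difference between rebinding a name and mutating the object it refers to can be checked directly with `is`. The following is only a minimal sketch: the function names `rebind` and `mutate` are made up for illustration, and `.str.lower()` stands in for `applymap` to keep the example short.

```python
import pandas as pd

def rebind(df):
    # Rebinds the local name df to a brand-new DataFrame;
    # the caller's object is never touched.
    df = df.apply(lambda s: s.str.lower())
    return df

def mutate(df):
    # Item assignment writes into the object df refers to,
    # which is the very same object the caller passed in.
    df['col_1'] = df['col_1'].str.lower()
    return df

input_df = pd.DataFrame({'col_1': ['ABC'], 'col_2': ['XYZ']})

rebound = rebind(input_df)
rebound_is_input = rebound is input_df          # a different object
value_after_rebind = input_df.loc[0, 'col_1']   # input still upper case

mutated = mutate(input_df)
mutated_is_input = mutated is input_df          # the same object
value_after_mutate = input_df.loc[0, 'col_1']   # input now lower case
```

In the second call the returned object and `input_df` are one and the same, which is why printing `input_df` afterwards shows the lowercased value.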

To leave the input untouched in the first case as well, work on a copy of the DataFrame:

def df_lower(df):
    cols = ['col_1']
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    df[cols] = df[cols].applymap(lambda x: x.lower())
    return df
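
One way to confirm that a function does not modify its argument is to compare the input with a snapshot taken before the call. The sketch below assumes a copy-based variant. `df_lower_copy`, its `cols` parameter, and the use of `.str.lower()` are illustrative choices, not part of the original answer.

```python
import pandas as pd

def df_lower_copy(df, cols=('col_1',)):
    # Hypothetical helper: copy first, so the item assignment below
    # writes into the copy rather than the caller's DataFrame.
    out = df.copy()
    for col in cols:
        out[col] = out[col].str.lower()
    return out

input_df = pd.DataFrame({'col_1': ['ABC'], 'col_2': ['XYZ']})
snapshot = input_df.copy()  # reference state before the call

processed_df = df_lower_copy(input_df)

# True if the call left input_df exactly as it was
input_unchanged = input_df.equals(snapshot)
processed_value = processed_df.loc[0, 'col_1']
```

Copying costs extra memory proportional to the size of the DataFrame, so for large inputs it may be worth deciding deliberately whether the function should mutate in place or return a new object.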