Maybe a very naive question, but I am stuck on this: pandas.DataFrame.apply accepts a function as its argument.

# Define a function for later use
def get_string(df):
    string_input = []
    for id, field in enumerate(df.index):
        string_input.append('<field id="{0}" column="{1}">{2}</field>'.format(id, field, df[field]))
    return '\n'.join(string_input)

If I apply it to df, I get exactly the formatted string output I want:

global_1 = '\n'.join(df.apply(get_string, axis=1))

output:
<field id="0" column="xxx">49998.0</field>
<field id="1" column="xxx">37492.0</field>
<field id="2" column="xxx">12029.0</field>

But why don't I have to pass df to get_string() explicitly, i.e. get_string(df), like this:

global_1 = '\n'.join(df.apply(get_string(df), axis=1))

And what if the function needs more input parameters? I have Googled this for a while, but it is still not clear to me. Could anyone explain, with an example, how it works? Thank you for any assistance.

    There's really no need to use apply here. Provide a minimal reproducible example with sample data and your expected output and you'll get a better way to do this. Commented Jul 26, 2019 at 14:05

1 Answer

You are confusing df the global variable with df the local variable.

The get_string function defines a parameter called df, and this shadows any variable of the same name from an outer scope. Inside get_string, df is whatever apply passes in: with axis=1, that is each row (a Series) of the dataframe you called apply on, not the global df. You pass the function itself, without parentheses, and apply calls it for you. You can try it with different dataframes:

import pandas as pd

df = pd.DataFrame({'a': ['Lorem', 'Ipsum']})
x = pd.DataFrame({'b': ['Hello', 'World']})
y = pd.DataFrame({'c': ['Goodbye', 'World']})

global_1 = '\n'.join(df.apply(get_string, axis=1))
global_2 = '\n'.join(x.apply(get_string, axis=1))
global_3 = '\n'.join(y.apply(get_string, axis=1))

print(global_1)
print(global_2)
print(global_3)

Result:

# From the global `df`
<field id="0" column="a">Lorem</field>
<field id="0" column="a">Ipsum</field>
# From x
<field id="0" column="b">Hello</field>
<field id="0" column="b">World</field>
# From y
<field id="0" column="c">Goodbye</field>
<field id="0" column="c">World</field>
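
To see what apply hands to the function, here is a minimal sketch. The names describe_row and frame are made up for illustration and are not part of the question. It also shows why get_string(df) inside apply is not what you want: the parentheses call the function immediately, so apply would receive its return value (a string) instead of something it can call per row.

```python
import pandas as pd


def describe_row(row):
    # With axis=1, apply passes one row at a time as a pandas Series;
    # the Series index holds the column labels of the dataframe.
    return "{0}: {1}".format(type(row).__name__, list(row.index))


frame = pd.DataFrame({"a": ["Lorem", "Ipsum"], "b": [1, 2]})

# Pass the function object itself (no parentheses); apply calls it once per row.
seen = frame.apply(describe_row, axis=1)
print(seen.tolist())

# Calling the function yourself runs it once, right now, on the whole dataframe.
# The result is a plain string, which apply cannot call row by row.
eager = describe_row(frame)
print(type(eager).__name__, callable(eager))
```

Both rows report a Series, which is why the parameter name inside the function has nothing to do with the global dataframe's name.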

2 Comments

Thank you for the great explanation, but a problem arises when my function get_string() requires more than just df, e.g. get_string(df, some_list, ...). How do I pass these through apply when it is called only on df? Thank you.
You pass them by keyword in the apply call: df.apply(get_string, axis=1, some_list=..., another_param=...)
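
A minimal sketch of that suggestion, assuming a hypothetical variant get_string_tagged with two extra parameters, tag and skip (neither appears in the question). DataFrame.apply forwards extra keyword arguments to the function, and the args tuple does the same for positional ones.

```python
import pandas as pd


def get_string_tagged(row, tag, skip=()):
    # `tag` and `skip` are hypothetical extra parameters, for illustration only.
    lines = []
    for i, column in enumerate(row.index):
        if column in skip:
            continue  # leave out columns the caller asked to skip
        lines.append(
            '<{0} id="{1}" column="{2}">{3}</{0}>'.format(tag, i, column, row[column])
        )
    return "\n".join(lines)


frame = pd.DataFrame({"a": ["Lorem", "Ipsum"], "b": ["Hello", "World"]})

# Extra keyword arguments are forwarded to the function on every call.
by_keyword = frame.apply(get_string_tagged, axis=1, tag="field", skip=["b"])

# Equivalent call: positional extras go in the `args` tuple.
by_position = frame.apply(get_string_tagged, axis=1, args=("field", ["b"]))

print("\n".join(by_keyword))
```

The row is always supplied by apply as the first argument; everything else comes from args or the keyword arguments, so keyword names must not collide with apply's own parameters such as axis or args.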
