
I want to experiment with the raw=True option of the pandas apply function, as described on p. 155 of High Performance Python by Gorelick and Ozsvald. However, Python apparently treats raw=True as an argument for the function I'm applying, not for .apply itself! Here's a MWE:

import pandas as pd

df = pd.DataFrame(columns=('a', 'b'))
df.loc[0] = (1, 2)
df.loc[1] = (3, 4)

df['a'] = df['a'].apply(str, raw=True)

When I try to execute this, I get the following error:

TypeError: 'raw' is an invalid keyword argument for str()

The problem persists even if I use a lambda expression:

df['a'] = df['a'].apply(lambda x: str(x), raw=True)

The problem remains if I call a custom-defined function instead of str.

How do I get Pandas to recognize that raw=True is an argument for .apply and NOT str?

  • I'm not sure, but I think it is because you use pd.Series.apply and not pd.DataFrame.apply. Series doesn't seem to accept raw as an argument. Try df.apply(str, raw=True). Is that what you are searching for? Commented Aug 18, 2022 at 21:37
  • @Rabinzel Hmm. I think you've got it. The examples in the book are definitely using the df version, not the ser version. Commented Aug 18, 2022 at 21:39
  • If you still only want to apply it to column a, use double brackets; that way you pass a DataFrame instead of a Series: df[['a']].apply(str, raw=True) Commented Aug 18, 2022 at 21:41
  • That approach does have some side effects, though:

           a  b
    0  [1 3]  2
    1  [1 3]  4

    Commented Aug 18, 2022 at 21:43
  • Hmmk, well there's nothing to be gained from using raw=True with a Series, because pd.Series.apply already passes raw values. raw=True is useful for pd.DataFrame.apply because it passes NumPy arrays instead, which, depending on your function, can improve performance. As you can see in the documentation, there is no raw=True argument for a Series. Commented Aug 18, 2022 at 23:02
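
To make that last comment concrete, here is a minimal sketch (rebuilding the question's df; the lambdas just print what they receive) of what each apply variant passes to the applied function:

import pandas as pd

df = pd.DataFrame({'a': [1, 3], 'b': [2, 4]})

# pd.Series.apply has no raw parameter; the function is called once per
# scalar element, so there is nothing "raw" to request.
df['a'].apply(lambda x: print(type(x)))            # prints the scalar type per element

# pd.DataFrame.apply with the default raw=False passes each column as a Series.
df.apply(lambda col: print(type(col)))             # <class 'pandas.core.series.Series'>

# With raw=True each column arrives as a plain ndarray instead, skipping
# Series construction -- which is where the potential speedup comes from.
df.apply(lambda col: print(type(col)), raw=True)   # <class 'numpy.ndarray'>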

1 Answer


Referring to the comments: I don't think these are side effects. As the documentation states, with raw=True the "function receive[s] ndarray objects", so you pass the whole array to str and convert it to a single string like [1 3]. So you don't convert each value to a string; you convert the whole column to one string.

If you write a little helper function, you can see this:

def conv(col):
    # With raw=True, col arrives as a NumPy ndarray, not a pandas Series.
    print(f"input values: {col}")
    print(f"type input: {type(col)}\n")
    return str(col)

# df[['a']] (double brackets) is a one-column DataFrame, so DataFrame.apply
# accepts raw=True and conv receives the column as an ndarray.
t = df[['a']].apply(conv, raw=True)
print(f"{type(t)}:\n{t}\n")
print(f"first value: {type(t[0])}:\n{t[0]}\n")
print(f"{t[0][0]}")  # first character of the string "[1 3]"

Output:

input values: [1 3]
type input: <class 'numpy.ndarray'>

<class 'pandas.core.series.Series'>:
a    [1 3]
dtype: object

first value: <class 'str'>:
[1 3]

[

2 Comments

So if I want to convert a column from int to string using apply and raw=True, what is the exact syntax I should use?
My knowledge isn't deep enough to give advice here, but I think the answer is just: don't do it. The documentation also says, "If you are just applying a NumPy reduction function this will achieve much better performance." You aren't reducing anything here, so I don't think there is any advantage to using raw.
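
Following up on that exchange, a small sketch of the two idiomatic routes (assuming the question's df): astype for the int-to-string conversion, and raw=True reserved for NumPy reductions over a DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 3], 'b': [2, 4]})

# Converting a column from int to string: astype does this directly,
# no apply (and no raw=True) needed.
df['a'] = df['a'].astype(str)

# Where raw=True actually pays off: a NumPy reduction applied column-wise.
# Each column arrives as a bare ndarray, so np.sum runs without Series overhead.
totals = df[['b']].apply(np.sum, raw=True)
print(totals)  # b    6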
