Is it possible to use a user-defined function in a groupby so that it receives the values of several columns as arguments, with each column passed as a separate argument? In the following 'standard' example, the sum function is called on the v1 and v2 columns separately:

In [110]: dct = {
     ...:     'id':[1,2,2,3,3,3],
     ...:     'vl':[1,1,1,1,1,1],
     ...:     'v2':[2,2,2,2,2,2]
     ...: }
     ...:
     ...: df = pd.DataFrame(dct)
     ...: df.groupby('id')['vl','v2'].sum()
     ...:
Out[110]:
    vl  v2
id
1    1   2
2    2   4
3    3   6

How can I define a mysum function with two arguments, where each argument receives its own column? Something like:

def f(col1, col2):
    return col1 * 2 + col2 * 3

So, in fact, this function merges two columns into one. Can this be done?

2 Answers

You can unpack a pd.Series with * or **, depending on what you need, or you can be very explicit in your lambda.

def f(v1, v2):
    return v1 * 2 + v2 * 3

df[['v1', 'v2']].apply(lambda x: f(*x), 1)
# or
df[['v1', 'v2']].apply(lambda x: f(**x), 1)
# or
df.apply(lambda x: f(x.v1, x.v2), 1)

0    8
1    8
2    8
3    8
4    8
5    8
dtype: int64
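
For reference, here is a self-contained sketch of the three variants above. It assumes the question's frame with the columns named v1 and v2 (the dictionary in the question spells one key vl). The last line is an addition that is not part of the answer: since f only uses arithmetic, it can probably also be called on whole columns directly.

```python
import pandas as pd

def f(v1, v2):
    return v1 * 2 + v2 * 3

# Assumed data: the question's frame, with columns named v1 and v2
df = pd.DataFrame({
    'id': [1, 2, 2, 3, 3, 3],
    'v1': [1, 1, 1, 1, 1, 1],
    'v2': [2, 2, 2, 2, 2, 2],
})

# axis=1 passes each row to the lambda as a Series indexed by column name
by_position = df[['v1', 'v2']].apply(lambda x: f(*x), axis=1)
by_keyword = df[['v1', 'v2']].apply(lambda x: f(**x), axis=1)
by_attribute = df.apply(lambda x: f(x.v1, x.v2), axis=1)

# f is plain arithmetic, so it also accepts whole columns at once
vectorized = f(df['v1'], df['v2'])
```

Note that f(**x) only works when the column names match the parameter names of f, whereas f(*x) depends on the column order.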

2 Comments

What does 1 mean in ` f(*x), 1` ?
I ran apply on the df directly as opposed to after a groupby. When doing it directly, I need to specify the axis in which I'm applying. In this case, the axis was 1.

You can convert each group to a NumPy array by accessing its .values property and then take the sum. For numpy.sum, the default axis=None sums all elements of the input array:

df.groupby('id')[['v1', 'v2']].apply(lambda g: g.values.sum())

#id
#1    3
#2    6
#3    9
#dtype: int64
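
To make the axis behaviour concrete, here is a small sketch on a single group. It assumes the question's data with the columns named v1 and v2; the variable names are only illustrative.

```python
import pandas as pd

# Assumed data: the question's frame, with columns named v1 and v2
df = pd.DataFrame({
    'id': [1, 2, 2, 3, 3, 3],
    'v1': [1, 1, 1, 1, 1, 1],
    'v2': [2, 2, 2, 2, 2, 2],
})

# The rows of the group with id == 2, as a 2x2 NumPy array [[1, 2], [1, 2]]
arr = df.loc[df['id'] == 2, ['v1', 'v2']].values

total_all = arr.sum()          # axis=None: every element is summed
per_column = arr.sum(axis=0)   # one sum per column (v1, v2)
per_row = arr.sum(axis=1)      # one sum per row
```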

To get weighted sum:

df.groupby('id')[['v1', 'v2']].apply(lambda g: (g.v1 * 2 + g.v2 * 3).sum())

#id
#1     8
#2    16
#3    24
#dtype: int64
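
The same weighted sum can be written so that the two-argument function from the question receives one column per argument. This is a minimal sketch, assuming the columns are named v1 and v2; the second form (compute the combined column first, then group it) is an alternative that is not part of the answer.

```python
import pandas as pd

def f(col1, col2):
    return col1 * 2 + col2 * 3

# Assumed data: the question's frame, with columns named v1 and v2
df = pd.DataFrame({
    'id': [1, 2, 2, 3, 3, 3],
    'v1': [1, 1, 1, 1, 1, 1],
    'v2': [2, 2, 2, 2, 2, 2],
})

# Inside apply, g is the sub-frame of one group; pass each column to f
per_group = df.groupby('id')[['v1', 'v2']].apply(
    lambda g: f(g['v1'], g['v2']).sum()
)

# Alternative: combine the columns once, then group the resulting Series
combined_first = f(df['v1'], df['v2']).groupby(df['id']).sum()
```

Any reduction can replace .sum() here; the point is only that f sees two separate columns.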

6 Comments

Running this I get TypeError: Series.name must be a hashable type
Which versions of Python and pandas are you running? I got no error on pandas 0.19.x with either Python 2 or Python 3.
Python 2.7.12 |Anaconda 4.2.0 (x86_64)| (default, Jul 2 2016, 17:43:17) ---- IPython 5.1.0
You may also check the key of the dictionary. It doesn't seem like the keys match the column names you are using in the groupby line.
Also, I don't actually need to sum all the elements; it could be any manipulation of the arguments, for example col1 * 2 + col2 * 3.