Python Pandas Dataframe: user-defined function in groupby to merge columns

Question

Is it possible to use in groupby a user-defined function that will be passed as arguments values of several columns, every column in a separate argument? In the following 'standard' example sum function is called onv1 and v2 columns separately:

In [110]: dct = {
     ...:     'id':[1,2,2,3,3,3],
     ...:     'vl':[1,1,1,1,1,1],
     ...:     'v2':[2,2,2,2,2,2]
     ...: }
     ...:
     ...: df = pd.DataFrame(dct)
     ...: df.groupby('id')['vl','v2'].sum()
     ...:
Out[110]:
vl  v2
id
1    1   2
2    2   4
3    3   6

How to define mysum function with two arguments, with each argument to get it's own column something like:

def f(col1, col2):                                                                                          
    return  col1 * 2 + col2 * 3

So, in fact, this function merges two columns in one. Can this be done?

piRSquared · Accepted Answer · 2016-12-18 22:36:45Z

1

you can unpack a pd.Series with * or ** depending on what you need. Or you can be very explicit with your lambda.

def f(v1, v2):                                                                                          
    return  v1 * 2 + v2 * 3

df[['v1', 'v2']].apply(lambda x: f(*x), 1)
# or
df[['v1', 'v2']].apply(lambda x: f(**x), 1)
# or
df.apply(lambda x: f(x.v1, x.v2), 1)

0    8
1    8
2    8
3    8
4    8
5    8
dtype: int64

answered Dec 18, 2016 at 22:36

piRSquared

296k68 gold badges509 silver badges654 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

zork Over a year ago

What does 1 mean in ` f(*x), 1` ?

piRSquared Over a year ago

I ran apply on the df directly as opposed to after a groupby. When doing it directly, I need to specify the axis in which I'm applying. In this case, the axis was 1.

akuiper · Accepted Answer · 2016-12-18 22:45:34Z

0

You can convert the group to a numpy array by accessing the .values property, then do the sum, as for numpy.sum, The default axis=None, will sum all of the elements of the input array.:

df.groupby('id')['vl','v2'].apply(lambda g: g.values.sum())

#id
#1    3
#2    6
#3    9
#dtype: int64

To get weighted sum:

df.groupby('id')['vl','v2'].apply(lambda g: (g.v1 * 2 + g.v2 * 3).sum())

#id
#1     8
#2    16
#3    24
#dtype: int64

edited Dec 18, 2016 at 22:45

answered Dec 18, 2016 at 22:07

akuiper

216k33 gold badges363 silver badges380 bronze badges

6 Comments

zork Over a year ago

Running this I get TypeError: Series.name must be a hashable type

akuiper Over a year ago

Which version of python and pandas are you running, I got no error on pandas 0.19.x with both python 2 and 3.

zork Over a year ago

Python 2.7.12 |Anaconda 4.2.0 (x86_64)| (default, Jul 2 2016, 17:43:17) ---- IPython 5.1.0

akuiper Over a year ago

You may also check the key of the dictionary. It doesn't seem like the keys match the column names you are using in the groupby line.

zork Over a year ago

Also I don't actually need to sum all elements, it may be any manipulation of arguments, for example col1 * 2 + col2 * 3

|

Collectives™ on Stack Overflow

Python Pandas Dataframe: user-defined function in groupby to merge columns

2 Answers 2

2 Comments

6 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

6 Comments

Your Answer

Sign up or log in

Post as a guest

Related