Use pandas groupby function on multiple columns

Question

I have a DataFrame similar to this:

Key    Departure    Species1   Species2   Status
1         R          Carlan     Carlan      D
1         R          Scival     Carex       C
2         R          Carlan     Scival      D
2         R          Scival     Bougra      C  
3         D          Carlan     Carlan      D
3         D          Scival     Scival      C

I want to count the occurrences of each unique Species1 for a given Departure and Status of D of C

My desired output is:

Species1   RD    RC    DD    DC
Carlan     2     NaN   1     NaN
Scival     NaN   2     NaN   1

Just a suggesition, you might as well remove the bit about your function f(x) because you are not showing how you implement it. — dermen
– dermen, Commented Jul 21, 2015 at 19:25

dermen · Accepted Answer · 2015-07-21 20:51:05Z

Make a new column that is the combination of Departure and Status

df['comb'] = df.Departure + df.Status
df
#  Key Departure Species1 Species2 Status comb
#0   1         R   Carlan   Carlan      D   RD
#1   1         R   Scival    Carex      C   RC
#2   2         R   Carlan   Scival      D   RD
#3   2         R   Scival   Bougra      C   RC
#4   3         D   Carlan   Carlan      D   DD
#5   3         D   Scival   Scival      C   DC

Then you can groupby:

gb    = df.groupby(['Species1', 'comb'])
gb.groups
#{('Carlan', 'DD'): [4],
#('Carlan', 'RD'): [0, 2],
#('Scival', 'DC'): [5],
#('Scival', 'RC'): [1, 3]}

Now organize the results into a list, where each element is a tuple (column, Series(data, index)) representing a single data point in a new dataframe

items = [ (key[1], pandas.Series( [len(val)], index=[key[0]] ) )for key,val in gb.groups.items() ]

And make a new dataframe from the items:

result = pandas.from_items( items)
result
#        RC  DC  DD  RD
#Carlan NaN NaN   1   2
#Scival   2   1 NaN NaN

Extra info

See this link for ideas on crating new dataframes from various objects. When you want to create a dataframe from individual data points (e.g. (Species1,comb) ), then from_items is your best option.

dagrha · Accepted Answer · 2015-08-12 03:33:33Z

2

Use the pandas.crosstab() method. A single line of code:

pd.crosstab(df.Species1, [df.Departure, df.Status])

The resulting table:

If you combine with @dermen's 'comb' column,

df['comb'] = df.Departure + df.Status
pd.crosstab(df.Species1, df.comb)

you'll get:

If you really want those 'NaN', just tack on a .replace('0', np.nan), like so (assuming an import numpy as np has already been done):

pd.crosstab(df.Species1, df.comb).replace('0', np.nan)

edited Aug 12, 2015 at 3:33

answered Jul 21, 2015 at 18:54

dagrha

2,5791 gold badge22 silver badges21 bronze badges

4 Comments

Stefano Potter Over a year ago

thats close enough, and I can rework it into what I need from there, thank! So simple...

dagrha Over a year ago

@StefanoPotter-- Take a look at my edit. I think it is way better than my my previous answer, and it gets you what you want with just a couple very simple lines of code.

Stefano Potter Over a year ago

very interesting, never used crosstab before. Thanks for posting!

dermen Over a year ago

nice. i will point out that numpy is by deafult imported with pandas, just try np = pandas.np

Gman · Accepted Answer · 2015-07-22 00:46:50Z

0

you can use the a groupby query on multiple colums and and use the.agg function to count the number of occurences:

df.groupby(['Species1', 'Departure', 'Status']).agg(['count'])

answered Jul 22, 2015 at 0:46

Gman

1342 silver badges9 bronze badges

Collectives™ on Stack Overflow

Use pandas groupby function on multiple columns

3 Answers 3

Extra info

Comments

4 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Extra info

Comments

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related