1

I have a DataFrame similar to this:

Key    Departure    Species1   Species2   Status
1         R          Carlan     Carlan      D
1         R          Scival     Carex       C
2         R          Carlan     Scival      D
2         R          Scival     Bougra      C  
3         D          Carlan     Carlan      D
3         D          Scival     Scival      C

I want to count the occurrences of each unique Species1 for a given Departure and Status of D of C

My desired output is:

Species1   RD    RC    DD    DC
Carlan     2     NaN   1     NaN
Scival     NaN   2     NaN   1
1
  • Just a suggesition, you might as well remove the bit about your function f(x) because you are not showing how you implement it. Commented Jul 21, 2015 at 19:25

3 Answers 3

3

Make a new column that is the combination of Departure and Status

df['comb'] = df.Departure + df.Status
df
#  Key Departure Species1 Species2 Status comb
#0   1         R   Carlan   Carlan      D   RD
#1   1         R   Scival    Carex      C   RC
#2   2         R   Carlan   Scival      D   RD
#3   2         R   Scival   Bougra      C   RC
#4   3         D   Carlan   Carlan      D   DD
#5   3         D   Scival   Scival      C   DC

Then you can groupby:

gb    = df.groupby(['Species1', 'comb'])
gb.groups
#{('Carlan', 'DD'): [4],
#('Carlan', 'RD'): [0, 2],
#('Scival', 'DC'): [5],
#('Scival', 'RC'): [1, 3]}

Now organize the results into a list, where each element is a tuple (column, Series(data, index)) representing a single data point in a new dataframe

items = [ (key[1], pandas.Series( [len(val)], index=[key[0]] ) )for key,val in gb.groups.items() ]

And make a new dataframe from the items:

result = pandas.from_items( items)
result
#        RC  DC  DD  RD
#Carlan NaN NaN   1   2
#Scival   2   1 NaN NaN

Extra info

See this link for ideas on crating new dataframes from various objects. When you want to create a dataframe from individual data points (e.g. (Species1,comb) ), then from_items is your best option.

Sign up to request clarification or add additional context in comments.

Comments

2

Use the pandas.crosstab() method. A single line of code:

pd.crosstab(df.Species1, [df.Departure, df.Status])

The resulting table:

enter image description here

If you combine with @dermen's 'comb' column,

df['comb'] = df.Departure + df.Status
pd.crosstab(df.Species1, df.comb)

you'll get:

enter image description here

If you really want those 'NaN', just tack on a .replace('0', np.nan), like so (assuming an import numpy as np has already been done):

pd.crosstab(df.Species1, df.comb).replace('0', np.nan)

enter image description here

4 Comments

thats close enough, and I can rework it into what I need from there, thank! So simple...
@StefanoPotter-- Take a look at my edit. I think it is way better than my my previous answer, and it gets you what you want with just a couple very simple lines of code.
very interesting, never used crosstab before. Thanks for posting!
nice. i will point out that numpy is by deafult imported with pandas, just try np = pandas.np
0

you can use the a groupby query on multiple colums and and use the.agg function to count the number of occurences:

df.groupby(['Species1', 'Departure', 'Status']).agg(['count'])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.