Pandas groupby using function variable

Question

I have this dataframe:

  iata               airport              city state country        lat  \
0  00M              Thigpen        Bay Springs    MS     USA  31.953765   
1  00R  Livingston Municipal        Livingston    TX     USA  30.685861   
2  00V           Meadow Lake  Colorado Springs    CO     USA  38.945749   
3  01G          Perry-Warsaw             Perry    NY     USA  42.741347   
4  01J      Hilliard Airpark          Hilliard    FL     USA  30.688012

I am trying to get the number of airports per state. For example if I have the function:

def f(dataframe, state):
    return result

Here state is a state abbreviation, such as 'MA'. I am trying to use that input variable to group or filter the dataframe and get the number of airports in the given state.

When I use:

df.groupby(state)['airport'].value_counts()

or

df.groupby(state)['airport'].value_counts()/df['airport'].count()

or

df.groupby(['state'] == state)['airport'].value_counts()/df['airport'].count()

The last two are attempts at the conditional probability that a selected airport is in that state.

It throws KeyError: 'MA', which I think is because the input variable is treated as a column label rather than as a value in the state column.
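A minimal reproduction of that error, as a sketch only: the frame below is made up for illustration (the airport names and codes are hypothetical), and it assumes the goal is simply to count the rows whose state matches the input.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original dataframe.
df = pd.DataFrame({
    "iata": ["X01", "X02", "X03", "X04"],
    "airport": ["Airport A", "Airport B", "Airport C", "Airport D"],
    "state": ["MS", "TX", "MA", "MA"],
})
state = "MA"

# groupby() looks up a string key as a column label, so passing the
# value 'MA' fails because no column has that name.
try:
    df.groupby(state)
    raised = False
except KeyError:
    raised = True

# Comparing the column against the value gives a boolean mask instead;
# summing the mask counts the matching rows.
n_in_state = int((df["state"] == state).sum())
print(raised, n_in_state)
```

The column label 'state' is what groupby needs; the value 'MA' is only useful for selecting from the grouped result or for filtering rows.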

Is there a way to get the number of airports per state?

KuboAndTwoStrings · Accepted Answer · 2017-11-02 22:12:04Z

I would use Pandas's nunique to get the number of airports per state. The code is easier to read and remember.

To illustrate my point, I modified the dataset as follows, so that Florida has two additional fictional airports (three in total):

   iata               airport         city state country        lat
0   00M               Thigpen  Bay Springs    MS     USA  31.953765
1   00R  Livingston Municipal   Livingston    TX     USA  30.685861
2   00V           Meadow Lake      Springs    CO     USA  38.945749
3   01G          Perry-Warsaw        Perry    NY     USA  42.741347
4   01J      Hilliard Airpark     Hilliard    FL     USA  30.688012
5  f234           Weirdviller    Chilliard    FL     USA  30.788012
6  23r2               Johnson     Billiard    FL     USA  30.888012

Then, we write:

df.groupby('state').iata.nunique()

to get the following results:

state
CO    1
MS    1
TX    1
FL    3
NY    1
Name: iata, dtype: int64

Hope this helps.
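For anyone who wants to reproduce this, here is a small self-contained sketch. It assumes only the iata and state columns of the modified table matter for the count, so the other columns are left out.

```python
import pandas as pd

# iata and state columns of the modified table above.
df = pd.DataFrame({
    "iata": ["00M", "00R", "00V", "01G", "01J", "f234", "23r2"],
    "state": ["MS", "TX", "CO", "NY", "FL", "FL", "FL"],
})

# Number of distinct IATA codes within each state.
counts = df.groupby("state").iata.nunique()
print(counts)

# A single state can then be looked up by its abbreviation.
print(counts["FL"])
```

Note that groupby sorts the group keys by default, so the states come back in alphabetical order unless sort=False is passed.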

ako · Accepted Answer · 2017-11-02 21:32:54Z

Assuming every row in the dataframe is a distinct airport, you can just count the rows for each country / state combination:

df.groupby(['country','state']).size()
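The assumption matters because size() counts rows, while nunique() counts distinct values. A rough sketch of the difference, using a made-up frame in which one IATA code is deliberately duplicated:

```python
import pandas as pd

# Hypothetical data: the "01J" row appears twice on purpose.
df = pd.DataFrame({
    "iata": ["00M", "01J", "01J", "f234"],
    "state": ["MS", "FL", "FL", "FL"],
    "country": ["USA", "USA", "USA", "USA"],
})

# size() counts every row in the group, duplicates included.
rows = df.groupby(["country", "state"]).size()

# nunique() counts each distinct IATA code once.
distinct = df.groupby(["country", "state"])["iata"].nunique()

print(rows[("USA", "FL")], distinct[("USA", "FL")])
```

If the data has exactly one row per airport, the two give the same numbers.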

Andy Hayden · Accepted Answer · 2017-11-02 21:54:56Z

You can rewrite this as an explicit groupby apply:

In [11]: df.groupby("state")["airport"].apply(lambda x: x.value_counts() / len(x))
Out[11]:
state
CO     Meadow Lake             1.0
FL     Hilliard Airpark        1.0
MS     Thigpen                 1.0
NY     Perry-Warsaw            1.0
TX     Livingston Municipal    1.0
Name: airport, dtype: float64

or store the groupby and reuse it (probably this is faster):

In [21]: g = df.groupby("state")["airport"]

In [22]: g.value_counts() / g.size()
Out[22]:
state  airport
CO     Meadow Lake             1.0
FL     Hilliard Airpark        1.0
MS     Thigpen                 1.0
NY     Perry-Warsaw            1.0
TX     Livingston Municipal    1.0
Name: airport, dtype: float64

Comment (Andy Hayden): Of course, all the conditional probabilities are 1.0 in this example, as there is only one airport in each state.
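To see values other than 1.0, a state needs more than one airport. The sketch below uses a made-up frame with three airports in FL, and it swaps in value_counts(normalize=True) on the grouped column rather than the explicit division shown above; treat it as one possible variant, not the only way to do this.

```python
import pandas as pd

# Hypothetical data with several airports in one state.
df = pd.DataFrame({
    "airport": ["Thigpen", "Hilliard Airpark", "Weirdviller", "Johnson"],
    "state": ["MS", "FL", "FL", "FL"],
})

g = df.groupby("state")["airport"]

# normalize=True divides each count by the size of its own group,
# giving the share of each airport name within its state.
shares = g.value_counts(normalize=True)
print(shares)
```

Here each FL airport gets a share of one third, while the single MS airport keeps a share of 1.0.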

Babeeshka · Accepted Answer · 2017-11-05 18:51:53Z

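This seemed to work the way I intended, with all your help. state is the input in the form of a state abbreviation ('MA'), and a[state] is the number of airports in that state. The function returns the probability that a randomly selected airport belongs to that state.

def f(df, state):
    # Number of distinct airports (by IATA code) in each state
    a = df.groupby('state').iata.nunique()
    # Total number of airports across all states
    s = a.sum()
    # Share of all airports that are in the requested state
    return a[state] / s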
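A quick check of this approach, as a sketch: the function name state_share is hypothetical, and the data reuses the iata and state columns of the seven-row table from the earlier answer. It also compares the result with value_counts(normalize=True) on the state column, which should agree as long as every row is a distinct airport.

```python
import pandas as pd

def state_share(df, state):
    """Share of all airports that are located in `state` (hypothetical helper)."""
    counts = df.groupby("state")["iata"].nunique()
    return counts[state] / counts.sum()

df = pd.DataFrame({
    "iata": ["00M", "00R", "00V", "01G", "01J", "f234", "23r2"],
    "state": ["MS", "TX", "CO", "NY", "FL", "FL", "FL"],
})

p = state_share(df, "FL")

# Alternative that assumes one row per airport: relative frequency of each state.
alt = df["state"].value_counts(normalize=True)["FL"]
print(p, alt)
```

Both expressions give three sevenths for FL on this data. Asking for a state that is not present raises a KeyError, so a caller may want to handle that case.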
