2

I have this dataframe:

  iata               airport              city state country        lat  \
0  00M              Thigpen        Bay Springs    MS     USA  31.953765   
1  00R  Livingston Municipal        Livingston    TX     USA  30.685861   
2  00V           Meadow Lake  Colorado Springs    CO     USA  38.945749   
3  01G          Perry-Warsaw             Perry    NY     USA  42.741347   
4  01J      Hilliard Airpark          Hilliard    FL     USA  30.688012 

I am trying to get the number of airports per state. For example if I have the function:

f(dataframe, state):
    result reuslt

Where state would be a state abbreviation, such as 'MA'. I am trying to group the dataframe by the input variable, such as state ('MA') to then get the number of airports per state.

When I use:

df.groupby(state)['airport'].value_counts()

or

df.groupby(state)['airport'].value_counts()/df['airport'].count()


df.groupby(['state'] == state)['airport'].value_counts()/df['airport'].count()

The last two are regarding the conditional probability a selected airport will be in that state.

It throws a Key Error: 'MA', which I think is due to the input variable not being recognized as a column, but a value in the column.

Is there a way to get the number of airports per state?

0

4 Answers 4

2

I would use Pandas's nunique to get the number of airports per state. The code is easier to read and remember.

To illustrate my point, I modified the dataset as follows, such that Florida has three more fictional airports:

iata airport city state country lat 0 00M Thigpen Bay Springs MS USA 31.953765 1 00R Livingston Municipal Livingston TX USA 30.685861 2 00V Meadow Lake Springs CO USA 38.945749 3 01G Perry-Warsaw Perry NY USA 42.741347 4 01J Hilliard Airpark Hilliard FL USA 30.688012 5 f234 Weirdviller Chilliard FL USA 30.788012 6 23r2 Johnson Billiard FL USA 30.888012

Then, we write:

df.groupby('state').iata.nunique() to get the following results:

state CO 1 MS 1 TX 1 FL 3 NY 1 Name: iata, dtype: int64

Hope this helps.

Sign up to request clarification or add additional context in comments.

Comments

1

Assuming each record is an airport throughout, you can just count the records for each state / country combination:

df.groupby(['country','state']).size()

Comments

0

You can rewrite this as an explicit groupby apply:

In [11]: df.groupby("state")["airport"].apply(lambda x: x.value_counts() / len(x))
Out[11]:
state
CO     Meadow Lake             1.0
FL     Hilliard Airpark        1.0
MS     Thigpen                 1.0
NY     Perry-Warsaw            1.0
TX     Livingston Municipal    1.0
Name: airport, dtype: float64

or store the groupby and reuse it (probably this is faster):

In [21]: g = df.groupby("state")["airport"]

In [22]: g.value_counts() / g.size()
Out[22]:
state  airport
CO     Meadow Lake             1.0
FL     Hilliard Airpark        1.0
MS     Thigpen                 1.0
NY     Perry-Warsaw            1.0
TX     Livingston Municipal    1.0
Name: airport, dtype: float64

1 Comment

of course, all the conditional probabilities are 1.0 in this example (as there is only one airport in each state).
0

This seemed to work the way I intended with all your help. a[state] represents an input in the form of a state abbreviation ('MA'). This returns the probability of a randomly selected airport belonging to that state.

a = df.groupby('state').iata.nunique()
s = a.sum()
result = a[state]/s
return result

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.