I'm working through this video tutorial by Wes McKinney. I've reached the baby names examples, and I'm hitting the same problem both with the code I'm writing and with his code (BabyNames.ipynb).

For reference, I'm on a Mac (OS X 10.10.1) using:

  • Python 2.7.6
  • IPython 2.3.1
  • Pandas 0.15.2

I can successfully do all of this:

from pandas import read_csv

names = read_csv('baby-names2.csv')   # read the data in
boys = names[names.sex == 'boy']      # rows for boys
girls = names[names.sex == 'girl']    # rows for girls

# position in the sorted names where the cumulative proportion reaches the quantile
def get_quantile_count(group, quantile=0.5):
    df = group.sort_index(by='prop', ascending=False)
    return df.prop.cumsum().searchsorted(quantile)

# call the function 
boys.groupby('year').apply(get_quantile_count)

This gives me output that looks like this (for brevity, only showing a small section of the data):

year
1880    [15]
1881    [15]
1882    [17]
1883    [17]
1884    [19]
1885    [20]
1886    [20]
1887    [21]
1888    [22]
1889    [22]
1890    [23]
1891    [24]
1892    [25]

I want to then plot this data, like this:

boys.groupby('year').apply(get_quantile_count).plot()

but it's giving me this error:

TypeError: Empty 'Series': no numeric data to plot

In the video, the output he shows does not have the square brackets [ ] around the numbers. I'm guessing the brackets are what's causing my problem.

Does anyone know how to get rid of them? The same thing happens whether I type the code myself while watching the video or run the provided notebook BabyNames.ipynb.

2 Answers

It seems I posted this question too early. I stepped away from it for a bit and then realized it was an easy fix.

The issue was that searchsorted() was returning a one-element array, and I only needed the single item in it. I modified the function like this:

# same function as before, but take element 0 of the array searchsorted returns
def get_quantile_count(group, quantile=0.5):
    df = group.sort_index(by='prop', ascending=False)
    return df.prop.cumsum().searchsorted(quantile)[0]

I just indexed with [0] to get the number out of the array. I don't know why I was having such a hard time with this. I guess searchsorted() must have changed its return type at some point? Or is there some option I have set incorrectly? I don't know, but at least this fixes it.
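
For anyone on a different pandas version, here is a version-tolerant sketch of the same fix. It assumes a newer pandas where sort_values(by=...) replaces sort_index(by=...), and it uses np.atleast_1d so it does not matter whether searchsorted() hands back a scalar or a one-element array. The toy data is made up for illustration and is not from baby-names2.csv; selecting [['prop']] before apply is my own choice, to pass the function only the column it needs.

```python
import numpy as np
import pandas as pd

def get_quantile_count(group, quantile=0.5):
    # Sort names from most to least popular (sort_values: assumes newer pandas)
    df = group.sort_values(by='prop', ascending=False)
    position = df['prop'].cumsum().searchsorted(quantile)
    # atleast_1d accepts both a scalar and a one-element array
    return int(np.atleast_1d(position)[0])

# Hypothetical toy data: 4 equal names in 1880, 8 equal names in 1881
toy = pd.DataFrame({
    'year': [1880] * 4 + [1881] * 8,
    'prop': [0.25] * 4 + [0.125] * 8,
})

counts = toy.groupby('year')[['prop']].apply(get_quantile_count)
print(counts)
```

The result should be an integer Series indexed by year, so calling .plot() on it should no longer hit the "no numeric data" error.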

I had a similar problem and solved it with .astype(float), but your way might be better.

boys.groupby('year').apply(get_quantile_count).astype(float).plot()
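
If the result has already been computed, the one-element arrays can also be unwrapped after the fact. A minimal sketch, using the first three values from the output in the question and building the Series by hand (the names wrapped and unwrapped are just for illustration):

```python
import numpy as np
import pandas as pd

# Rebuild a small Series of one-element arrays, like the bracketed output in the question
values = np.empty(3, dtype=object)
for i, n in enumerate([15, 15, 17]):
    values[i] = np.array([n])
wrapped = pd.Series(values, index=[1880, 1881, 1882])

# The Series holds arrays, so its dtype is object rather than a numeric type
print(wrapped.dtype)

# Pull the single element out of each array to get an integer Series
unwrapped = wrapped.map(lambda arr: arr[0])
print(unwrapped.dtype)
```

The object dtype is presumably why plot() reports no numeric data; once the values are plain integers the Series can be plotted as usual.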
