I have a dataset of every NBA player and their stats since 1950. The columns are Year (the applicable season), the player's name and team for that year, years in the NBA, and 20 columns of different stats for each player in each year of his career. One of the columns is 'PTS', the total number of points the player scored that year. I want to create a scatter plot in Python with the years 1950 through 2017 on the x-axis and the total points scored in each year on the y-axis. I believe the most efficient way to get the total points for each season is:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    seasons = pd.read_csv('C:/windows/temp/Seasons_Stats.csv')
    tp_yr = seasons.groupby('Year').agg({'PTS': ['sum']})

But creating the scatter plot using:

    tp_yr.plot.scatter( x= 'Year', y = 'PTS', s = 'None', c='red')

returns:

KeyError: 'Year'

and a blank graph. I want the total points per year, for all years from 1950 to 2017, shown as a red scatter plot.

  • "I'm having issues" is not a sufficient problem description. Be specific: What are the columns in the dataframe? What should the final plot show (what is the xaxis, what is the yaxis, what should the points represent?), what have you tried to create such a plot? What is the problem? In how far does it not match your expectations, or do you get an error? As you can see the list of open questions is longer than your question itself. Please edit it to make it answerable. Commented Jun 12, 2017 at 16:16
  • Thanks I've edited the post for more clarity Commented Jun 12, 2017 at 17:11
  • Converting the "Year" column would solve the Error that you've posted. (See: stackoverflow.com/questions/35432918/…) To answer the other questions, you need to show a bit more of your code. Commented Jun 12, 2017 at 17:27
  • Thanks, I have shown the rest of my code with a few updates from info I found on plotting pandas Dataframes, but it is still not working like I need, suggestions? Commented Jun 12, 2017 at 23:51

1 Answer


You're getting a KeyError because the aggregated dataframe has no column named "Year": groupby uses the grouping key (here, the year) as the index of the result.

To turn the index back into a regular column of the dataframe, call .reset_index() on the result.
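To see where "Year" ends up after the aggregation, here is a minimal sketch on a made-up four-row table. The `toy` dataframe and its values are invented for illustration and are not taken from Seasons_Stats.csv.

```python
import pandas as pd

# Hypothetical toy data standing in for the real Seasons_Stats.csv
toy = pd.DataFrame({
    "Year": [1950, 1950, 1951, 1951],
    "Player": ["A", "B", "A", "C"],
    "PTS": [100, 200, 150, 50],
})

# Same aggregation as in the question: "Year" becomes the index
grouped = toy.groupby("Year").agg({"PTS": ["sum"]})
year_is_column = "Year" in grouped.columns  # no "Year" column to plot against
index_name = grouped.index.name             # the key now lives in the index

# Selecting the column, summing, and resetting the index gives flat columns
flat = toy.groupby("Year")["PTS"].sum().reset_index()
flat_columns = list(flat.columns)

print(year_is_column, index_name, flat_columns)
```

Printing `grouped.columns` on the real data is a quick way to check which names are available to pass as `x` and `y`.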

Something like this should work:

    # Summing the selected column keeps the column names flat ("Year", "PTS")
    ptsbyyear = df.groupby("Year")["PTS"].sum().reset_index()
    ptsbyyear.plot(kind="scatter", x="Year", y="PTS")
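For the red scatter plot the question asks for, a self-contained sketch might look like the following. The sample rows and the output file name are assumptions made up for this example; with the real file, `seasons` would come from `pd.read_csv` as in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; replace with pd.read_csv(...) on the real file
seasons = pd.DataFrame({
    "Year": [1950, 1950, 1951, 1951, 1952],
    "PTS": [100, 200, 150, 50, 400],
})

# One row per year, with "Year" restored as an ordinary column
pts_by_year = seasons.groupby("Year")["PTS"].sum().reset_index()

# c="red" colors every point; the marker size argument s is left at its default
ax = pts_by_year.plot.scatter(x="Year", y="PTS", c="red")
ax.set_ylabel("Total points")

# Hypothetical output name; use plt.show() instead in an interactive session
plt.savefig("total_points_by_year.png")
```

The `s = 'None'` argument from the question is dropped here because matplotlib expects `s` to be a numeric marker size. Passing `as_index=False` to `groupby` is another way to keep "Year" as a column without calling `.reset_index()`.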

1 Comment

Thanks so much. I noticed that the year wasn't a column name and realized that was the problem, but I didn't know how to retrieve the year as a column. Thanks again.
