I'm learning Python pandas and playing with some example data. I have a CSV file containing US net worth by population percentile, by quarter. I've successfully subset the data by percentile to create three plots of net worth over time, one for each of three population segments. However, I'd like to combine those three subsets into one data frame so I can draw all three lines on a single figure.

Data here: https://www.federalreserve.gov/releases/z1/dataviz/download/dfa-income-levels.csv

Code thus far:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv("dfa-income-levels.csv")
df99th = df.loc[df['Category']=="pct99to100"]

df99th.plot(x='Date',y='Net worth', title='Net worth by percentile')

dfmid = df.loc[df['Category']=="pct40to60"]

dfmid.plot(x='Date',y='Net worth')
dflow = df.loc[df['Category']=="pct00to20"]

dflow.plot(x='Date',y='Net worth')

data = dflow['Net worth'], dfmid['Net worth'], df99th['Net worth']
headers = ['low', 'mid', '99th']
newdf = pd.concat(data, axis=1, keys=headers)

And that yields a dataframe shown below, which is not what I want for plotting the data.

           low        mid        99th
0          NaN        NaN   3514469.0
3          NaN  2503918.0         NaN
5     585550.0        NaN         NaN
6          NaN        NaN   3602196.0
9          NaN  2518238.0         NaN
...        ...        ...         ...
747        NaN  8610343.0         NaN
749  3486198.0        NaN         NaN
750        NaN        NaN  32011671.0
753        NaN  8952933.0         NaN
755  3540306.0        NaN         NaN

Any recommendations for other ways to approach this?

  • I've edited your link to point to the income-levels CSV; the edit is waiting for approval. Commented May 24, 2021 at 17:17

2 Answers

# Filter the dataframe to only the categories you're interested in
filtered_df = df[df['Category'].isin(['pct99to100', 'pct00to20', 'pct40to60'])]
filtered_df = filtered_df[['Date', 'Category', 'Net worth']]

fig, ax = plt.subplots()  # ax is an Axes object; several lines can be drawn on it
# Draw one line per category on the shared axes, with Date on the x-axis
for name, group in filtered_df.groupby('Category'):
    group.plot(x='Date', y='Net worth', ax=ax, label=name)


I don't see the categories mentioned in your code in the CSV file you shared. To combine dataframes side by side, you can use pd.concat with axis=1. It aligns rows by index label, so subsets whose row indices don't overlap (as in your output) end up with NaN in every other column. So first set the Date column as the index, then concat, and finally turn Date back into a regular column.

  • Set the Date column as the index of each dataframe: df1 = df1.set_index('Date') and df2 = df2.set_index('Date')
  • Concatenate df1 and df2 with df_merge = pd.concat([df1, df2], axis=1), or alternatively df_merge = pd.merge(df1, df2, on='Date')
  • Bring Date back as a column with df_merge = df_merge.reset_index()
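
As a rough sketch, the steps above might look like this when applied to the three subsets from the question. The small dataframe below is a made-up stand-in for the CSV (the dates and values are illustrative only, not real data), and the helper name net_worth_by_date is hypothetical.

```python
import pandas as pd

# Made-up stand-in for dfa-income-levels.csv; values are illustrative only
df = pd.DataFrame({
    "Date": ["2020:Q1", "2020:Q1", "2020:Q1", "2020:Q2", "2020:Q2", "2020:Q2"],
    "Category": ["pct00to20", "pct40to60", "pct99to100"] * 2,
    "Net worth": [10.0, 20.0, 30.0, 11.0, 21.0, 31.0],
})


def net_worth_by_date(frame, category):
    """Hypothetical helper: 'Net worth' for one category, indexed by Date."""
    subset = frame.loc[frame["Category"] == category]
    return subset.set_index("Date")["Net worth"]


categories = ["pct00to20", "pct40to60", "pct99to100"]
headers = ["low", "mid", "99th"]

# Every Series now shares the Date index, so concat lines the rows up
series = [net_worth_by_date(df, c) for c in categories]
newdf = pd.concat(series, axis=1, keys=headers)

# Turn Date back into an ordinary column
newdf = newdf.rename_axis("Date").reset_index()
print(newdf)
```

With a frame shaped like this, newdf.plot(x="Date", y=["low", "mid", "99th"]) should draw all three lines on one figure.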
