6

I have a multi-index dataframe that is sampled here:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

df = pd.read_csv('https://docs.google.com/uc?id=1mjmatO1PVGe8dMXBc4Ukzn5DkkKsbcWY&export=download', index_col=[0,1])

df


I tried to plot this so that each column in ['Var1', 'Var2', 'Var3', 'Var4'] gets a separate figure, each Country is a curve, the column's values are on the y-axis, and the Year is on the x-axis.

The desired figure would look like this MS Excel chart:

[image: MS Excel line chart of the desired result]

I tried to plot it using

f, a = plt.subplots(nrows=2, ncols=2, figsize=(9, 12), dpi= 80)

df.xs('Var1').plot(ax=a[0])
df.xs('Var2').plot(ax=a[1])
df.xs('Var3').plot(x=a[2])
df.xs('Var4').plot(kax=a[3])

but it gives KeyError: 'Var1'

I also tried the following

f, a = plt.subplots(nrows=2, ncols=2, 
                              figsize=(7, 10), dpi= 80)
for indicator in indicators_list:
    for c, country in enumerate(in_countries):
        ax = df[indicator].plot()
        ax.title.set_text(country + " " + indicator) 

but it produces three empty subplots and one subplot with all the data in it.

What is wrong with my attempts, and what can I do to get what I need?


2 Answers

11

If I understand correctly, you should first pivot your dataframe so that the countries become columns, then plot each variable onto one of the axes in a (from your plt.subplots call):

In [151]: df.reset_index().pivot(index='Year', columns='Country', values='Var1').plot(ax=a[0,0], title='Var1', grid=True)
Out[151]: <matplotlib.axes._subplots.AxesSubplot at 0x127e2320>

In [152]: df.reset_index().pivot(index='Year', columns='Country', values='Var2').plot(ax=a[0,1], title='Var2', grid=True)
Out[152]: <matplotlib.axes._subplots.AxesSubplot at 0x12f47b00>

In [153]: df.reset_index().pivot(index='Year', columns='Country', values='Var3').plot(ax=a[1,0], title='Var3', grid=True)
Out[153]: <matplotlib.axes._subplots.AxesSubplot at 0x12f84668>

In [154]: df.reset_index().pivot(index='Year', columns='Country', values='Var4').plot(ax=a[1,1], title='Var4', grid=True)
Out[154]: <matplotlib.axes._subplots.AxesSubplot at 0x12fbd390>

Result:

[image: 2x2 grid of line plots titled Var1 to Var4, one line per country]
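
For reference, the four pivot calls can also be folded into a loop. The sketch below is not part of the original answer. It assumes a small made-up dataframe with the same layout as in the question (a Country/Year MultiIndex and columns Var1 to Var4), and it uses unstack, which for this layout should give the same wide table as reset_index followed by pivot. It also illustrates why df.xs('Var1') raises a KeyError in the question: xs looks up labels in the row index by default, and Var1 is a column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Made-up stand-in data (assumption): same layout as the question's dataframe.
countries = ["USA", "Egypt", "France"]
years = [1990, 2000, 2010]
index = pd.MultiIndex.from_product([countries, years], names=["Country", "Year"])
rng = np.random.default_rng(0)
variables = ["Var1", "Var2", "Var3", "Var4"]
df = pd.DataFrame(rng.random((len(index), 4)), index=index, columns=variables)

# df.xs('Var1') fails because xs searches the row index by default;
# 'Var1' is a column label, so plain column selection (df['Var1']) is needed.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(9, 12), dpi=80)

# axes is a 2x2 array, so iterate over it flat to pair each axis with one variable.
for ax, var in zip(axes.flat, variables):
    # Move the Country level into the columns: rows are years, one column per country.
    wide = df[var].unstack("Country")
    wide.plot(ax=ax, title=var, grid=True)

fig.tight_layout()
```

Note that plt.subplots(nrows=2, ncols=2) returns a two-dimensional array of axes, so a[0] is a row of two axes rather than a single one; indexing with a[0, 0] or iterating over a.flat avoids that.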


1
  • If the columns needed for plotting are in the index, call .reset_index(), or do not specify the index_col parameter when loading the data.
  • Now convert the dataframe to a long form with pandas.DataFrame.melt
  • Plot using seaborn.relplot. seaborn is a high-level API for matplotlib
  • In this example, random test data is used, because the file is no longer available.
  • Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2, seaborn 0.11.2
import seaborn as sns
import pandas as pd

# reset the index if needed
df = df.reset_index()

# convert the dataframe to a long form
dfm = df.melt(id_vars=['Country', 'Year'])

# display(dfm.head())
  Country  Year variable  value
0     USA  1960       V1   67.0
1     USA  1970       V1   48.0
2     USA  1980       V1   59.0
3     USA  1990       V1   20.0
4     USA  2000       V1   41.0

# plot
sns.relplot(data=dfm, kind='line', col='variable', col_wrap=2, x='Year', y='value', hue='Country',
            height=3.75, facet_kws={'sharey': False, 'sharex': True})


Sample Data

data = {'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Egypt', 'Egypt', 'Egypt', 'Egypt', 'France', 'France', 'France', 'France', 'France', 'France', 'France', 'S.Africa', 'S.Africa', 'S.Africa'], 'Year': [1960, 1970, 1980, 1990, 2000, 2010, 2020, 1980, 1990, 2000, 2010, 1950, 1960, 1970, 1980, 1990, 2000, 2010, 1990, 2000, 2010], 'V1': [67, 48, 59, 20, 41, 71, 51, 63, 43, 18, 54, 54, 58, 27, 26, 42, 79, 77, 65, 78, 33], 'V2': [7.802, 4.89, 5.329, 1.899, 9.586, 8.827, 0.865, 2.436, 2.797, 2.157, 0.019, 6.975, 0.933, 7.579, 3.463, 7.829, 5.098, 1.726, 7.386, 7.861, 8.062], 'V3': [0.725, 0.148, 0.62, 0.322, 0.109, 0.565, 0.417, 0.094, 0.324, 0.529, 0.078, 0.741, 0.236, 0.245, 0.993, 0.591, 0.812, 0.768, 0.851, 0.355, 0.991], 'V4': [76.699, 299.423, 114.279, 158.051, 118.266, 273.444, 213.815, 144.96, 145.808, 107.922, 223.09, 68.148, 169.363, 220.797, 79.168, 277.759, 263.677, 244.575, 126.412, 277.063, 218.401]}
df = pd.DataFrame(data)

  Country  Year  V1     V2     V3       V4
0     USA  1960  67  7.802  0.725   76.699
1     USA  1970  48  4.890  0.148  299.423
2     USA  1980  59  5.329  0.620  114.279
3     USA  1990  20  1.899  0.322  158.051
4     USA  2000  41  9.586  0.109  118.266
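
To see what melt does before handing the data to relplot, the following sketch reshapes a three-row excerpt of the sample data above (only V1 and V2, for brevity) and prints the result. It uses only pandas. Each original row becomes one row per variable column, which is the layout that col='variable' relies on.

```python
import pandas as pd

# Three rows taken from the sample data above (two variables only, for brevity).
df = pd.DataFrame({
    "Country": ["USA", "USA", "Egypt"],
    "Year": [1960, 1970, 1980],
    "V1": [67, 48, 63],
    "V2": [7.802, 4.89, 2.436],
})

# Wide -> long: Country and Year stay as identifiers; every other column
# is stacked into a 'variable'/'value' pair.
dfm = df.melt(id_vars=["Country", "Year"])

print(dfm)
print(dfm.shape)  # 3 rows x 2 variable columns -> 6 rows, 4 columns
```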
