How to create a count plot for a nested dataframe in matplotlib

Question

I would like to use matplotlib to visualize the following pandas dataframe as shown in the sketch.

The sketch only shows what is needed in general terms; there is no need to reproduce the exact layout it depicts.

How can I achieve this task using matplotlib?

import pandas as pd
df = pd.DataFrame({'a': [0, 0, 0, 0, 0, 1, 1], 'b': [7, 7, 3, 3, 1, 2, 3], 'c': [102, 102, -50, -50, 30, 10, 10]})
df
   a  b    c
0  0  7  102
1  0  7  102
2  0  3  -50
3  0  3  -50
4  0  1   30
5  1  2   10
6  1  3   10

gherka · Accepted Answer · 2020-11-06 21:15:12Z

Before starting on the visualisation, I would suggest reshaping your data to make the nesting levels explicit and pre-calculating the frequencies. Something like:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.gridspec as gridspec

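# Count how often each value of "b" and "c" occurs within each group of "a";
# the counted values go into a shared "value" column and the source column
# name into "level_2"
temp_df = pd.concat([
    df.groupby(["a"])["b"].value_counts().reset_index(name="count").rename(columns={"b":"value"}).assign(level_2="b"),
    df.groupby(["a"])["c"].value_counts().reset_index(name="count").rename(columns={"c":"value"}).assign(level_2="c")
])

# Rename "a" to "level_1", put the columns in nesting order and keep each
# group's rows together
final_df = (temp_df
            .rename(columns={"a":"level_1"})
            [["level_1", "level_2", "value", "count"]]
            .sort_values(["level_1", "level_2"]))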

The resulting dataframe will look like this:

   level_1 level_2  value  count
0        0       b      3      2
1        0       b      7      2
2        0       b      1      1
0        0       c    -50      2
1        0       c    102      2
2        0       c     30      1
3        1       b      2      1
4        1       b      3      1
3        1       c     10      2
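If you prefer a single pass over both columns, a similar table can probably be obtained by melting "b" and "c" into one long column first and then counting with groupby(...).size(). This is a sketch of an alternative, not what the answer above uses. The names long_df and counts are illustrative. Note also that the rows come out sorted by value within each group, rather than by frequency as value_counts orders them.

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame({"a": [0, 0, 0, 0, 0, 1, 1],
                   "b": [7, 7, 3, 3, 1, 2, 3],
                   "c": [102, 102, -50, -50, 30, 10, 10]})

# Stack columns "b" and "c" into one long "value" column, remembering the
# source column in "level_2", then count each (a, level_2, value) combination
long_df = df.melt(id_vars="a", var_name="level_2", value_name="value")
counts = (long_df
          .groupby(["a", "level_2", "value"])
          .size()
          .reset_index(name="count")
          .rename(columns={"a": "level_1"}))
```

Because groupby sorts its keys by default, the rows of each (level_1, level_2) group are already contiguous, which is what the plotting code below relies on.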

Now, to plot the values and their counts in this nested way, you can use GridSpec to define the layout based on how many values fall under each nesting level. I've hard-coded the layout for the purposes of illustrating this toy dataset, but you would want to compute it programmatically for your real data.

There are 9 values (one per row of final_df), so the GridSpec has 9 columns. There are 2 nesting levels, so we reserve the 2 bottom rows for the nesting labels and use the remaining 8 rows to "host" the bar charts.
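Rather than hard-coding these numbers, the column ranges can be derived from final_df itself: every row becomes one grid column, and each group's span is a running total of group sizes. A minimal sketch follows. It assumes the rows of each group are contiguous, which the sort above guarantees. column_spans is a hypothetical helper name, and final_df is rebuilt from the table shown earlier so the snippet runs on its own.

```python
import pandas as pd

# final_df as shown in the answer; rows of each group are contiguous
final_df = pd.DataFrame({
    "level_1": [0, 0, 0, 0, 0, 0, 1, 1, 1],
    "level_2": ["b", "b", "b", "c", "c", "c", "b", "b", "c"],
    "value":   [3, 7, 1, -50, 102, 30, 2, 3, 10],
    "count":   [2, 2, 1, 2, 2, 1, 1, 1, 2],
})


def column_spans(frame, keys):
    """Map each group to its (start, stop) GridSpec columns, one column per row.

    Hypothetical helper; assumes the rows of each group are contiguous, as they
    are after sort_values(["level_1", "level_2"]).
    """
    sizes = frame.groupby(keys, sort=False).size()
    stops = sizes.cumsum()
    starts = stops - sizes
    return {key: (int(start), int(stop))
            for key, start, stop in zip(sizes.index, starts, stops)}


bar_rows = 8                  # rows used for the bars, as in the answer
nrows = bar_rows + 2          # plus one label row per nesting level
ncols = len(final_df)         # one column per bar
level_1_spans = column_spans(final_df, "level_1")
level_2_spans = column_spans(final_df, ["level_1", "level_2"])
# level_1_spans: 0 -> (0, 6), 1 -> (6, 9)
# level_2_spans: (0, "b") -> (0, 3), (0, "c") -> (3, 6), (1, "b") -> (6, 8), (1, "c") -> (8, 9)
```

Each (start, stop) pair plugs straight into GridSpec indexing. For example, grid[9, slice(*level_1_spans[0])] selects the same cells as the hard-coded grid[9, 0:6].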

f = plt.figure(figsize=(10,4), dpi=300)

grid = gridspec.GridSpec(10, 9, figure=f)
mpl.rcParams["axes.edgecolor"] = "gainsboro"

# Use context manager to set mpl parameters for nested axs
with mpl.rc_context({"xtick.major.bottom": False, "ytick.major.left": False}):

    # Level 1 axs (label, ax)
    ax_level_1_0 = ("0", f.add_subplot(grid[9, 0:6]))
    ax_level_1_1 = ("1", f.add_subplot(grid[9, 6:]))
    level_1_axs = [ax_level_1_0, ax_level_1_1]

    # Level 2 axs (label, ax)
    ax_level_2_0b = ("B", f.add_subplot(grid[8, 0:3]))
    ax_level_2_0c = ("C", f.add_subplot(grid[8, 3:6]))
    ax_level_2_1b = ("B", f.add_subplot(grid[8, 6:8]))
    ax_level_2_1c = ("C", f.add_subplot(grid[8, 8:]))
    level_2_axs = [ax_level_2_0b, ax_level_2_0c, ax_level_2_1b, ax_level_2_1c]

# Actual count plot axs (level_1, level_2, ax)
ax_0b = (0, "b", f.add_subplot(grid[0:8, 0:3]))
ax_0b[2].set_ylabel("Frequency")

# Hide y-ticks
with mpl.rc_context({"ytick.major.left": False}):
    ax_0c = (0, "c", f.add_subplot(grid[0:8, 3:6]))
    ax_1b = (1, "b", f.add_subplot(grid[0:8, 6:8]))
    ax_1c = (1, "c", f.add_subplot(grid[0:8, 8:]))

count_axs = [ax_0b, ax_0c, ax_1b, ax_1c]

# Remove white space between subplots
plt.subplots_adjust(wspace=0, hspace=0)

# Add label text to Level 1 and 2 axs
for label, ax in level_1_axs + level_2_axs:
    ax.text(0.5, 0.5, label, horizontalalignment='center',
            verticalalignment='center', transform=ax.transAxes)

# Draw the bars, writing each value in white inside its bar
for l1, l2, ax in count_axs:
    subset = final_df.query(f'(level_1 == {l1}) & (level_2 == "{l2}")')
    y = subset["count"]
    labels = subset["value"]
    x = range(len(y))
    ax.bar(x, y, color="steelblue")
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.tick_params(
        axis="x", direction="in", bottom=False, pad=-20,
        colors="white", labelsize=15)

plt.show()
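To avoid hard-coding the axes entirely, the same kind of figure can be built by walking the groups of final_df while advancing a column counter. The sketch below is one possible generalisation, not part of the original answer. nested_count_plot and _label_cell are hypothetical names. It also differs from the answer in two ways: it passes wspace/hspace directly to GridSpec instead of calling subplots_adjust, and it shares the y-axis across the bar panels so bar heights stay comparable once the y-ticks are hidden.

```python
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import pandas as pd

# final_df as shown in the answer (rows of each group are contiguous)
final_df = pd.DataFrame({
    "level_1": [0, 0, 0, 0, 0, 0, 1, 1, 1],
    "level_2": ["b", "b", "b", "c", "c", "c", "b", "b", "c"],
    "value":   [3, 7, 1, -50, 102, 30, 2, 3, 10],
    "count":   [2, 2, 1, 2, 2, 1, 1, 1, 2],
})


def _label_cell(ax, text):
    """Turn an axes into a plain label cell: no ticks, centred text."""
    ax.set_xticks([])
    ax.set_yticks([])
    ax.text(0.5, 0.5, text, ha="center", va="center", transform=ax.transAxes)


def nested_count_plot(frame, bar_rows=8):
    """Hypothetical helper: one bar panel per (level_1, level_2) group.

    The grid has one column per row of `frame`, `bar_rows` rows for the bars
    and two extra rows for the level_2 and level_1 labels.
    """
    fig = plt.figure(figsize=(10, 4))
    grid = gridspec.GridSpec(bar_rows + 2, len(frame), figure=fig,
                             wspace=0, hspace=0)
    count_axes = []
    first_ax = None
    col = 0  # next free grid column
    for l1, g1 in frame.groupby("level_1", sort=False):
        l1_start = col
        for l2, g2 in g1.groupby("level_2", sort=False):
            span = slice(col, col + len(g2))
            # All bar panels share the first panel's y-axis
            ax = fig.add_subplot(grid[:bar_rows, span], sharey=first_ax)
            x = range(len(g2))
            ax.bar(x, g2["count"], color="steelblue")
            ax.set_xticks(x)
            ax.set_xticklabels([str(v) for v in g2["value"]])
            # Value labels drawn in white inside the bars, no tick marks
            ax.tick_params(axis="x", bottom=False, direction="in", pad=-20,
                           colors="white", labelsize=15)
            if first_ax is None:
                first_ax = ax
                ax.set_ylabel("Frequency")
            else:
                ax.tick_params(axis="y", left=False, labelleft=False)
            count_axes.append((l1, l2, ax))
            # Level 2 label cell directly under this bar panel
            _label_cell(fig.add_subplot(grid[bar_rows, span]), str(l2).upper())
            col += len(g2)
        # Level 1 label cell spanning all columns of this group
        _label_cell(fig.add_subplot(grid[bar_rows + 1, l1_start:col]), str(l1))
    for ax in fig.axes:
        for spine in ax.spines.values():
            spine.set_edgecolor("gainsboro")
    return fig, count_axes


fig, count_axes = nested_count_plot(final_df)
```

Call plt.show() or fig.savefig(...) afterwards to display or save the result. A third nesting level would need another loop level and one more label row.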
