I have a DataFrame with 7 columns of categorical data. For each column, I would like to get its unique labels and the number of rows per label, and then add the result as a bar chart subplot to my figure. I am able to create a figure with the correct number of subplots, and also the individual DataFrames with column name and counts, but I'm not sure how to add a new subplot to the figure on each iteration of the loop. What is the proper way to do this? My attempt is below, along with the error message raised in the loop:

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Libraries

# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

Categorical DataFrame (df_cat)

df_cat: sample rows as shown by df_cat.head() (not reproduced here)

Figure and amount of subplots

plt_nrows = round(len(df_cat.columns) / 2)
plt_ncols = len(df_cat.columns) - plt_nrows

fig, axs = plt.subplots(plt_nrows, plt_ncols, figsize=(20,15))

Loop of DataFrames with individual columns and label counts:

for i in df_cat.columns:
    df_cat_counts = df_cat[i].value_counts().rename_axis([i]).reset_index(name='counts')
    x = df_cat_counts[i]
    y = df_cat_counts['counts']
    axs[i,0].plot(x, y)

1 Answer

I am limiting myself to the columns ['Age', 'Directors', 'Genres', 'Country', 'Language'], because the IMDb, Rotten Tomatoes, and Netflix columns are, in my opinion, not really categorical data.

import itertools

# pd and plt are imported as in the question

# Separator used to split multi-valued cells; columns not listed here are used as-is
split_dict = {'Directors': ',',
              'Genres': ',',
              'Country': ',',
              'Language': ','}

columns = ['Age', 'Directors', 'Genres', 'Country', 'Language']

fig = plt.figure(figsize=(20, 20))

for p, col in enumerate(columns):

    # one subplot per column in a 2x3 grid (subplot positions are 1-based)
    ax = fig.add_subplot(2, 3, p + 1)

    split = split_dict.get(col)

    if split:  # split each cell on the separator, then flatten the resulting lists with chain.from_iterable
        x = pd.Series(list(itertools.chain.from_iterable(df_cat[col].dropna().str.split(split)))).to_frame(name=col)
    else:
        x = df_cat[[col]]

    # count rows per label and draw the counts as a bar chart on this subplot
    x.groupby(col).size().plot(kind='bar', ax=ax)
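
As for the IndexError in the question: axs returned by plt.subplots is a NumPy array of Axes, and axs[i,0] indexes it with i, which is a column name (a string) rather than an integer position. If you would rather keep the plt.subplots grid than call fig.add_subplot, a minimal sketch could pair each column with one Axes by flattening the array. The df_cat below is a small made-up stand-in (the real data is not shown), and the two-column grid and figure size are assumptions for illustration only:

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs as a script; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_cat; replace with the real DataFrame.
df_cat = pd.DataFrame({
    "Age": ["18+", "7+", "18+", "13+", "all"],
    "Genres": ["Action,Sci-Fi", "Comedy", "Action", "Drama,Comedy", "Comedy"],
    "Country": ["United States", "United Kingdom", "United States", "India", "India"],
})

# Assumed layout: two subplots per row, with as many rows as needed.
ncols = 2
nrows = math.ceil(len(df_cat.columns) / ncols)
fig, axs = plt.subplots(nrows, ncols, figsize=(10, 4 * nrows), squeeze=False)

# axs is a 2-D array of Axes; ravel() flattens it in row-major order so that
# each column name is paired with an Axes instead of being used as an index.
axes = axs.ravel()
for ax, col in zip(axes, df_cat.columns):
    counts = df_cat[col].value_counts()  # rows per label, most frequent first
    ax.bar(counts.index.astype(str), counts.to_numpy())
    ax.set_title(col)
    ax.tick_params(axis="x", rotation=45)

# Hide the Axes left over when the grid has more cells than columns.
for ax in axes[len(df_cat.columns):]:
    ax.set_visible(False)

fig.tight_layout()
```

This counts whole cell values only; for the comma-separated columns the split step from the code above would still be needed before counting.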


7 Comments

That is exactly what I am going for. I tried applying this logic to my code, but the output of plt.tight_layout() is <Figure size 432x288 with 0 Axes> and no figure is shown.
My comment above applies when that command is run in a separate cell in Jupyter. When everything is run together, I get a bunch of error messages after a long run time, all along the lines of RuntimeWarning: Glyph 2332 missing from current font. font.set_text(s, 0.0, flags=flags)
@cphill plt.tight_layout() is only needed to adjust the subplot spacing. The error you are getting seems to have something to do with your data. Can you post a representative sample of your data that reproduces this error?
Yes, I actually noticed a more complete error message: Tight layout not applied. tight_layout cannot make axes height small enough to accommodate all axes decorations. The dataset is hosted on Kaggle at kaggle.com/ruchi798/…
@cphill To be honest, I would rather not register at Kaggle. Can you copy and paste a subset of the data?