I have a dataframe like this in Python:

import numpy  as np
import pandas as pd

import plotly.graph_objects  as go
import plotly.figure_factory as ff

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

df['id'] = range(1, len(df.index)+1)

df


# making a long dataframe
# sorting the dataframe by value (i.e. randomly)
long_df = df.melt(id_vars=['id'],
                  var_name='type',
                  value_name='value').sort_values(by='value')

long_df['id'] = range(1, len(long_df.index)+1)
long_df.head()

long_df = long_df.drop(long_df[long_df.id < 10].index)
long_df.head()

long_df['type'].value_counts().sort_index()

and I created a boxplot using these commands:

box_plot = ff.create_facet_grid(
    long_df,
    x='type',
    y='value',
    trace_type='box',
    color_name='type',
    color_is_cat=True,
    width=1000,
    ggplot2=False,
    showlegend=False,
)

box_plot.show()

Is there any way to set the box width proportional to the number of rows in each category (similar to the way R does it)? I expect the box widths to be in this order, from narrowest to widest: Col2 (n=5) --> Col4 (n=7) --> Col1 (n=9) --> Col3 (n=10)

1 Answer

It can be done with matplotlib by drawing one box per type and passing a widths value proportional to the group size:

import numpy  as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

df['id'] = range(1, len(df.index)+1)

# making a long dataframe
# sorting the dataframe by value (i.e. randomly)
long_df = df.melt(id_vars=['id'],
                  var_name='type',
                  value_name='value').sort_values(by='value')

long_df['id'] = range(1, len(long_df.index)+1)
long_df = long_df.drop(long_df[long_df.id < 10].index)
long_df['type'].value_counts().sort_index()

cols = ['red', 'green', 'blue', 'orange']
plt.style.use('ggplot')
fig, ax = plt.subplots()

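# draw one box per type, ordered from the smallest group to the largest
for i, col in enumerate(sorted(long_df['type'].unique(), key=lambda c: long_df[long_df['type'] == c].shape[0])):
    col_df = long_df[long_df['type'] == col]
    bp = plt.boxplot(col_df['value'],
                     positions=[i*120],               # x position of this box
                     widths=len(col_df['value'])*10,  # width proportional to the row count
                     patch_artist=True,
                     labels=[col]                     # named tick_labels in matplotlib 3.9+
                     )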
    for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color=f'xkcd:dark {cols[i]}')
    for patch in bp['boxes']:
        patch.set(facecolor=f'xkcd:light {cols[i]}')

plt.xlabel('type')
plt.show()

Or, if you prefer something closer to R, plotnine's geom_boxplot accepts varwidth=True:

from plotnine import ggplot, aes, geom_boxplot
import numpy  as np
import pandas as pd

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

df['id'] = range(1, len(df.index)+1)

# making a long dataframe
# sorting the dataframe by value (i.e. randomly)
long_df = df.melt(id_vars=['id'],
                  var_name='type',
                  value_name='value').sort_values(by='value')

long_df['id'] = range(1, len(long_df.index)+1)
long_df = long_df.drop(long_df[long_df.id < 10].index)

type_list = long_df['type'].value_counts(ascending=True).index.tolist()
long_df['type'] = pd.Categorical(long_df['type'], categories=type_list)

p = (ggplot(long_df)
     + aes(x='type', y='value', fill='type')
     + geom_boxplot(varwidth=True, alpha=0.8, show_legend=False))
print(p)
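
One difference between the two approaches is worth noting: the matplotlib version above scales the width linearly with the row count, whereas R's varwidth option makes the width proportional to the square root of the count. Here is a minimal sketch of a helper that computes either scaling. The name box_widths and its max_width and scale parameters are my own, hypothetical choices, not part of any library, and the labels simply reproduce the group sizes from the question.

```python
from collections import Counter
from math import sqrt


def box_widths(labels, max_width=0.8, scale="sqrt"):
    """Map each category to a box width; the largest group gets max_width.

    Hypothetical helper for illustration only.
    scale="sqrt" mimics R's varwidth, scale="linear" mimics len(group) * k.
    """
    counts = Counter(labels)
    if scale == "sqrt":
        weights = {cat: sqrt(n) for cat, n in counts.items()}
    elif scale == "linear":
        weights = {cat: float(n) for cat, n in counts.items()}
    else:
        raise ValueError("scale must be 'sqrt' or 'linear'")
    biggest = max(weights.values())
    return {cat: max_width * w / biggest for cat, w in weights.items()}


# Group sizes taken from the question: Col2=5, Col4=7, Col1=9, Col3=10
labels = ["Col2"] * 5 + ["Col4"] * 7 + ["Col1"] * 9 + ["Col3"] * 10

widths = box_widths(labels)                       # square-root scaling
linear_widths = box_widths(labels, scale="linear")

# Categories ordered from the narrowest box to the widest
order = sorted(widths, key=widths.get)
print(order)
```

With boxes spaced one unit apart (positions=[i]), widths[col] could be passed directly as the widths argument in the matplotlib loop; long_df['type'] can be used as the labels argument.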

2 Comments

Wow! I didn't know about plotnine. Thank you for introducing this library. Is there any way to make plotnine plots interactive?
I have never done it but it seems to be possible, if not straightforward.
