example data:

table1
   c1  c2
r1 1   3
r2 2   2
r3 3   1

table2
   c1  c2
r1 4   6
r2 5   5
r3 6   4

table3
  c1  c2
r1 7  9
r2 8  8
r3 9  7

I have munged the data into a dataframe like the following, where the rows are the categories of analysis, the upper column level is the individual being analyzed, and the second column level is the replicate.

   table1    table2    table3
   r1 r2 r3  r1 r2 r3  r1 r2 r3
c1  1 2 3     4 5 6     7 8 9
c2  3 2 1     6 5 4     9 8 7

I want to turn this into a pointplot where the mean of the replicates is the point, the individual replicate values are used to create a confidence interval, and a line is drawn for each table. In other words, I want the values passed to pointplot to be x=[table1, table2, table3], y=mean(all_r_values), hue=[c1, c2].

I am not sure how to do this, or how to reshape my table into a form suitable for this.

1 Answer

Seaborn prefers the data to be in the long (tidy) format, which you can read more about in the documentation:

It is easiest and best to invoke these functions with a DataFrame that is in “tidy” format, although the lower-level functions also accept wide-form DataFrames or simple vectors of observations.

In essence, this means that you want as much of the information as possible to be contained in the rows of the data frame rather than in the columns. In your case, you want to turn your data into this format:

rep  table    c       value
r1  table1    c1      1
r2  table1    c1      2
r3  table1    c1      3
...

I copied your sample data and modified it slightly to get this:

rep c1 c2 table
r1 1  3 table1
r2 2  2 table1
r3 3  1 table1
r1 4  6 table2
r2 5  5 table2
r3 6  4 table2
r1 7  9 table3
r2 8  8 table3
r3 9  7 table3

Copy this to the clipboard and read it into pandas with:

import pandas as pd
import seaborn as sns

df = pd.read_clipboard()
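
If the clipboard is not available (for example in a script or on a remote machine), the same table can be read from a string instead. This is only a sketch of an alternative to `read_clipboard`; the variable name `raw` is an illustrative choice, and the data is the sample table shown above.

```python
import io

import pandas as pd

# The sample table from above, embedded as text instead of copied to the clipboard
raw = """rep c1 c2 table
r1 1 3 table1
r2 2 2 table1
r3 3 1 table1
r1 4 6 table2
r2 5 5 table2
r3 6 4 table2
r1 7 9 table3
r2 8 8 table3
r3 9 7 table3
"""

# Columns are separated by runs of whitespace
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.head())
```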

You can then "melt" the data into the long format, and plot it with Seaborn:

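# Melt c1/c2 into one 'value' column; the original column name goes into 'c'
df_long = df.melt(id_vars=['rep', 'table'], var_name='c')
# join=False omits the connecting lines (seaborn 0.13+ deprecates it in favor of linestyle='none')
sns.pointplot(x='table', y='value', hue='c', data=df_long, join=False, dodge=0.2)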
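
By default, pointplot draws the mean of `value` for each `table`/`c` combination and a bootstrapped confidence interval around it. To check the numbers behind the points, the same means can be computed with a groupby. This is a minimal sketch that rebuilds the sample data inline so it runs on its own; the name `means` is just for illustration.

```python
import pandas as pd

# Sample data in the same layout as the table above
df = pd.DataFrame({
    "rep": ["r1", "r2", "r3"] * 3,
    "c1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "c2": [3, 2, 1, 6, 5, 4, 9, 8, 7],
    "table": ["table1"] * 3 + ["table2"] * 3 + ["table3"] * 3,
})

# Long (tidy) format: one row per (rep, table, c) observation
df_long = df.melt(id_vars=["rep", "table"], var_name="c")

# Mean over the replicates for each table and category
means = df_long.groupby(["table", "c"])["value"].mean().unstack("c")
print(means)
```

In this sample, c1 and c2 happen to have the same mean within each table, which is why `dodge` is useful to keep the two hues from being drawn on top of each other.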


Getting from (and into) your hierarchical column format is somewhat messier, but can be done with:

# Get sample data into the hierarchical column format
df_long_temp = df.melt(id_vars=['rep', 'table'], value_vars=['c1', 'c2'], var_name='c')
df_multi_cols = df_long_temp.set_index(['table', 'rep', 'c']).unstack(level=[0,1])

# Reshape from hierarchical column to long-form data
df_long = df_multi_cols.stack(level=[1,2]).reset_index()
sns.pointplot(x='table', y='value', hue='c', data=df_long, join=False, dodge=0.2)
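
If your dataframe already looks like the one in the question (rows c1/c2, columns with table on the upper level and replicate on the second level), transposing and melting is another way to reach the long format. This is a sketch under assumptions: the level names `table`, `rep` and `c` are not shown in the question, so they are set explicitly here, and `wide` is a hypothetical name for your dataframe.

```python
import pandas as pd

# Rebuild the question's layout: two column levels (table, replicate), rows c1/c2.
# The names 'table', 'rep' and 'c' are assumed; adjust them to match your data.
columns = pd.MultiIndex.from_product(
    [["table1", "table2", "table3"], ["r1", "r2", "r3"]],
    names=["table", "rep"],
)
wide = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9],
     [3, 2, 1, 6, 5, 4, 9, 8, 7]],
    index=pd.Index(["c1", "c2"], name="c"),
    columns=columns,
)

# Transpose so table/rep become the row index, turn them into ordinary columns,
# then melt the c1/c2 columns into a single 'value' column
df_long = (
    wide.T
    .reset_index()
    .melt(id_vars=["table", "rep"], var_name="c", value_name="value")
)
print(df_long.head())
```

The resulting `df_long` has the columns table, rep, c and value, so it can be passed to the same `sns.pointplot` call as above.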