Optimising data structure for plotting

Question

I have a dataset that looks like this:

  timepoint Nc2g12345_mean Nc2g12345_sd Nc3g15832_mean Nc3g15832_sd Nc3g44800_mean
1       0HR    0.000000000 0.0000000000   0.0008530604  0.000369386   0.0000000000
2     3.5HR    0.015091864 0.0066996647   0.0255918106  0.012939058   0.0096750639
3       8HR    0.225065617 0.0848419107   0.1337172105  0.028238102   0.1190215407
4     264HR    0.004265092 0.0048551997   0.0044785669  0.002306815   0.0034684191
5     720HR    0.001968504 0.0019685039   0.0000000000  0.000000000   0.0003650968
6     216HR    0.000328084 0.0005682581   0.0027724462  0.002884997   0.0023731289

I want to create a ggplot of the "mean" values (these would be my y-axis values, denoted by columns ending in "_mean") with error bars corresponding to the standard deviations (denoted by columns ending in "_sd"). I want to group (along the x-axis) by the first 9 characters of the column headers (those are gene names). I want to colour my plots based on the timepoint.

The data structure as it is now is suboptimal for plotting. What are smart ways to wrangle the data into a more workable format for ggplot? A tidyverse solution would be ideal.

Can you add the result of dput(df), where df is what you used to produce this sample of your dataframe?
– VFreguglia, Commented Jun 5, 2022 at 12:21

DaveArmstrong · Accepted Answer · 2022-06-05 12:26:27Z


How about this:

library(tidyverse)
dat <- tibble::tribble(~timepoint, ~Nc2g12345_mean, ~Nc2g12345_sd, ~Nc3g15832_mean, ~Nc3g15832_sd, ~Nc3g44800_mean,
  "0HR",   0.000000000, 0.0000000000,   0.0008530604,  0.000369386,   0.0000000000,
"3.5HR",   0.015091864, 0.0066996647,   0.0255918106,  0.012939058,   0.0096750639,
  "8HR",   0.225065617, 0.0848419107,   0.1337172105,  0.028238102,   0.1190215407,
"264HR",   0.004265092, 0.0048551997,   0.0044785669,  0.002306815,   0.0034684191,
"720HR",   0.001968504, 0.0019685039,   0.0000000000,  0.000000000,   0.0003650968,
"216HR",   0.000328084, 0.0005682581,   0.0027724462,  0.002884997,   0.0023731289)

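# Split each column name at the underscore: the gene ID goes into `vbl`,
# and `.value` turns the suffix (mean, sd) into its own column.
dat %>% pivot_longer(-timepoint, names_pattern="(.*)_(.*)", 
                     names_to = c("vbl", ".value")) %>% 
  # Nc3g44800 has no _sd column in this sample, so its sd is NA and it is dropped here.
  filter(!is.na(sd)) %>% 
  ggplot(aes(x=timepoint, y=mean, ymin = mean -sd, ymax = mean+sd, colour=vbl)) + 
  # Dodge the point ranges so genes at the same timepoint do not overlap.
  geom_pointrange(position=position_dodge(width=.5)) + 
  theme_classic() + 
  theme(legend.position="top")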

Created on 2022-06-05 by the reprex package (v2.0.1)
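
The question asks for genes along the x-axis and colour by timepoint, which is the reverse of the mapping used above. A minimal sketch of that variant follows, using the same reshaping step. The names `long`, `gene` and `p` are illustrative and not part of the original answer. Ordering the timepoints numerically assumes every label starts with the number of hours, as in the sample.

```r
library(tidyverse)

# Same sample data as in the answer, repeated so this block runs on its own.
dat <- tibble::tribble(
  ~timepoint, ~Nc2g12345_mean, ~Nc2g12345_sd, ~Nc3g15832_mean, ~Nc3g15832_sd, ~Nc3g44800_mean,
  "0HR",   0.000000000, 0.0000000000, 0.0008530604, 0.000369386, 0.0000000000,
  "3.5HR", 0.015091864, 0.0066996647, 0.0255918106, 0.012939058, 0.0096750639,
  "8HR",   0.225065617, 0.0848419107, 0.1337172105, 0.028238102, 0.1190215407,
  "264HR", 0.004265092, 0.0048551997, 0.0044785669, 0.002306815, 0.0034684191,
  "720HR", 0.001968504, 0.0019685039, 0.0000000000, 0.000000000, 0.0003650968,
  "216HR", 0.000328084, 0.0005682581, 0.0027724462, 0.002884997, 0.0023731289
)

# Long format: one row per timepoint and gene, with `mean` and `sd` columns.
long <- dat %>%
  pivot_longer(-timepoint,
               names_pattern = "(.*)_(.*)",
               names_to = c("gene", ".value")) %>%
  # Genes without an sd column (Nc3g44800 here) cannot get error bars.
  filter(!is.na(sd)) %>%
  # Assumption: labels start with the hour count, so parse_number() can
  # order them numerically (0, 3.5, 8, 216, ...) instead of alphabetically.
  mutate(timepoint = fct_reorder(timepoint, parse_number(timepoint)))

# Genes on the x-axis, one dodged point range per timepoint.
p <- ggplot(long, aes(x = gene, y = mean,
                      ymin = mean - sd, ymax = mean + sd,
                      colour = timepoint)) +
  geom_pointrange(position = position_dodge(width = 0.5)) +
  theme_classic() +
  theme(legend.position = "top")

print(p)
```

Converting `timepoint` to an ordered factor also puts the legend in chronological order. Without it, ggplot2 sorts the labels as strings.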


pleasehelp Over a year ago

Yep, that did the trick right away. Thanks for the great help!
