I have a dataset that looks like this:

  timepoint Nc2g12345_mean Nc2g12345_sd Nc3g15832_mean Nc3g15832_sd Nc3g44800_mean
1       0HR    0.000000000 0.0000000000   0.0008530604  0.000369386   0.0000000000
2     3.5HR    0.015091864 0.0066996647   0.0255918106  0.012939058   0.0096750639
3       8HR    0.225065617 0.0848419107   0.1337172105  0.028238102   0.1190215407
4     264HR    0.004265092 0.0048551997   0.0044785669  0.002306815   0.0034684191
5     720HR    0.001968504 0.0019685039   0.0000000000  0.000000000   0.0003650968
6     216HR    0.000328084 0.0005682581   0.0027724462  0.002884997   0.0023731289

I want to create a ggplot of the "mean" values (these would be my y-axis values, held in the columns ending in "_mean") with error bars corresponding to the standard deviations (held in the columns ending in "_sd"). I want to group along the x-axis by the first 9 characters of the column headers (those are gene names), and I want to colour the points by timepoint.

The data structure as it is now is suboptimal for plotting. What are smart ways to wrangle the data into a more workable format for ggplot? A tidyverse solution would be ideal.

    Can you add the result of dput(df) where df is what you used to produce this sample of your dataframe? Commented Jun 5, 2022 at 12:21

1 Answer

How about this:

library(tidyverse)

# Recreate the sample data from the question
dat <- tibble::tribble(
  ~timepoint, ~Nc2g12345_mean, ~Nc2g12345_sd, ~Nc3g15832_mean, ~Nc3g15832_sd, ~Nc3g44800_mean,
  "0HR",   0.000000000, 0.0000000000, 0.0008530604, 0.000369386, 0.0000000000,
  "3.5HR", 0.015091864, 0.0066996647, 0.0255918106, 0.012939058, 0.0096750639,
  "8HR",   0.225065617, 0.0848419107, 0.1337172105, 0.028238102, 0.1190215407,
  "264HR", 0.004265092, 0.0048551997, 0.0044785669, 0.002306815, 0.0034684191,
  "720HR", 0.001968504, 0.0019685039, 0.0000000000, 0.000000000, 0.0003650968,
  "216HR", 0.000328084, 0.0005682581, 0.0027724462, 0.002884997, 0.0023731289
)

# Split each column name at its last underscore: the first part (the gene name)
# goes into `vbl`, and the second part ("mean" or "sd") becomes its own column
# via the special ".value" entry in names_to.
dat %>% 
  pivot_longer(-timepoint, names_pattern = "(.*)_(.*)", 
               names_to = c("vbl", ".value")) %>% 
  # Nc3g44800 has no _sd column in this sample, so its sd is NA; drop those rows
  filter(!is.na(sd)) %>% 
  ggplot(aes(x = timepoint, y = mean, ymin = mean - sd, ymax = mean + sd, colour = vbl)) + 
  geom_pointrange(position = position_dodge(width = .5)) + 
  theme_classic() + 
  theme(legend.position = "top")

Created on 2022-06-05 by the reprex package (v2.0.1)
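
The plot above puts timepoint on the x-axis and colours by gene. The question asked for the reverse (genes along the x-axis, colour by timepoint), so here is a minimal sketch of that variant using the same reshaping step. The names `gene`, `long`, and `p` are illustrative only, and ordering the timepoints by their numeric value with `parse_number()` and `fct_reorder()` is an added assumption about how the legend should be sorted, not something stated in the question.

```r
library(tidyverse)

# Same sample data as in the question
dat <- tibble::tribble(
  ~timepoint, ~Nc2g12345_mean, ~Nc2g12345_sd, ~Nc3g15832_mean, ~Nc3g15832_sd, ~Nc3g44800_mean,
  "0HR",   0.000000000, 0.0000000000, 0.0008530604, 0.000369386, 0.0000000000,
  "3.5HR", 0.015091864, 0.0066996647, 0.0255918106, 0.012939058, 0.0096750639,
  "8HR",   0.225065617, 0.0848419107, 0.1337172105, 0.028238102, 0.1190215407,
  "264HR", 0.004265092, 0.0048551997, 0.0044785669, 0.002306815, 0.0034684191,
  "720HR", 0.001968504, 0.0019685039, 0.0000000000, 0.000000000, 0.0003650968,
  "216HR", 0.000328084, 0.0005682581, 0.0027724462, 0.002884997, 0.0023731289
)

# One row per timepoint and gene, with `mean` and `sd` as separate columns
long <- dat %>%
  pivot_longer(-timepoint,
               names_pattern = "(.*)_(.*)",
               names_to = c("gene", ".value")) %>%
  # Drop genes that have no sd column in the data (here Nc3g44800)
  filter(!is.na(sd)) %>%
  # Assumption: order timepoints by hours (0, 3.5, 8, 216, ...) rather than alphabetically
  mutate(timepoint = fct_reorder(timepoint, parse_number(timepoint)))

# Genes on the x-axis, one dodged point range per timepoint
p <- ggplot(long, aes(x = gene, y = mean,
                      ymin = mean - sd, ymax = mean + sd,
                      colour = timepoint)) +
  geom_pointrange(position = position_dodge(width = 0.5)) +
  theme_classic() +
  theme(legend.position = "top")

p
```

If every gene in the full dataset has both a `_mean` and an `_sd` column, the `filter()` step removes nothing and can be left out.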


1 Comment

Yep, that did the trick right away. Thanks for the great help!
