I am a cardiologist and love coding in R. I am having a real issue sorting a data frame, and I suspect the solution is really easy!

I have a data frame with summary values from multiple studies (df$study). Most studies have only one summary value (df$summary). However, as you can see, Study A has three summary values, numbered in df$no.of.estimate. See below:

study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6, 7, 8, 9)
df <- data.frame(study, no.of.estimate, summary)

So I want to sort the data frame by df$summary, which is easy. However, if a study has more than one estimate, I want its rows grouped together and ordered by the no.of.estimate column.

So essentially the desired output is

study <- c("E", "A", "A", "A", "F", "B", "C", "D")
no.of.estimate <- c(1, 1, 2, 3, 1, 1, 1, 1)
summary <- c(1, 7, 2, 5, 3, 6, 8, 9)
df <- data.frame(study, no.of.estimate, summary)
  • You must have noticed that by using cbind you have created a matrix whose columns are all of class character. Use data.frame(study, no.of.estimate...) instead. Commented Jan 4, 2015 at 14:55
  • So you don't want to sort your whole data set by study and no.of.estimate, only where a study has more than one estimate? It seems like you are overcomplicating this a bit. You could just do df[with(df, order(study, no.of.estimate)), ], though take a look at @akrun's comment first. Commented Jan 4, 2015 at 15:03

2 Answers

You could try

library(dplyr)
df %>% 
     mutate(study=factor(study, levels=unique(study))) %>%
     arrange(study,no.of.estimate)
  #  study no.of.estimate summary
  #1     E              1       1
  #2     A              1       7
  #3     A              2       2
  #4     A              3       5
  #5     F              1       3
  #6     B              1       6
  #7     C              1       8
  #8     D              1       9

Or a base R approach

df$study <- factor(df$study, levels=unique(df$study))
df[with(df, order(study, no.of.estimate)), ]
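Note that levels=unique(study) orders the studies by first appearance, which matches the desired output here only because the example rows already arrive in ascending summary order. A minimal base R sketch for input that is not pre-sorted follows, assuming each study should be placed by its smallest summary value (an interpretation of the question, not something it states outright). The names shuffled, sorted and group_min are made up for illustration.

```r
# Rebuild the question's data, then shuffle the rows so they are
# deliberately NOT in summary order.
df <- data.frame(
  study = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary = c(1, 2, 3, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)
shuffled <- df[c(8, 3, 6, 1, 5, 2, 7, 4), ]

# Hypothetical helper column: the smallest summary within each study.
shuffled$group_min <- ave(shuffled$summary, shuffled$study, FUN = min)

# Order studies by their smallest summary (study name breaks ties),
# then order rows within a study by estimate number.
sorted <- shuffled[order(shuffled$group_min,
                         shuffled$study,
                         shuffled$no.of.estimate), ]

# Drop the helper column and reset the row names.
sorted$group_min <- NULL
rownames(sorted) <- NULL
sorted
```

Because the grouping key is computed from the data rather than from row position, the result should not depend on how the input rows happen to be arranged.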

data

df <- structure(list(study = structure(c(5L, 1L, 6L, 1L, 2L, 1L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1), summary = c(1, 
2, 3, 5, 6, 7, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

The expected dataset is

df1 <- structure(list(study = structure(c(5L, 1L, 1L, 1L, 6L, 2L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 1, 2, 3, 1, 1, 1, 1), summary = c(1, 
7, 2, 5, 3, 6, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

Here's my data.table attempt, which leaves your columns as they are and creates a new index (though see my comment first). Its main advantage is that it updates your data set by reference rather than creating new copies.

library(data.table)
setorder(setDT(df)[, indx := .GRP, study], indx, no.of.estimate)[]
#    study no.of.estimate summary indx
# 1:     E              1       1    1
# 2:     A              1       7    2
# 3:     A              2       2    2
# 4:     A              3       5    2
# 5:     F              1       3    3
# 6:     B              1       6    4
# 7:     C              1       8    5
# 8:     D              1       9    6
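For readers who find the one-liner dense, the same idea can be unrolled into separate steps. This is only a sketch: it assumes the data.table package is installed, the name dt is made up, and removing the helper column at the end is an optional extra that the answer above does not do.

```r
library(data.table)

# The question's data, built directly as a data.table.
dt <- data.table(
  study = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary = c(1, 2, 3, 5, 6, 7, 8, 9)
)

# Step 1: number each study by order of first appearance (.GRP is the group counter).
dt[, indx := .GRP, by = study]

# Step 2: reorder the rows in place, by study index and then by estimate number.
setorder(dt, indx, no.of.estimate)

# Step 3 (optional): drop the helper column, again by reference.
dt[, indx := NULL]

dt[]
```

As with the one-liner, the study index follows first appearance, so this relies on the rows arriving in summary order.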
