R: Using the sort function in a dataframe based on multiple columns

Question

I am a cardiologist and love coding in R, but I am having a real issue sorting a data frame, and I suspect the solution is really easy!

I have a data frame of summary values (df$summary) from multiple studies (df$study). Most studies have only one summary value. However, as you can see, study A has three summary values, numbered in df$no.of.estimate. See below:

study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6 ,7 ,8 ,9)
df <- data.frame(study, no.of.estimate, summary)

So I want to sort the data frame by df$summary, which is easy. However, if a study has more than one estimate, I want its rows grouped together and ordered by the no.of.estimate column.

So essentially the desired output is

study <- c("E", "A", "A", "A", "F", "B", "C", "D")
no.of.estimate <- c(1, 1, 2, 3, 1, 1, 1, 1)
summary <- c(1, 7, 2, 5, 3 ,6 ,8 ,9)
df <- data.frame(study, no.of.estimate, summary)
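If the intended rule is that each study's block sits where its smallest summary value would place it (an assumption; in this example it coincides with order of first appearance because the rows are already sorted by summary), one base R sketch computes a per-study key with ave() and sorts on that key, then on no.of.estimate. The name group.key is illustrative only.

```r
study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6, 7, 8, 9)
df <- data.frame(study, no.of.estimate, summary)

# Per-row key (hypothetical name): the smallest summary value within that row's study
group.key <- ave(df$summary, df$study, FUN = min)

# Sort by the group key, break ties between studies by name,
# then order the estimates within each study
sorted <- df[order(group.key, df$study, df$no.of.estimate), ]
rownames(sorted) <- NULL
print(sorted)
```

Including df$study as the middle key keeps two studies with the same minimum summary from interleaving.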

You may have noticed that using cbind created a matrix whose columns are all of character class. Use data.frame(study, no.of.estimate...) instead.
– akrun, commented Jan 4, 2015 at 14:55
You don't want to sort your whole data set by study and no.of.estimate, but only where no.of.estimate has more than one value? It seems like you are overcomplicating this a bit: you could just do df[with(df, order(study, no.of.estimate)), ], though take a look at @akrun's comment first.
– David Arenburg, commented Jan 4, 2015 at 15:03
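For illustration, assuming study is stored as character (or as a factor with the default alphabetical levels), the one-liner suggested in the comment keeps each study's rows together but orders the studies alphabetically, so A comes before E rather than after it:

```r
study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6, 7, 8, 9)
# stringsAsFactors = FALSE keeps study as character in any R version
df <- data.frame(study, no.of.estimate, summary, stringsAsFactors = FALSE)

# A character column sorts alphabetically, not in order of first appearance
alpha <- df[with(df, order(study, no.of.estimate)), ]
print(alpha$study)
# [1] "A" "A" "A" "B" "C" "D" "E" "F"
```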

David Arenburg · Accepted Answer · 2015-01-04 15:28:17Z

You could try

library(dplyr)
df %>% 
     mutate(study=factor(study, levels=unique(study))) %>%
     arrange(study,no.of.estimate)
  #  study no.of.estimate summary
  #1     E              1       1
  #2     A              1       7
  #3     A              2       2
  #4     A              3       5
  #5     F              1       3
  #6     B              1       6
  #7     C              1       8
  #8     D              1       9

Or a base R approach

df$study <- factor(df$study, levels=unique(df$study))
df[with(df, order(study, no.of.estimate)), ]
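A variant that leaves df$study untouched, sketched here with match() in place of re-levelling the factor, orders rows by the position of each study in unique(df$study), i.e. by order of first appearance, and then by no.of.estimate. The name first.seen is illustrative only.

```r
study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6, 7, 8, 9)
df <- data.frame(study, no.of.estimate, summary, stringsAsFactors = FALSE)

# Position of each row's study among the studies in order of first appearance
first.seen <- match(df$study, unique(df$study))

# Order by that position, then by estimate number; df$study itself is not modified
res <- df[order(first.seen, df$no.of.estimate), ]
print(res)
```

Because match() converts factors to character before matching, this should behave the same whether study is a character column or a factor such as the one in the dput() data.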

data

df <- structure(list(study = structure(c(5L, 1L, 6L, 1L, 2L, 1L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1), summary = c(1, 
2, 3, 5, 6, 7, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

The expected dataset is

df1 <- structure(list(study = structure(c(5L, 1L, 1L, 1L, 6L, 2L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 1, 2, 3, 1, 1, 1, 1), summary = c(1, 
7, 2, 5, 3, 6, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

David Arenburg · Accepted Answer · 2015-01-04 15:36:22Z


Here's my data.table attempt, which leaves your columns as they are and creates a new index column (though see my comment first). Its main advantage is that your data set is updated by reference rather than copied.

library(data.table)
setorder(setDT(df)[, indx := .GRP, study], indx, no.of.estimate)[]
#    study no.of.estimate summary indx
# 1:     E              1       1    1
# 2:     A              1       7    2
# 3:     A              2       2    2
# 4:     A              3       5    2
# 5:     F              1       3    3
# 6:     B              1       6    4
# 7:     C              1       8    5
# 8:     D              1       9    6

Answered Jan 4, 2015 at 15:15 by David Arenburg; edited Jan 4, 2015 at 15:36.
