Sorting columns based on individual row values in R

Question

First, I apologize for the somewhat non-descriptive title but as you can probably see below, the problem is not easy to sum up in a single question.

I have a data set of peer evaluations, where group members rated each other on a set of criteria. Unfortunately, I wasn't able to pre-populate respondents' group members in the the survey so I ended up with a very jumbled data set that looks like this:

data = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"), 
                  gm2 = c("Bill", "Lin", "Bill", "Lin"),
                  gm3 = c("Cass", "Dan", "Lin", "Cass"),
                  gm4 = c("Dan", "Cass", "Dan", "Bill"),
                  crit1_1 = c(5, 2, 4, 4), 
                  crit1_2 = c(5, 3, 3, 3), 
                  crit1_3 = c(4, 2, 4, 4), 
                  crit1_4 = c(5, 5, 4, 4))

where gm1 is the respondent and crit1 represents their self-ratings and gm2-4 are their group members with crit1_2-4 representing their ratings of each member on the first criteria (there are multiple). I need to calculate the average of the group's ratings for each member but as you can see, there is no consistent pattern for who is gm1-4. For example, for Lin, I need an average of Bill, Cass, and Dan's ratings of Lin.

I know this is a bit of a mess but I hoping there is a relatively simple way of transforming this data so that I can perform the calculations that I need to.

I've thought about assigning each group member a number like so

key = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"), 
                 num = c(1, 2, 3, 4))
data1 = data[ , c(1,2,3,4)]
data1[] <- key$num[match(unlist(data), key$gm1)]
data2 = cbind(data1, data[ , -c(1,2,3,4)])

and then sorting columns gm1-4 so that the pattern is always the same for each criteria. However, I do not know how to sort gm1-4 for each row in such a way that creates corresponding changes in crit1_1-4.

Any suggestions would be much appreciated!

If the first column represents thier self-evaluation it means that the other columns are the evaluations of thier peers. You only need to group the data by gm1 and calculate the mean of crit1_2-4. — Marcos Pérez
– Marcos Pérez, Commented Apr 19, 2021 at 19:06

Ben · Accepted Answer · 2021-04-19 19:37:52Z

1

Perhaps this might be more manageable in long form. With pivot_longer from tidyr, you can extend your data for each group member and criterion. The filter and select will remove the "self-rating" data.

library(tidyverse)

data %>%
  pivot_longer(cols = starts_with("crit"), 
               values_to = "rating", 
               names_to = c("criteria", "rating_number"), 
               names_pattern = "crit(\\d+)_(\\d+)") %>%
  filter(rating_number != 1) %>%
  select(-gm1) %>%
  pivot_longer(cols = starts_with("gm"), 
               names_pattern = "gm(\\d+)", 
               names_to = "member_number", 
               values_to = "member_name") %>%
  filter(member_number == rating_number) %>%
  group_by(member_name, criteria) %>%
  summarise(mean_rating = mean(rating))

Output

  member_name criteria mean_rating
  <chr>       <chr>          <dbl>
1 Bill        1               4   
2 Cass        1               4.33
3 Dan         1               3.67
4 Lin         1               3.33

edited Apr 19, 2021 at 19:37

answered Apr 19, 2021 at 18:45

Ben

30.7k5 gold badges28 silver badges52 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Addison Over a year ago

Thank you for this. It appears though that this method is calculating the average of each member's ratings of their team members, rather than their team members' ratings of them . For example, Bill gave himself a 2, and rated his team members Lin = 3, Dan = 2, and Cass = 5, which I think is how the mean rating is being calculated in your solution. Instead, I need to average Bill's team members' ratings of Bill, so Bill's mean rating should really be (5 + 3 + 4)/3 = 4.

Ben Over a year ago

@AddisonMaerz Please see edited answer and let me know if this is what you had in mind.

Yuriy Saraykin · Accepted Answer · 2021-04-19 19:49:40Z

base

data = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"), 
                  gm2 = c("Bill", "Lin", "Bill", "Lin"),
                  gm3 = c("Cass", "Dan", "Lin", "Cass"),
                  gm4 = c("Dan", "Cass", "Dan", "Bill"),
                  crit1_1 = c(5, 2, 4, 4), 
                  crit1_2 = c(5, 3, 3, 3), 
                  crit1_3 = c(4, 2, 4, 4), 
                  crit1_4 = c(5, 5, 4, 4))



cols_gm <- sapply(data, is.character)
cols_crit <- sapply(data, is.numeric)


crit <- data[cols_crit]
gm <- data[cols_gm]
nm <- unique(unlist(gm))


sapply(nm, function(x) {
  coord <- matrix(c(seq(nrow(data)), max.col(gm == x)), nrow = nrow(data))
  coord <- coord[!(coord[, 2] == 1), ]
  mean(crit[coord])
})
#>      Lin     Bill     Cass      Dan 
#> 3.333333 4.000000 4.333333 3.666667

^{Created on 2021-04-19 by the reprex package (v2.0.0)}

Collectives™ on Stack Overflow

Sorting columns based on individual row values in R

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related