
I am trying to create a variable holding the mean score difference between female and male students in each classroom. `class id` identifies the classroom, `gender` is recorded for each student, and the last column holds each student's score.

I want one mean difference value (female (1) minus male (0)) per classroom.

My data looks like this:

data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44), 
                 nrow=12, 
                 ncol=3) 
colnames(data) <- c("class id","gender","score")

> data
      class id gender score
 [1,]        1      0    20
 [2,]        1      1    25
 [3,]        1      1    22
 [4,]        1      0    21
 [5,]        2      1    30
 [6,]        2      0    35
 [7,]        2      0    32
 [8,]        2      1    31
 [9,]        3      0    40
[10,]        3      1    45
[11,]        3      1    42
[12,]        3      0    44

I need it to be something like:

> data
            class id  mean score
 [1,]        1             3
 [2,]        2            -3
 [3,]        3            1.5
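For reference, the expected values follow from the per-class means of the scores above (female = 1, male = 0):

```latex
\begin{aligned}
\text{class 1:}\quad & \tfrac{25 + 22}{2} - \tfrac{20 + 21}{2} = 23.5 - 20.5 = 3\\
\text{class 2:}\quad & \tfrac{30 + 31}{2} - \tfrac{35 + 32}{2} = 30.5 - 33.5 = -3\\
\text{class 3:}\quad & \tfrac{45 + 42}{2} - \tfrac{40 + 44}{2} = 43.5 - 42 = 1.5
\end{aligned}
```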

Any thoughts?

Thanks!

Comment: Why don't you use data.table for such calculations? Its group-by support will help here. (commented Nov 16, 2017 at 19:33)
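A minimal sketch of the commenter's data.table suggestion, assuming the data.table package is installed. The `by = "class id"` grouping and the subset-then-mean expression are one possible way to write it, not code taken from the comment:

```r
library(data.table)

# Recreate the question's input matrix
data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44),
               nrow = 12, ncol = 3)
colnames(data) <- c("class id", "gender", "score")

# Convert to a data.table; the column names are kept as-is
dt <- as.data.table(data)

# For each class: mean female (1) score minus mean male (0) score
result <- dt[, .(mean_score = mean(score[gender == 1]) - mean(score[gender == 0])),
             by = "class id"]
print(result)
```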

1 Answer


Here's a solution that uses tidyverse functions:

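library(tidyverse)
data %>% as_tibble %>%                # a tibble is easier to work with than a matrix
  group_by(`class id`, gender) %>%    # one group per class/gender pair
  summarize(mean=mean(score)) %>%     # mean score within each group
  spread(gender, mean) %>%            # one row per class, with columns `0` and `1`
  mutate(mean_score=`1`-`0`) %>%      # female (1) minus male (0)
  select(`class id`, mean_score)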

Working with a tibble or data.frame is much easier than working with a matrix, so the first step converts your input data. Then we calculate a mean per class and gender, spread the result so that each class has one record with a value for each gender, and take the difference. Note the backticks: they are needed because the column names in this example (`class id`, `0`, `1`) are not syntactic R names.
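tidyr has since superseded spread() with pivot_wider() (tidyr 1.0.0). A sketch of the same pipeline using pivot_wider(), assuming dplyr >= 1.0 (for the `.groups` argument of summarize()) and tidyr >= 1.0:

```r
library(dplyr)
library(tidyr)

# Recreate the question's input matrix
data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44),
               nrow = 12, ncol = 3)
colnames(data) <- c("class id", "gender", "score")

result <- data %>%
  as_tibble() %>%
  group_by(`class id`, gender) %>%
  summarize(mean = mean(score), .groups = "drop") %>%      # mean per class/gender
  pivot_wider(names_from = gender, values_from = mean) %>% # columns `0` and `1`
  mutate(mean_score = `1` - `0`) %>%                        # female minus male
  select(`class id`, mean_score)
print(result)
```

Dropping the grouping with `.groups = "drop"` also silences the grouping message that newer dplyr versions print after summarize().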

Alternatively, you could do something like this:

data %>% as_tibble %>%
  group_by(`class id`) %>% 
  summarize(mean_score=mean(score[gender==1]) - mean(score[gender==0]))

which avoids the reshaping.
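If you would rather avoid package dependencies, here is a base R sketch of the same per-class calculation (an alternative not taken from the answer above). It assumes every class contains both genders and returns a matrix shaped like the desired output in the question:

```r
# Recreate the question's input matrix
data <- matrix(c(1,1,1,1,2,2,2,2,3,3,3,3,
                 0,1,1,0,1,0,0,1,0,1,1,0,
                 20,25,22,21,30,35,32,31,40,45,42,44),
               nrow = 12, ncol = 3)
colnames(data) <- c("class id", "gender", "score")

class_id <- data[, "class id"]
gender   <- data[, "gender"]
score    <- data[, "score"]

# Per-class mean score for each gender (assumes every class has both genders)
female_mean <- tapply(score[gender == 1], class_id[gender == 1], mean)
male_mean   <- tapply(score[gender == 0], class_id[gender == 0], mean)

# Index both by class label so the subtraction lines up class by class
classes   <- names(female_mean)
mean_diff <- female_mean[classes] - male_mean[classes]

result <- cbind("class id" = as.numeric(classes), "mean score" = as.vector(mean_diff))
print(result)
```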
