This is an approximation of the original dataframe. In the original, there are many more columns than are shown here.

id  init_cont  family  description  value
1   K          S       impacteach   1
1   K          S       impactover   3
1   K          S       read         2
2   I          S       impacteach   2
2   I          S       impactover   4
2   I          S       read         1
3   K          D       impacteach   3
3   K          D       impactover   5
3   K          D       read         3

I want to combine the values for impacteach and impactover to generate an average value that is just called impact. I would like the final table to look like the following:

id  init_cont  family  description  value
1   K          S       impact       2
1   K          S       read         2
2   I          S       impact       3
2   I          S       read         1
3   K          D       impact       4
3   K          D       read         3

I have not been able to figure out how to generate this table. However, I have been able to create a dataframe that looks like this:

id  description  value
1   impact       2
1   read         2
2   impact       3
2   read         1
3   impact       4
3   read         3

What is the best way for me to take these new values and add them to the original dataframe? I also need to remove the original values (like impacteach and impactover) in the original dataframe. I would prefer to modify the original dataframe as opposed to creating an entirely new dataframe because the original dataframe has many columns.

In case it is useful, this is a summary of the code I used to create the shorter dataframe with impact as a combination of impacteach and impactover:

df %>%
  mutate(newdescription = case_when(description %in% c("impacteach", "impactoverall") ~ "impact", TRUE ~ description)) %>%
  group_by(id, newdescription) %>%
  summarise(value = mean(as.numeric(value)))
  • What do you mean by "many more columns"? Do you have more columns similar to description, or just value? Commented Apr 26, 2018 at 18:06

3 Answers

What if you changed the description column first so that it could be included in the grouping:

df %>% 
    mutate(description = substr(description, 1, 6)) %>%
    group_by(id, init_cont, family, description) %>% 
    summarise(value = mean(value))

# A tibble: 6 x 5
# Groups:   id, init_cont, family [?]
#      id init_cont family description value
#   <int> <chr>     <chr>  <chr>       <dbl>
# 1     1 K         S      impact         2.
# 2     1 K         S      read           2.
# 3     2 I         S      impact         3.
# 4     2 I         S      read           1.
# 5     3 K         D      impact         4.
# 6     3 K         D      read           3.
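
For anyone who wants to run this, here is a minimal reproducible sketch. The data frame is a hypothetical reconstruction of the sample table in the question (the real data has more columns), and `.groups = "drop"` assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical reconstruction of the sample data shown in the question
df <- data.frame(
  id          = rep(1:3, each = 3),
  init_cont   = rep(c("K", "I", "K"), each = 3),
  family      = rep(c("S", "S", "D"), each = 3),
  description = rep(c("impacteach", "impactover", "read"), times = 3),
  value       = c(1, 3, 2, 2, 4, 1, 3, 5, 3),
  stringsAsFactors = FALSE
)

result <- df %>%
  # "impacteach" and "impactover" both become "impact"; "read" is unchanged
  mutate(description = substr(description, 1, 6)) %>%
  # keep every identifying column in the grouping so none is dropped
  group_by(id, init_cont, family, description) %>%
  # average the values within each group and return an ungrouped result
  summarise(value = mean(value), .groups = "drop")

print(result)
```

Note that `substr()` only works here because every description to be merged shares the same first six characters; a `case_when()` recode like the one in the question is safer if other descriptions could also start with those letters.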

You just need to modify your group_by statement. Try group_by(id, init_cont, family, newdescription)

Because your id seems to be mapped to init_cont and family already, adding in these values won't change your summarization result. Then you have all the columns you want with no extra work.

If you have a lot of columns, you could try something like the code below. Essentially, left_join your summarised data onto your original data, using the . so that you don't store a new dataframe. Once joined (by id and description, which we modified in place), you'll have two value columns, suffixed with .x and .y. Drop the original (value.x), then use distinct to get rid of the duplicate 'impact' rows.

df %>% 
  mutate(description = case_when(description %in% c("impacteach", "impactoverall") ~ "impact", TRUE ~ description)) %>%
  # `.` is the mutated data: summarise it, then join the means back onto it
  left_join(.,
            summarise(group_by(., id, description),
                      value = mean(as.numeric(value))),
            by = c('id', 'description')) %>%
  select(-value.x) %>%  # drop the original values; the means stay in value.y
  distinct()
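
For a very wide data frame, one possible alternative is to group by every column except value, so that no column names have to be typed out. This is only a sketch under assumptions: it needs dplyr 1.0 or later for `across()`, the `extra` column is a hypothetical stand-in for the many columns not shown, and it only collapses rows correctly if all other columns are identical between the rows being averaged.

```r
library(dplyr)

# Hypothetical sample data; `extra` stands in for the many columns not shown
df <- data.frame(
  id          = rep(1:3, each = 3),
  init_cont   = rep(c("K", "I", "K"), each = 3),
  family      = rep(c("S", "S", "D"), each = 3),
  extra       = rep(c("a", "b", "c"), each = 3),
  description = rep(c("impacteach", "impactover", "read"), times = 3),
  value       = c(1, 3, 2, 2, 4, 1, 3, 5, 3),
  stringsAsFactors = FALSE
)

result <- df %>%
  # recode both impact variants to a single label
  mutate(description = if_else(description %in% c("impacteach", "impactover"),
                               "impact", description)) %>%
  # group by all columns except the one being averaged
  group_by(across(-value)) %>%
  summarise(value = mean(as.numeric(value)), .groups = "drop")

print(result)
```

If some other column differs between the impacteach and impactover rows, those rows end up in separate groups and are not averaged together, so it is worth checking the row count of the result.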

3 Comments

I have many more columns than those shown here. Is there an easy way for me to do this for over 100 columns?
What does select(-value.x) mean? Also, the comma before "by=c" is leading to an error.
The negative select drops a column. And I'm sorry about the error, you didn't provide a reproducible example so I can't test the code
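
To illustrate the .x/.y suffixes and the negative select discussed in these comments, here is a small sketch on made-up data for a single id. The data frames `original` and `averaged` are hypothetical, and the final `rename()` is an optional extra step to restore the column name.

```r
library(dplyr)

# Hypothetical rows for one id, after recoding the description
original <- data.frame(
  id          = c(1, 1, 1),
  description = c("impact", "impact", "read"),
  value       = c(1, 3, 2),
  stringsAsFactors = FALSE
)

# Hypothetical summarised means for the same id
averaged <- data.frame(
  id          = c(1, 1),
  description = c("impact", "read"),
  value       = c(2, 2),
  stringsAsFactors = FALSE
)

# Both inputs have a non-key column called `value`, so left_join
# disambiguates them as value.x (from original) and value.y (from averaged)
joined <- left_join(original, averaged, by = c("id", "description"))
print(names(joined))

cleaned <- joined %>%
  select(-value.x) %>%        # the minus sign drops the named column
  distinct() %>%              # collapse the now-identical impact rows
  rename(value = value.y)     # optional: restore the original column name

print(cleaned)
```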

gsub can be used to replace any description beginning with impact with just impact, and then group_by from the dplyr package will help in summarising the value.

df %>% group_by(id, init_cont, family, 
        description = gsub("^(impact).*","\\1", description)) %>%
  summarise(value = mean(value))

# # A tibble: 6 x 5
# # Groups: id, init_cont, family [?]
#      id init_cont family description value
#   <int> <chr>     <chr>  <chr>       <dbl>
# 1     1 K         S      impact       2.00
# 2     1 K         S      read         2.00
# 3     2 I         S      impact       3.00
# 4     2 I         S      read         1.00
# 5     3 K         D      impact       4.00
# 6     3 K         D      read         3.00
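
To see what the regular expression does on its own, here is a small base R sketch. The vector `descriptions` is made up for illustration, and "reimpact" is a hypothetical value included only to show the effect of the `^` anchor.

```r
# Made-up description values; "reimpact" is hypothetical
descriptions <- c("impacteach", "impactover", "read", "reimpact")

# ^(impact) captures "impact" only at the start of the string,
# .* matches the rest, and \\1 replaces the whole match with the capture
cleaned <- gsub("^(impact).*", "\\1", descriptions)

print(cleaned)
```

Because of the anchor, only strings that start with impact are shortened; anything else is returned unchanged.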
