Adding new, combined values to existing dataframe in R

Question

This is an approximation of the original dataframe. In the original, there are many more columns than are shown here.

id  init_cont  family  description  value
1   K          S       impacteach   1
1   K          S       impactover   3
1   K          S       read         2
2   I          S       impacteach   2
2   I          S       impactover   4
2   I          S       read         1
3   K          D       impacteach   3
3   K          D       impactover   5
3   K          D       read         3

I want to combine the values for impacteach and impactover to generate an average value that is just called impact. I would like the final table to look like the following:

id  init_cont  family  description  value
1   K          S       impact       2
1   K          S       read         2
2   I          S       impact       3
2   I          S       read         1
3   K          D       impact       4
3   K          D       read         3

I have not been able to figure out how to generate this table. However, I have been able to create a dataframe that looks like this:

id  description  value
1   impact       2
1   read         2
2   impact       3
2   read         1
3   impact       4
3   read         3

What is the best way for me to take these new values and add them to the original dataframe? I also need to remove the original values (like impacteach and impactover) in the original dataframe. I would prefer to modify the original dataframe as opposed to creating an entirely new dataframe because the original dataframe has many columns.

In case it is useful, this is a summary of the code I used to create the shorter dataframe with impact as a combination of impacteach and impactover:

df %>%
  mutate(newdescription = case_when(description %in% c("impacteach", "impactoverall") ~ "impact", TRUE ~ description)) %>%
  group_by(id, newdescription) %>%
  summarise(value = mean(as.numeric(value)))
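To make the thread easier to try out, here is a minimal reproducible sketch of the sample table and of the pipeline above. It assumes dplyr 1.0.0 or later (for the `.groups` argument). The data frame is typed in from the table in the question, where the second label is `impactover` rather than the `impactoverall` used in the snippet, so the sketch follows the table. The name `short` is only illustrative.

```r
library(dplyr)

# Sample data copied from the table in the question
df <- data.frame(
  id          = rep(1:3, each = 3),
  init_cont   = rep(c("K", "I", "K"), each = 3),
  family      = rep(c("S", "S", "D"), each = 3),
  description = rep(c("impacteach", "impactover", "read"), times = 3),
  value       = c(1, 3, 2, 2, 4, 1, 3, 5, 3),
  stringsAsFactors = FALSE
)

# Recode both impact labels to "impact", then average within id and label
short <- df %>%
  mutate(newdescription = case_when(
    description %in% c("impacteach", "impactover") ~ "impact",
    TRUE ~ description
  )) %>%
  group_by(id, newdescription) %>%
  summarise(value = mean(as.numeric(value)), .groups = "drop")

print(short)
```

This reproduces the three-column table: `init_cont` and `family` are dropped because they are not part of the grouping.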

What do you mean when you say you have many more columns? Do you have more columns similar to description, or just value?
– MKR, Commented Apr 26, 2018 at 18:06

C. Braun · Accepted Answer · 2018-04-26 17:53:51Z

4

What if you changed the description column first so that it could be included in the grouping:

df %>% 
    mutate(description = substr(description, 1, 6)) %>%
    group_by(id, init_cont, family, description) %>% 
    summarise(value = mean(value))

# A tibble: 6 x 5
# Groups:   id, init_cont, family [?]
#      id init_cont family description value
#   <int> <chr>     <chr>  <chr>       <dbl>
# 1     1 K         S      impact         2.
# 2     1 K         S      read           2.
# 3     2 I         S      impact         3.
# 4     2 I         S      read           1.
# 5     3 K         D      impact         4.
# 6     3 K         D      read           3.
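Listing every column in `group_by()` gets unwieldy when the data frame is wide. The following is a sketch of one possible variant, not part of the answer above. It assumes dplyr 1.0.0 or later (for `across()`) and that every column other than `value` is identical across the rows being averaged. It also recodes the two labels explicitly instead of using `substr()`, which would truncate any other label longer than six characters. The label `impactover` is taken from the sample table.

```r
library(dplyr)

# Sample data copied from the table in the question
df <- data.frame(
  id          = rep(1:3, each = 3),
  init_cont   = rep(c("K", "I", "K"), each = 3),
  family      = rep(c("S", "S", "D"), each = 3),
  description = rep(c("impacteach", "impactover", "read"), times = 3),
  value       = c(1, 3, 2, 2, 4, 1, 3, 5, 3),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Recode only the two labels that should be merged
  mutate(description = if_else(description %in% c("impacteach", "impactover"),
                               "impact", description)) %>%
  # Group by every column except value, so no column has to be named
  group_by(across(-value)) %>%
  summarise(value = mean(value), .groups = "drop")

print(result)
```

If some other column differs between the `impacteach` and `impactover` rows of the same id, those rows land in separate groups and are not averaged together, so it is worth checking the row count of the result.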

answered Apr 26, 2018 at 17:53


Mark · Accepted Answer · 2018-04-26 17:54:11Z

1

You just need to modify your group_by statement. Try group_by(id, init_cont, family, newdescription)

Because your id seems to be mapped to init_cont and family already, adding in these values won't change your summarization result. Then you have all the columns you want with no extra work.

If you have a lot of columns, you could try something like the code below. Essentially, it does a left_join of your summarised data onto your original data, using the . placeholder so that no new dataframe has to be stored. Once joined (by id and description, which we modified in place), you'll have two value columns, which get the suffixes .x and .y. Drop the original one, then use distinct to get rid of the duplicate 'impact' rows.

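df %>% 
  mutate(description = case_when(description %in% c("impacteach", "impactoverall") ~ "impact", TRUE ~ description)) %>%
  # join the per-group means back on; `.` is the recoded data from the step above
  left_join(group_by(., id, description) %>%
              summarise(value = mean(as.numeric(value))),
            by = c("id", "description")) %>%
  select(-value.x) %>%          # drop the original values
  rename(value = value.y) %>%   # keep the averaged values under the original name
  distinct()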
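The join can also be avoided altogether. As a sketch of an alternative (not the code above): a grouped `mutate()` overwrites `value` with the group mean while leaving every other column in place, and `distinct()` then collapses the rows that have become identical. It assumes that all columns other than `value` agree between the rows being merged, and it uses the label `impactover` as it appears in the sample table.

```r
library(dplyr)

# Sample data copied from the table in the question
df <- data.frame(
  id          = rep(1:3, each = 3),
  init_cont   = rep(c("K", "I", "K"), each = 3),
  family      = rep(c("S", "S", "D"), each = 3),
  description = rep(c("impacteach", "impactover", "read"), times = 3),
  value       = c(1, 3, 2, 2, 4, 1, 3, 5, 3),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(description = if_else(description %in% c("impacteach", "impactover"),
                               "impact", description)) %>%
  group_by(id, description) %>%
  # Replace each value with its group mean; all other columns are untouched
  mutate(value = mean(as.numeric(value))) %>%
  ungroup() %>%
  # The two former impact rows per id are now identical, so keep one of them
  distinct()

print(result)
```

Because no column is dropped or renamed, this keeps the original column order and works the same way regardless of how many extra columns there are.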

edited Apr 26, 2018 at 17:54

answered Apr 26, 2018 at 17:47


3 Comments

melbez Over a year ago

I have many more columns than those shown here. Is there an easy way for me to do this for over 100 columns?

melbez Over a year ago

What does select(-value.x) mean? Also, the comma before "by=c" is leading to an error.

Mark Over a year ago

The negative select drops a column. I'm sorry about the error; you didn't provide a reproducible example, so I couldn't test the code.

MKR · Accepted Answer · 2018-04-26 18:20:56Z

0

gsub can be used to replace any description starting with impact by impact, and then group_by from the dplyr package will help in summarising the value.

df %>% group_by(id, init_cont, family, 
        description = gsub("^(impact).*","\\1", description)) %>%
  summarise(value = mean(value))

# # A tibble: 6 x 5
# # Groups: id, init_cont, family [?]
#      id init_cont family description value
#   <int> <chr>     <chr>  <chr>       <dbl>
# 1     1 K         S      impact       2.00
# 2     1 K         S      read         2.00
# 3     2 I         S      impact       3.00
# 4     2 I         S      read         1.00
# 5     3 K         D      impact       4.00
# 6     3 K         D      read         3.00
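To see what the regular expression does on its own, here is a small base R sketch. The vector `labels` and the extra entry `no_impact` are made up for illustration; they show that only strings that start with `impact` are changed.

```r
# "^(impact).*" matches the whole string when it starts with "impact";
# "\\1" replaces the match with the captured prefix only
labels  <- c("impacteach", "impactover", "read", "no_impact")
cleaned <- gsub("^(impact).*", "\\1", labels)

print(cleaned)
# "impact" "impact" "read" "no_impact"
```

Since the pattern is anchored at the start and can match at most once per string, `sub()` would give the same result here.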

answered Apr 26, 2018 at 18:20

