0

I'd like to add a series variables to a dataframe:

patent <- c(1,2,2)
temp1 <- c(TRUE,FALSE,FALSE)
temp2 <- c(FALSE,TRUE,TRUE)
df <- data.frame(patent,temp1,temp2)
df

  patent temp1 temp2
       1  TRUE FALSE
       2 FALSE  TRUE
       2 FALSE  TRUE

What I would like to do is to generate column sums for each temp var, by patent, in a loop / function / using dplyr / etc. Here is what I'm ultimately looking for:

  patent temp1 temp2 new1 new2
       1  TRUE FALSE    1    0
       2 FALSE  TRUE    0    2
       2 FALSE  TRUE    0    2
2
  • 1
    Possible duplicate of Sum across multiple columns with dplyr Commented Jan 12, 2018 at 21:23
  • 1
    Not a duplicate of that question (although still very possibly a dupe), since this one asks for sums within patent groupings. Commented Jan 12, 2018 at 23:02

3 Answers 3

1

You can try the following without specifying each column individually

library(dplyr)
df %>%
  group_by(patent) %>%
  mutate_at(vars(contains("temp")), sum) %>%
  ungroup() %>%
  select(-patent) %>%
  setNames(paste0("new", seq_len(ncol(df)-1))) %>%
  cbind(df, .)

  # patent temp1 temp2 new1 new2
# 1      1  TRUE FALSE    1    0
# 2      2 FALSE  TRUE    0    2
# 3      2 FALSE  TRUE    0    2
Sign up to request clarification or add additional context in comments.

Comments

1

Another solution. We can calculate the sum of each patent to each column first using summarise_all(funs(sum(.))), and then we joined the result to the original data frame using left_join. No need to specify individual column names.

library(dplyr)

df2 <- df %>%
  group_by(patent) %>%
  summarise_all(funs(sum(.))) %>%
  setNames(sub("temp", "new", names(.))) %>%
  left_join(df, ., by = "patent")
df2
#   patent temp1 temp2 new1 new2
# 1      1  TRUE FALSE    1    0
# 2      2 FALSE  TRUE    0    2
# 3      2 FALSE  TRUE    0    2

Or we can use mutate_all to calculate the sum and store the result to new columns.

df2 <- df %>%
  group_by(patent) %>%
  mutate_all(funs("sum" = sum(.))) %>%
  ungroup()
df2
# # A tibble: 3 x 5
#   patent temp1 temp2 temp1_sum temp2_sum
#    <dbl> <lgl> <lgl>     <int>     <int>
# 1   1.00 T     F             1         0
# 2   2.00 F     T             0         2
# 3   2.00 F     T             0         2

If the column names need to be exactly the same as the example, we can do the following.

df2 <- df %>%
  group_by(patent) %>%
  mutate_all(funs("sum" = sum(.))) %>%
  ungroup() %>%
  setNames(ifelse(grepl("_sum$", names(.)), sub("temp", "new", names(.)), names(.))) %>%
  setNames(sub("_sum$", "", names(.)))
df2
# # A tibble: 3 x 5
#   patent temp1 temp2  new1  new2
#    <dbl> <lgl> <lgl> <int> <int>
# 1   1.00 T     F         1     0
# 2   2.00 F     T         0     2
# 3   2.00 F     T         0     2

Comments

0

Using dplyr you can do

library(dplyr)
df %>% group_by(patent) %>%
  mutate(new1=sum(temp1), new2=sum(temp2))

#   patent temp1 temp2  new1  new2
#    <dbl> <lgl> <lgl> <int> <int>
# 1      1  TRUE FALSE     1     0
# 2      2 FALSE  TRUE     0     2
# 3      2 FALSE  TRUE     0     2

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.