How can I aggregate multiple columns in a data.frame with a custom function in R?

Question

I've got a data.frame dt with some duplicate keys and missing data, i.e.

Name     Height     Weight   Age
Alice    180        NA       35
Bob      NA         80       27
Alice    NA         70       NA
Charles  170        75       NA

In this case the key is the name, and I would like to apply to each column a function like

f <- function(x) {
  x <- x[!is.na(x)]
  x <- x[1]
  return(x)
}

while aggregating by the key (i.e., the "Name" column), so as to obtain as a result

Name     Height     Weight   Age
Alice    180        70       35
Bob      NA         80       27
Charles  170        75       NA

I tried

dt_agg <- aggregate(. ~ Name,
                    data = dt,
                    FUN = f)

and I got some errors. Then I tried the following:

dt_agg_1 <- aggregate(Height ~ Name,
                      data = dt,
                      FUN = f)

dt_agg_2 <- aggregate(Weight ~ Name,
                      data = dt,
                      FUN = f)

and this time it worked.

Since I have 50 columns, this second approach is quite cumbersome for me. Is there a way to fix the first approach?

Thanks for help!

emilliman5 · Accepted Answer · 2017-10-10 14:00:57Z

3

You were very close with the aggregate function; you just need to change how aggregate handles NA (from na.omit to na.pass). My guess is that aggregate removes all rows containing an NA first and only then does its aggregating, instead of removing NAs as it iterates over the columns to be aggregated. Since every row of your example data frame contains an NA, you end up with a 0-row data frame (which is the error I was getting when running your code). I tested this by removing all but one NA, and your code then works as-is. So we set na.action = na.pass to pass the NAs through.

dt_agg <- aggregate(. ~ Name,
                    data = dt,
                    FUN = f, na.action = "na.pass")
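
The diagnosis above can be checked directly. A minimal sketch, assuming dt is rebuilt from the table in the question with read.table() (the question does not show how dt was created, so that construction is an assumption):

```r
# Rebuild the example data from the question (assumed construction).
dt <- read.table(text = "Name Height Weight Age
Alice   180 NA 35
Bob     NA  80 27
Alice   NA  70 NA
Charles 170 75 NA", header = TRUE)

# The asker's function: first non-missing value, or NA if there is none.
f <- function(x) {
  x <- x[!is.na(x)]
  x[1]
}

# The formula method of aggregate() defaults to na.action = na.omit,
# which drops every row containing an NA before any grouping happens.
# Every row of this example has at least one NA.
rows_left <- nrow(na.omit(dt))

# With na.pass the rows survive and f() deals with the NAs per column.
dt_agg <- aggregate(. ~ Name, data = dt, FUN = f, na.action = na.pass)
```

Here rows_left is 0, which is why the default call has nothing to aggregate, while dt_agg holds one row per name.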

Original answer:

dt_agg <- aggregate(dt[, -1], 
                    by = list(dt$Name),
                    FUN = f)
dt_agg
# Group.1 Height Weight Age
# 1   Alice    180     70  35
# 2     Bob     NA     80  27
# 3 Charles    170     75  NA
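
The grouping column comes out as Group.1 because the list passed to by is unnamed. A small variation, assuming the same dt and f as in the question (rebuilt here so the snippet is self-contained), passes a named grouping instead so that the key keeps its name:

```r
# Example data and function from the question (dt construction is assumed).
dt <- read.table(text = "Name Height Weight Age
Alice   180 NA 35
Bob     NA  80 27
Alice   NA  70 NA
Charles 170 75 NA", header = TRUE)

f <- function(x) {
  x <- x[!is.na(x)]
  x[1]
}

# dt["Name"] is a one-column data frame, i.e. a named list, so the
# grouping column in the result is called "Name" instead of "Group.1".
# list(Name = dt$Name) would work the same way.
dt_agg <- aggregate(dt[, -1], by = dt["Name"], FUN = f)
```

The data frame method does not apply na.action, so f() receives the NAs and no extra argument is needed.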

edited Oct 10, 2017 at 14:00

answered Oct 10, 2017 at 13:43

acylam · Accepted Answer · 2017-10-10 13:37:31Z

2

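You can do this with dplyr. Note that sort() drops NAs, so sort(.)[1] returns the smallest non-missing value in each group; it matches the asker's f() here because each group has at most one non-missing value per column: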

library(dplyr)
df %>%
  group_by(Name) %>%
  summarize_all(funs(sort(.)[1]))

Result:

# A tibble: 3 x 4
     Name Height Weight   Age
   <fctr>  <int>  <int> <int>
1   Alice    180     70    35
2     Bob     NA     80    27
3 Charles    170     75    NA

Data:

df = read.table(text = "Name     Height     Weight   Age
Alice    180        NA       35
Bob      NA         80       27
Alice    NA         70       NA
Charles  170        75       NA", header = TRUE)
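
For data where a group can hold several non-missing values, sort(x)[1] and the asker's f() can disagree. A minimal sketch of the difference, using a made-up vector and a hypothetical helper name first_non_na that is not part of the question:

```r
# Hypothetical helper with the same logic as the asker's f().
first_non_na <- function(x) x[!is.na(x)][1]

# Made-up values: the first non-missing entry is not the smallest one.
x <- c(NA, 5, 3)

by_position <- first_non_na(x)  # 5: first non-missing value in row order
by_value    <- sort(x)[1]       # 3: sort() drops NAs, then takes the minimum

# Both return NA when every value is missing.
all_missing <- c(NA_real_, NA_real_)
first_non_na(all_missing)
sort(all_missing)[1]
```

Which of the two is appropriate depends on whether row order carries meaning in the data.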

answered Oct 10, 2017 at 13:37

akrun · Accepted Answer · 2017-10-10 13:51:24Z

2

Here is an option with data.table (setDT() converts df to a data.table by reference):

library(data.table)
setDT(df)[, lapply(.SD, function(x) head(sort(x), 1)), Name]
#      Name Height Weight Age
#1:   Alice    180     70  35
#2:     Bob     NA     80  27
#3: Charles    170     75  NA

answered Oct 10, 2017 at 13:51

Parfait · Accepted Answer · 2017-10-10 13:59:32Z

2

Simply add na.action = na.pass to the aggregate() call:

aggdf <- aggregate(. ~ Name, data = df, FUN = f, na.action = na.pass)
aggdf
#      Name Height Weight Age
# 1   Alice    180     70  35
# 2     Bob     NA     80  27
# 3 Charles    170     75  NA

answered Oct 10, 2017 at 13:59

clemens · Accepted Answer · 2017-10-10 13:46:30Z

1

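You can add an ifelse() to your function to make sure it returns a value when all values are NA:

f <- function(x) {
  x <- x[!is.na(x)]
  # The test is a single value, so ifelse() returns a single value:
  # NA if nothing is left, otherwise the first remaining element.
  ifelse(length(x) == 0, NA, x)
}

You can then use dplyr to aggregate: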

library(dplyr)
dt %>% group_by(Name) %>% summarise_all(funs(f))

This returns:

# A tibble: 3 x 4
     Name Height Weight   Age
   <fctr>  <dbl>  <dbl> <dbl>
1   Alice    180     70    35
2     Bob     NA     80    27
3 Charles    170     75    NA
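
funs() has been deprecated since dplyr 0.8.0, and summarise_all() was superseded by across() in dplyr 1.0.0. A sketch of the same aggregation in the newer style, assuming dplyr 1.0.0 or later is installed and that dt is built as below (the question does not show its construction); the asker's original f() is used here:

```r
library(dplyr)

# Example data from the question (construction is assumed).
dt <- read.table(text = "Name Height Weight Age
Alice   180 NA 35
Bob     NA  80 27
Alice   NA  70 NA
Charles 170 75 NA", header = TRUE)

# First non-missing value, or NA if the group has none.
f <- function(x) {
  x <- x[!is.na(x)]
  x[1]
}

# across(everything(), f) applies f to every non-grouping column.
dt_agg <- dt %>%
  group_by(Name) %>%
  summarise(across(everything(), f))
```

The printed tibble header and column types will differ from the 2017 output shown above, depending on the installed dplyr and R versions.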

answered Oct 10, 2017 at 13:46
