Apply different functions to data frame columns depending on the column names matching a pattern

Question

Given a data frame:

l <- list()  # container list; must exist before assigning into it
l$`__a` <- data.frame(`__ID` = stringi::stri_rand_strings(10, 1),
                      col = stringi::stri_rand_strings(10, 1),
                      check.names = FALSE)

And two supporting functions:

prefixColABC <- function(dfCol) {
  paste0("ABC_", dfCol)
}

prefixColDEF <- function(dfCol) {
  paste0("DEF_", dfCol)
}

How can I apply the first function to the data frame columns whose names start with __ and the second to all other columns?

To solve this problem, I thought I would first subset all columns with names starting with __ and apply prefixColABC to them, then subset all the others and apply prefixColDEF to them. Then I would use cbind() to put all of the columns back together into one data frame.

Here's some of my progress:

Here's how the first function can be applied to all columns:

as.data.frame( apply(l$`__a`, 2, prefixColABC) )

And here's how I can subset the columns. All with column names starting with __:

l$`__a`[ , grep(pattern = "^__", names(l$`__a`)), drop = FALSE ]

I don't know how to subset the remaining columns, those that don't match this pattern, nor how to set up the condition inside the apply call.

I think the following question is similar to mine, but it does not select the columns by matching a pattern: R Applying different functions to different data frame columns

G. Grothendieck · Accepted Answer · 2016-10-07 22:21:38Z

Try this, assuming that the input data frame is called dd:

hasPrefix <- grepl("^__", names(dd))
dd[, hasPrefix] <- lapply(dd[, hasPrefix, drop = FALSE], prefixColABC)
dd[, !hasPrefix] <- lapply(dd[, !hasPrefix, drop = FALSE], prefixColDEF)

giving:

> dd
    __ID   col
1  ABC_G DEF_x
2  ABC_n DEF_U
3  ABC_c DEF_G
4  ABC_O DEF_X
5  ABC_p DEF_E
6  ABC_U DEF_j
7  ABC_M DEF_G
8  ABC_0 DEF_l
9  ABC_V DEF_i
10 ABC_B DEF_u

Note: The input dd, prior to modification, is:

dd <- structure(list(`__ID` = structure(c(4L, 6L, 3L, 7L, 8L, 9L, 5L, 
1L, 10L, 2L), .Label = c("0", "B", "c", "G", "M", "n", "O", "p", 
"U", "V"), class = "factor"), col = structure(c(8L, 7L, 2L, 9L, 
1L, 4L, 2L, 5L, 3L, 6L), .Label = c("E", "G", "i", "j", "l", 
"u", "U", "x", "X"), class = "factor")), .Names = c("__ID", "col"
), row.names = c(NA, -10L), class = "data.frame")
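
The question also asks how to put the condition inside the apply call itself. One possible way, shown here only as a minimal sketch, is to iterate over the columns and their names together with Map() and pick the function per column. The small dd below is a hypothetical stand-in with character columns (not the factor data shown above), and the two helper functions are copied from the question.

```r
# Helper functions as defined in the question.
prefixColABC <- function(dfCol) paste0("ABC_", dfCol)
prefixColDEF <- function(dfCol) paste0("DEF_", dfCol)

# Hypothetical stand-in input with plain character columns.
dd <- data.frame(`__ID` = c("G", "n", "c"), col = c("x", "U", "G"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Walk over each column together with its name and choose the
# function from the name: one pass, condition inside the call.
dd[] <- Map(function(column, name) {
  if (grepl("^__", name)) prefixColABC(column) else prefixColDEF(column)
}, dd, names(dd))
```

Assigning with dd[] replaces the column contents while keeping the data frame structure and the original column names, so no cbind() step is needed.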

2 Comments

Bobby Over a year ago

Your solution is extremely helpful, thanks! I'm trying to adapt it not to overwrite the original data frame. Could you help me with this too? I'm attempting now to create an empty data frame with the same structure in this way: dd2 <- dd[0,] and then fill it with data from the apply command, but I'm getting this error "replacement element 1 has 10 rows to replace 0 rows". Should I instead use vapply to make sure that I always get a column vector to insert into the data frame?
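
A note on the error in this comment: dd[0,] keeps the columns but has zero rows, so a 10-row column cannot be assigned into it. A simpler route, sketched below under the assumption that a modified copy is all that is needed, is to copy the whole data frame and modify the copy. The sample dd is a hypothetical stand-in with character columns.

```r
# Helper functions as defined in the question.
prefixColABC <- function(dfCol) paste0("ABC_", dfCol)
prefixColDEF <- function(dfCol) paste0("DEF_", dfCol)

# Hypothetical stand-in input.
dd <- data.frame(`__ID` = c("G", "n", "c"), col = c("x", "U", "G"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Copy first; R copies on modification, so dd itself is left unchanged.
dd2 <- dd
hasPrefix <- grepl("^__", names(dd2))
dd2[hasPrefix]  <- lapply(dd2[hasPrefix],  prefixColABC)
dd2[!hasPrefix] <- lapply(dd2[!hasPrefix], prefixColDEF)
```

Because each function here returns a vector of the same length as its input, lapply() is enough; vapply() would not be needed just to fix the row-count error.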

Bobby Over a year ago

I've also tried to write a function which uses your example, but it's not returning a data frame with two columns as I expected. Do you have any hints about this to give?

processOneDF <- function (dfName){
  dfName["__ID"] <- lapply(dfName[, "__ID", drop = FALSE], prefixColABC)
  dfName["col"] <- lapply(dfName[, "col", drop = FALSE], prefixColDEF)
}

l5 <- l
l5$`__a` <- processOneDF(l5$`__a`)
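
A likely reason the function in this comment does not return a two-column data frame: an R function returns the value of its last evaluated expression, and here that is an assignment, whose value is the right-hand side (the one-element list from lapply()) rather than the modified data frame. A minimal sketch of a fix, assuming the same column names as in the comment and a hypothetical two-row stand-in for l, is to end the function with the data frame itself.

```r
# Helper functions as defined in the question.
prefixColABC <- function(dfCol) paste0("ABC_", dfCol)
prefixColDEF <- function(dfCol) paste0("DEF_", dfCol)

processOneDF <- function(dfName) {
  dfName["__ID"] <- lapply(dfName["__ID"], prefixColABC)
  dfName["col"]  <- lapply(dfName["col"],  prefixColDEF)
  dfName  # return the modified copy explicitly
}

# Hypothetical stand-in for the list l from the question.
l <- list(`__a` = data.frame(`__ID` = c("G", "n"), col = c("x", "U"),
                             check.names = FALSE, stringsAsFactors = FALSE))

l5 <- l
l5$`__a` <- processOneDF(l5$`__a`)
```

The function works on its own local copy of the argument, so the data frame stored in l stays as it was and only l5 holds the prefixed values.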
