Given a data frame:

l <- list()  # the list must exist before an element can be assigned to it
l$`__a` <- data.frame(`__ID` = stringi::stri_rand_strings(10, 1),
                      col = stringi::stri_rand_strings(10, 1), check.names = FALSE)

And two supporting functions:

prefixColABC <- function(dfCol) {
  paste0("ABC_", dfCol)
}

prefixColDEF <- function(dfCol) {
  paste0("DEF_", dfCol)
}

How can I apply the first function to data frame columns whose names start with __ and the second to all other columns?

To solve this problem, I thought I would subset first all columns with names starting with __, apply prefixColABC to them, then subset all others and apply prefixColDEF to them. Then I would use cbind() to put all of the columns together into one data frame again.
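The plan described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame df and its fixed values are hypothetical stand-ins for l$`__a` (so that stringi is not needed), and grepl() is used to get a logical vector that can be negated for "all other columns".

```r
# Hypothetical example data with fixed values (stand-in for l$`__a`).
df <- data.frame(`__ID` = c("a", "b", "c"),
                 col = c("x", "y", "z"),
                 check.names = FALSE, stringsAsFactors = FALSE)

prefixColABC <- function(dfCol) paste0("ABC_", dfCol)
prefixColDEF <- function(dfCol) paste0("DEF_", dfCol)

# TRUE for column names that start with "__", FALSE otherwise.
isSpecial <- grepl("^__", names(df))

# Subset each group; drop = FALSE keeps a one-column subset as a data frame.
abcPart <- df[, isSpecial, drop = FALSE]
defPart <- df[, !isSpecial, drop = FALSE]

# Apply the matching function column by column; `[]<-` keeps the data frame shape.
abcPart[] <- lapply(abcPart, prefixColABC)
defPart[] <- lapply(defPart, prefixColDEF)

# Put the pieces back together and restore the original column order.
result <- cbind(abcPart, defPart)[, names(df), drop = FALSE]
```

Negating the logical vector with ! is what selects the non-matching columns, which is harder to do with the integer positions returned by grep().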

Here's some of my progress:

Here's how the first function can be applied to all columns:

as.data.frame( apply(l$`__a`, 2, prefixColABC) )

And here's how I can subset the columns. All with column names starting with __:

l$`__a`[, grep(pattern = "^__", names(l$`__a`)), drop = FALSE]

I don't know how to subset all the other columns that don't match this pattern, and I don't know how to set up the condition inside the apply statement.

I think the question "R Applying different functions to different data frame columns" is similar to mine, but it does not select the columns by matching a pattern.

1 Answer

Try this assuming that the input data frame is called dd:

hasPrefix <- grepl("^__", names(dd))
dd[, hasPrefix] <- lapply(dd[, hasPrefix, drop = FALSE], prefixColABC)
dd[, !hasPrefix] <- lapply(dd[, !hasPrefix, drop = FALSE], prefixColDEF)

giving:

> dd
    __ID   col
1  ABC_G DEF_x
2  ABC_n DEF_U
3  ABC_c DEF_G
4  ABC_O DEF_X
5  ABC_p DEF_E
6  ABC_U DEF_j
7  ABC_M DEF_G
8  ABC_0 DEF_l
9  ABC_V DEF_i
10 ABC_B DEF_u
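As a side note on the drop = FALSE argument used in the code above, here is a small sketch of what it changes. The data frame df below is a hypothetical two-column example, not the dd from this answer.

```r
# Hypothetical two-column data frame; one column name starts with "__".
df <- data.frame(`__ID` = c("a", "b"), col = c("x", "y"),
                 check.names = FALSE, stringsAsFactors = FALSE)
hasPrefix <- grepl("^__", names(df))

# Only one column matches, so plain matrix-style indexing simplifies to a vector.
is.data.frame(df[, hasPrefix])                 # FALSE
is.data.frame(df[, hasPrefix, drop = FALSE])   # TRUE

# With drop = FALSE, lapply() visits one column at a time.
length(lapply(df[, hasPrefix, drop = FALSE], nchar))   # 1
```

Without drop = FALSE, lapply() would iterate over the individual values of the single column rather than over columns.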

Note: The input dd, prior to modification, is:

dd <- structure(list(`__ID` = structure(c(4L, 6L, 3L, 7L, 8L, 9L, 5L, 
1L, 10L, 2L), .Label = c("0", "B", "c", "G", "M", "n", "O", "p", 
"U", "V"), class = "factor"), col = structure(c(8L, 7L, 2L, 9L, 
1L, 4L, 2L, 5L, 3L, 6L), .Label = c("E", "G", "i", "j", "l", 
"u", "U", "x", "X"), class = "factor")), .Names = c("__ID", "col"
), row.names = c(NA, -10L), class = "data.frame")

2 Comments

Your solution is extremely helpful, thanks! I'm trying to adapt it not to overwrite the original data frame. Could you help me with this too? I'm attempting now to create an empty data frame with the same structure in this way: dd2 <- dd[0,] and then fill it with data from the apply command, but I'm getting this error "replacement element 1 has 10 rows to replace 0 rows". Should I instead use vapply to make sure that I always get a column vector to insert into the data frame?
I've also tried to write a function which uses your example, but it's not returning a data frame with two columns as I expected. Do you have any hints about this to give?

processOneDF <- function (dfName){
  dfName["__ID"] <- lapply(dfName[, "__ID", drop = FALSE], prefixColABC)
  dfName["col"] <- lapply(dfName[, "col", drop = FALSE], prefixColDEF)
}
l5 <- l
l5$`__a` <- processOneDF(l5$`__a`)
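One possible way to address both comments is sketched below, as an illustration rather than a definitive answer. The helper name prefixByName and the small data frame dd are hypothetical. The sketch relies on two facts about R: function arguments are copied on modification, so changing df inside a function leaves the caller's object alone, and a function returns the value of its last expression. When that last expression is an assignment, the value is the right-hand side only, which is why processOneDF above does not return the whole data frame.

```r
prefixColABC <- function(dfCol) paste0("ABC_", dfCol)
prefixColDEF <- function(dfCol) paste0("DEF_", dfCol)

# Hypothetical helper: returns a modified copy; the caller's data frame is untouched.
prefixByName <- function(df) {
  hasPrefix <- grepl("^__", names(df))
  # The guards skip a group when no column names fall into it.
  if (any(hasPrefix))  df[hasPrefix]  <- lapply(df[hasPrefix], prefixColABC)
  if (any(!hasPrefix)) df[!hasPrefix] <- lapply(df[!hasPrefix], prefixColDEF)
  df  # return the whole data frame explicitly
}

# Hypothetical input with fixed values.
dd <- data.frame(`__ID` = c("a", "b"), col = c("x", "y"),
                 check.names = FALSE, stringsAsFactors = FALSE)

dd2 <- prefixByName(dd)   # dd keeps its original values
```

With this shape there is no need to pre-allocate an empty data frame such as dd[0, ]; the copy is made implicitly when the function modifies its argument.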
