R: Applying functions to multiple named columns in a data frame - improvements?

Question

I have a data frame with a number of columns. I want to apply the same operation to several of these columns, which I've identified by name.

For example:

convert.f <- function(v) {
  if (is.numeric(v) && !is.factor(v)) {
    v <- as.factor(v)
  }
  return (v)
}
f <- data.frame(X1=rep(2,2),X2=rep(1,2), X3=rep(3,2), XA=rep('a',2), X4=rep(4,2))
cols <- c('X1', 'X2', 'X4')

# Now, I want to apply 'convert.f' to cols X1, X2, and X4 only and store the
# results back in the original data frame.

All of the below attempts are incorrect.

# Doesn't seem to return a data frame I can use...
apply(f[, cols], 2, convert.f)

# Same as above I think
f2 <- sapply(f[, cols], convert.f)

# Even if I coerce it, I get some problems
f2 <- data.frame(f2)
f2$X1 # Error

# Appears to have no change in the data frame
library(plyr)  # ddply() comes from the plyr package
ddply(f, cols, convert.f)

# This doesn't seem to save the results back into the frame
for (col in cols) {
  f[col] <- convert.f(f[col])
}
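One way to see why the apply() and sapply() attempts misbehave is to check what they return. apply() is designed for matrices, so it converts the data frame to a matrix first and returns a matrix; sapply() simplifies its list of results into a matrix as well. A matrix cannot hold factors, so the conversion done by convert.f is lost either way. A minimal check, reusing the example data from the question (res_apply and res_sapply are just illustrative names):

```r
convert.f <- function(v) {
  if (is.numeric(v) && !is.factor(v)) {
    v <- as.factor(v)
  }
  v
}
f <- data.frame(X1 = rep(2, 2), X2 = rep(1, 2), X3 = rep(3, 2),
                XA = rep('a', 2), X4 = rep(4, 2))
cols <- c('X1', 'X2', 'X4')

res_apply <- apply(f[, cols], 2, convert.f)
class(res_apply)        # a matrix, not a data frame
is.factor(res_apply)    # FALSE: the factor conversion did not survive

res_sapply <- sapply(f[, cols], convert.f)
class(res_sapply)       # sapply() also simplified the list of factors to a matrix
```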

A possible solution:

# Here's the best way I've found so far but it seems inefficient.
f3 <- data.frame(lapply(f[,cols], convert.f))
f[, names(f3)] <- f3

# However, if I do this in a function and return f, it doesn't seem to make my changes stick. Still trying to figure that one out.
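The changes do not stick because R functions work on a copy of their arguments: assigning to a data frame inside a function modifies a local copy, and the caller's f is untouched unless the returned value is assigned back. A minimal sketch, where convert_cols is a hypothetical helper name used only for illustration:

```r
convert.f <- function(v) {
  if (is.numeric(v) && !is.factor(v)) {
    v <- as.factor(v)
  }
  v
}

# Hypothetical helper: converts the named columns of its own copy of df
convert_cols <- function(df, cols) {
  df[cols] <- lapply(df[cols], convert.f)  # modifies the local copy only
  df                                       # so the copy has to be returned
}

f <- data.frame(X1 = rep(2, 2), X2 = rep(1, 2), X3 = rep(3, 2),
                XA = rep('a', 2), X4 = rep(4, 2))
cols <- c('X1', 'X2', 'X4')

convert_cols(f, cols)       # the result is printed but then discarded
is.factor(f$X1)             # FALSE: the caller's f was not modified

f <- convert_cols(f, cols)  # assign the returned data frame back to f
is.factor(f$X1)             # TRUE
```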

Why does this last version work when the lapply() result is coerced to a data frame?

Are there any improvements here? It seems that I am missing something fundamental about how the various apply functions work.
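Much of the difference comes down to what each function returns. lapply() always returns a plain list with one element per input column and never tries to simplify it, so every element keeps its factor class. Because a data frame is itself a list of equal-length columns, that named list can be assigned back into the matching columns, with or without the data.frame() wrapper. A short check, reusing the example objects (converted is an illustrative name):

```r
convert.f <- function(v) {
  if (is.numeric(v) && !is.factor(v)) {
    v <- as.factor(v)
  }
  v
}
f <- data.frame(X1 = rep(2, 2), X2 = rep(1, 2), X3 = rep(3, 2),
                XA = rep('a', 2), X4 = rep(4, 2))
cols <- c('X1', 'X2', 'X4')

converted <- lapply(f[cols], convert.f)
is.list(converted)             # TRUE: lapply() always returns a list
names(converted)               # "X1" "X2" "X4": the column names are kept
sapply(converted, is.factor)   # every element is still a factor

# A data frame is a list of columns, so the named list can go straight back in
is.list(f)                     # TRUE
f[cols] <- converted
```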

BrodieG · Accepted Answer · 2014-03-19 19:25:52Z


You are very close with your last two attempts. Here is a simple version that works:

f[cols] <- lapply(f[cols], convert.f)

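Checking the result with str(f) gives the output below. (This output comes from R < 4.0.0; since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so XA is now reported as a character column rather than a factor.)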

'data.frame':   2 obs. of  5 variables:
 $ X1: Factor w/ 1 level "2": 1 1
 $ X2: Factor w/ 1 level "1": 1 1
 $ X3: num  3 3
 $ XA: Factor w/ 1 level "a": 1 1
 $ X4: Factor w/ 1 level "4": 1 1
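Putting the question's setup and this one-liner together gives a script that can be pasted into a fresh R session. How XA is displayed depends on the R version's stringsAsFactors default; X1, X2 and X4 become factors either way, and X3 stays numeric.

```r
convert.f <- function(v) {
  # Convert numeric (non-factor) vectors to factors; leave anything else alone
  if (is.numeric(v) && !is.factor(v)) {
    v <- as.factor(v)
  }
  v
}

f <- data.frame(X1 = rep(2, 2), X2 = rep(1, 2), X3 = rep(3, 2),
                XA = rep('a', 2), X4 = rep(4, 2))
cols <- c('X1', 'X2', 'X4')

# Replace only the named columns; every other column is left as it was
f[cols] <- lapply(f[cols], convert.f)
str(f)
```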

The following loop also works:

for (col in cols) {
  f[col] <- convert.f(f[, col])
}

Your version did not work because f[col] returns a one-column data frame, not a vector. Since is.numeric() is FALSE for a data frame, convert.f returns its input unchanged, and that unchanged data frame is written back into f[col], so it looks as if f was never modified. With the two-index form f[, col], the drop argument of [ takes effect (it defaults to TRUE when a single column is selected), so f[, col] returns a vector instead of a one-column data frame.
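The difference between the two forms of [ is easy to confirm interactively. The loop at the end uses [[, which always extracts or replaces a single column as a vector; it is an alternative added here for comparison, not part of the original answer:

```r
convert.f <- function(v) {
  if (is.numeric(v) && !is.factor(v)) {
    v <- as.factor(v)
  }
  v
}
f <- data.frame(X1 = rep(2, 2), X2 = rep(1, 2), X3 = rep(3, 2),
                XA = rep('a', 2), X4 = rep(4, 2))
cols <- c('X1', 'X2', 'X4')

class(f["X1"])                  # "data.frame": single index, list-style subsetting
class(f[, "X1"])                # "numeric": drop = TRUE turns one column into a vector
class(f[, "X1", drop = FALSE])  # "data.frame": dropping can be switched off
is.numeric(f["X1"])             # FALSE, which is why convert.f left it unchanged

# Alternative loop (not from the original answer): [[ ]] works on one column as a vector
for (col in cols) {
  f[[col]] <- convert.f(f[[col]])
}
```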

edited Mar 19, 2014 at 19:25

answered Mar 19, 2014 at 19:20
