I have a data frame with a number of columns. I want to apply the same operation to several of these columns, which I select by name.

For example:

convert.f <- function(v) {
  if (is.numeric(v) && !is.factor(v)) {
    v <- as.factor(v)
  }
  return (v)
}
f <- data.frame(X1=rep(2,2),X2=rep(1,2), X3=rep(3,2), XA=rep('a',2), X4=rep(4,2))
cols <- c('X1', 'X2', 'X4')

# Now, I want to apply 'convert.f' to cols X1, X2, and X4 only and store it in the
# original data frame.

None of the attempts below works.

# Doesn't seem to return a data frame I can use...
apply(f[, cols], 2, convert.f)

# Same as above I think
f2 <- sapply(f[, cols], convert.f)

# Even if I coerce it, I get some problems
f2 <- data.frame(f2)
f2$X1 # Error

# Appears to have no change in the data frame (ddply is from the plyr package)
ddply(f, cols, convert.f)

# This doesn't seem to save the results back into the frame
for (col in cols) {
  f[col] <- convert.f(f[col])
}

A possible solution:

# Here's the best way I've found so far but it seems inefficient.
f3 <- data.frame(lapply(f[,cols], convert.f))
f[, names(f3)] <- f3

# However, if I do this in a function and return f, it doesn't seem to make my changes stick. Still trying to figure that one out.

Why does the last one work with lapply coerced to a data frame?

Are there any improvements here? It seems that I am missing something fundamental with how the various 'apply' functions work.

1 Answer

You are very close with your last two attempts. Here is a simple version that works:

f[cols] <- lapply(f[cols], convert.f)

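which produces (as shown by str(f); on R 4.0 or later XA is listed as chr instead, because data.frame() no longer converts strings to factors by default):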

'data.frame':   2 obs. of  5 variables:
 $ X1: Factor w/ 1 level "2": 1 1
 $ X2: Factor w/ 1 level "1": 1 1
 $ X3: num  3 3
 $ XA: Factor w/ 1 level "a": 1 1
 $ X4: Factor w/ 1 level "4": 1 1
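
The question also mentions that the change does not seem to stick when the conversion is done inside a function. R functions work on a copy of their arguments, so the modified data frame has to be returned and assigned back by the caller. A minimal sketch of that, where convert_cols is a hypothetical helper name that does not appear in the original post:

```r
# Same helper as in the question: turn a plain numeric vector into a factor.
convert.f <- function(v) {
  if (is.numeric(v) && !is.factor(v)) {
    v <- as.factor(v)
  }
  v
}

# Hypothetical wrapper (an assumption, not from the original post):
# converts the named columns of its local copy of `df` and returns that copy.
convert_cols <- function(df, cols) {
  df[cols] <- lapply(df[cols], convert.f)
  df
}

f <- data.frame(X1 = rep(2, 2), X2 = rep(1, 2), X3 = rep(3, 2),
                XA = rep("a", 2), X4 = rep(4, 2))
cols <- c("X1", "X2", "X4")

# Calling the function without assigning its result leaves f untouched.
discarded <- convert_cols(f, cols)
before <- sapply(f, class)

# Assigning the result back is what makes the change stick.
f <- convert_cols(f, cols)
after <- sapply(f, class)

print(before[cols])  # all "numeric"
print(after[cols])   # all "factor"
```

The column X3 is not in cols, so it stays numeric in both cases.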

Note:

for (col in cols) {
  f[col] <- convert.f(f[, col])
}

This loop also works. Your version did not work because f[col] returns a data frame, not a vector, so the is.numeric(v) test fails and convert.f returns the single-column data frame unchanged. That unchanged data frame is then assigned back to f[col], so it looks as though f was not modified. With the two-index form of [, the drop argument takes effect (it defaults to TRUE when only columns are selected), and f[, col] returns a vector instead of a one-column data frame.
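
The difference between the indexing forms can be checked directly. The sketch below uses a cut-down two-column data frame for illustration; the f[[col]] form is not used in the post above and is included only as another common way to extract a single column as a vector:

```r
f <- data.frame(X1 = rep(2, 2), XA = rep("a", 2))
col <- "X1"

# Single brackets with one index keep the data frame wrapper.
one_index <- f[col]
# Two indices: drop = TRUE collapses a single column to a plain vector.
two_index <- f[, col]
# Double brackets extract the column itself.
double_bracket <- f[[col]]
# Passing drop = FALSE keeps the data frame even with two indices.
kept <- f[, col, drop = FALSE]

class(one_index)            # "data.frame"
class(two_index)            # "numeric"
class(double_bracket)       # "numeric"
class(kept)                 # "data.frame"

is.numeric(one_index)       # FALSE, so convert.f would leave it unchanged
is.numeric(two_index)       # TRUE
is.numeric(double_bracket)  # TRUE
```

This applies to base data frames; other data frame classes may treat the drop argument differently.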
