
I am quite new to writing functions and am working on a generic function that is to be applied to several, but not all, columns in a data frame. The function is supposed to conditionally transform the values of these specified columns.

Example data:

df <- data.frame("Var1" = c(0:5), "Var2" = c(-5:0), "Var3" = c(0,0,0,0,0,0))

> df
  Var1 Var2 Var3
1    0   -5    0
2    1   -4    0
3    2   -3    0
4    3   -2    0
5    4   -1    0
6    5    0    0

Example function:

myFun <- function(x, na_value){
  x[x == na_value] <- NA
  x
}

Given that I want 0s to be transformed to NA for Var1 and Var2, but NOT Var3, I have written df$Var1 <- myFun(df$Var1, 0) and df$Var2 <- myFun(df$Var2, 0). Surely there is a simpler way of doing this?

What I envision is something like myFun(Var1, Var2, 0) that transforms the 0s in Var1 and Var2 to NA without having to repeat the code for both variables. The function is to be applied to multiple data frames with different variable names and different na_values, which is why I wrote it in the first place. It works fine, but I would like to simplify it even more.

  • Another possibility: df[, c("Var1", "Var2")][df[, c("Var1", "Var2")] == 0] <- NA Commented Jul 10, 2019 at 11:39

2 Answers


For a single dataframe, apply is one way to do this (it is safe here only because every column is numeric). For example:

df[ , -3] <- apply(df[ , -3], FUN = myFun, na_value = 0, MARGIN = 2)
df

I don't know whether your other dataframes are formatted in exactly the same way. However, you can combine apply with lapply (or mapply) to perform this operation on all of your dataframes.
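
For selecting the columns by name rather than by position, here is a minimal sketch using lapply, which keeps the dataframe from being converted to a matrix. The vector `target_cols` is a name introduced here for illustration only; `myFun` and `df` are taken from the question.

```r
# Function and data from the question
myFun <- function(x, na_value) {
  x[x == na_value] <- NA
  x
}
df <- data.frame(Var1 = 0:5, Var2 = -5:0, Var3 = c(0, 0, 0, 0, 0, 0))

# Hypothetical name: the columns to transform, chosen by name
target_cols <- c("Var1", "Var2")

# df[target_cols] is always a dataframe, so lapply runs once per column
df[target_cols] <- lapply(df[target_cols], myFun, na_value = 0)
df
```

Only the columns listed in `target_cols` are touched, so Var3 keeps its zeros.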

EDIT: Here is a more general (and a little ugly or old-fashioned) solution with a for loop:

## Define a list of two dataframes:
df <- data.frame("Var1" = c(0:5), "Var2" = c(-5:0), "Var3" = c(0,0,0,0,0,0))
df2 <- data.frame("VarA" = c(0:5), "VarB" = c(-5:0), "VarC" = c(3,3,3,3,3,3))
my_list <- list(df, df2)
## Colnames to consider, and missing values indicator, for each dataframe:
na_values <- list(0, 3) # NA = 0 in the first one, NA = 3 in the second
cols <- list(c("Var1", "Var2"), c("VarA", "VarB"))
## Define an R function to replace a given value with NA in selected columns of a dataframe:
replace_nas <- function(data, cols, na_value){
    ## data[cols] stays a dataframe even when cols names a single column
    data[cols] <- lapply(data[cols], FUN = function(x) {
        x[x == na_value] <- NA
        return(x)
    }
    )
    return(data)
}
## Do this operation for each dataframe in "my_list" with a for loop:
res_list <- list()
for (k in seq_along(my_list)) {
    res_list[[k]] <- replace_nas(my_list[[k]], cols[[k]], na_values[[k]])
}
res_list

Probably not optimal, but it works!
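
As a possible alternative to the for loop, the same three parallel lists could be walked with base R's Map, which calls a function once per position. This is only a sketch under the same assumptions as above (two dataframes, one set of column names and one NA marker per dataframe); `replace_nas` is restated here, using `data[cols]`, so that the block runs on its own.

```r
# Same helper as in the answer, written with data[cols] so that a single
# column name also works
replace_nas <- function(data, cols, na_value) {
  data[cols] <- lapply(data[cols], function(x) {
    x[x == na_value] <- NA
    x
  })
  data
}

# Same inputs as in the answer
my_list <- list(
  data.frame(Var1 = 0:5, Var2 = -5:0, Var3 = rep(0, 6)),
  data.frame(VarA = 0:5, VarB = -5:0, VarC = rep(3, 6))
)
cols <- list(c("Var1", "Var2"), c("VarA", "VarB"))
na_values <- list(0, 3)

# Map passes the k-th element of each list to replace_nas and returns a list
res_list <- Map(replace_nas, my_list, cols, na_values)
res_list
```

Map always returns a list, so `res_list` has the same shape as the one built by the loop.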


4 Comments

IMHO it's better to use apply for arrays and lapply for data.frames as the first will coerce your data.frame to a matrix.
Oops, you're right! :-) All columns were numeric here so it does not really matter, but this will be a serious problem in other situations.
Thanks, lapply works great for this simple example. Is there a way to identify the columns I want to apply the function by name instead of df[ , -3]?
I edited my previous answer to propose an ugly solution!
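
The coercion mentioned in the comments above can be seen in a small sketch. The dataframe `mixed` is made up for illustration and is not part of the question; it has one character column and one numeric column.

```r
# Hypothetical dataframe with mixed column types
mixed <- data.frame(id = c("a", "b", "c"), value = c(0, 1, 2),
                    stringsAsFactors = FALSE)

# apply first converts the dataframe to a matrix, and a matrix can hold only
# one type, so the numeric column is turned into character strings
via_apply <- apply(mixed, MARGIN = 2, FUN = function(x) x)
is.matrix(via_apply)
typeof(via_apply)

# lapply visits each column as it is, so the original types are preserved
via_lapply <- lapply(mixed, function(x) x)
sapply(via_lapply, class)
```

After the apply call, a comparison such as `x == 0` would be made against strings rather than numbers, which is why lapply is the safer choice for dataframes.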

Since you're asking for a simpler solution, you could just identify the cells that are equal to zero, while excluding column 3, and set them to NA like so:

df[-3][df[-3] == 0] <- NA
#   Var1 Var2 Var3
# 1   NA   -5    0
# 2    1   -4    0
# 3    2   -3    0
# 4    3   -2    0
# 5    4   -1    0
# 6    5   NA    0
