I have 8 datasets, and in each one I want to convert any number less than 5 to NA in 3 columns (var1, var2, var3). How can I write a function that does this efficiently? I went through many similar questions on Stack Overflow but didn't find an answer that targets specific columns. I have written the replacement function but can't figure out how to apply it to all the datasets.

Input:
Data1
variable1 variable2 variable3 variable4
10           36        56        99
15           3         2         56
4            24        1         1

Expected output:
variable1 variable2 variable3 variable4
10           36        56        99
15           NA        NA        56
NA           24        NA        1

Perform the same thing for 7 more datasets.

So far I have stored the needed variables and the datasets in two different lists.

var1=enquo(variable1)
var2=enquo(variable2)
var3=enquo(variable3)
Total=3


listofdfs=list()
listofdfs_1=list()
for(i in 1:8) {
  df=sym(paste0("Data",i))
  listofdfs[[i]]=df
}

for(e in 1:Total) {
  listofdfs_1[[e]]=eval(sym(paste0("var",e)))
}

The selected columns will go through this function:

temp_1=function(x,h) {
  h=enquo(h)
  for(e in 1:Total) {
    if(substr(eval(sym(paste0("var",e))),1,3)=="var") {
      y= x %>% mutate_at(vars(!!h), ~ replace(., which(.<=5),NA))
      return(y)
    }
  }
}

I was expecting something like:

lapply(for each dataset's selected columns,temp_1)

  • @MrFlick Sorry for the inconvenience ! I have edited the question. Commented Oct 17, 2019 at 20:58
  • @MrFlick Sorry for that I have edited ! Commented Oct 17, 2019 at 21:44

2 Answers

Here's a simple approach that should work:

cols_to_edit = paste0("var", 1:3)
result_list = lapply(list_of_dfs, function(x) {
  x[cols_to_edit][x[cols_to_edit] < 5] = NA
  return(x)
})

I assume your starting data is in a list called list_of_dfs, that the names of columns to edit are the same in all data frames, and that you can construct a character vector cols_to_edit with those names.
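
As a minimal sketch of the above on the question's sample data: this assumes the real column names are variable1 to variable3 as shown in the input, and Data2 is a made-up second data frame added only so the list has more than one element.

```r
# Sample data from the question; Data2 is hypothetical.
Data1 <- data.frame(
  variable1 = c(10, 15, 4),
  variable2 = c(36, 3, 24),
  variable3 = c(56, 2, 1),
  variable4 = c(99, 56, 1)
)
Data2 <- data.frame(
  variable1 = c(1, 7),
  variable2 = c(5, 2),
  variable3 = c(9, 4),
  variable4 = c(0, 3)
)

# Collect the data frames into a named list.
list_of_dfs <- list(Data1 = Data1, Data2 = Data2)

# Column names kept as plain strings, so a rename is a one-line change.
cols_to_edit <- paste0("variable", 1:3)

result_list <- lapply(list_of_dfs, function(x) {
  # The comparison yields a logical matrix over the selected columns;
  # every TRUE cell is set to NA. Other columns are left untouched.
  x[cols_to_edit][x[cols_to_edit] < 5] <- NA
  x
})

result_list$Data1
```

Values equal to 5 are kept here because the comparison is strict; variable4 is not in cols_to_edit, so its small values stay as they are.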

3 Comments

My variable names are the same in all datasets, but they may change in the future, so I have tried to use something like a macro. For example, a name could change from variable3 to variable4. When I tried your code as: cols_to_edit = eval(sym(paste0("var", 1:3))); result = lapply(list_of_dfs, function(x) { x[cols_to_edit][x[cols_to_edit] < 5] = NA; return(x) }) it gives the error: Error in *tmp*[cols_to_edit] : object of type 'symbol' is not subsettable
Yeah, so my code works but when you change it to eval(sym()) it doesn't work. Why don't you just create string column names? Seems still modular, but much simpler. You can change cols_to_edit = paste0("var", 1:3) to cols_to_edit = paste0("var", 1:4) just as easily---except that the rest of the code works too.
But var1 stands for variable1, and I don't want to hard-code that. If var1 maps to a different name in the future, it will be easier to change if it's not hard-coded. Hence I need to use eval.

Here is a solution to the problem in the question.
First of all, create a test data set.

createData <- function(Total = 3){
  numcols <- Total + 1
  set.seed(1234)
  for(i in 1:8){
    tmp <- replicate(numcols, sample(10, 20, TRUE))
    tmp <- as.data.frame(tmp)
    names(tmp) <- paste0("var", seq_len(numcols))
    assign(paste0("Data", i), tmp, envir = .GlobalEnv)
  }
}

createData()

Now, the data transformation.
This is much easier if the many dataframes are in a "list".

df_list <- mget(ls(pattern = "^Data"))

I will present two solutions, a base R one and a tidyverse one. Note that both use the function temp_1, which is written in base R only.

library(tidyverse)

temp_1 <- function(x, h){
  f <- function(v){
    is.na(v) <- v <= 5
    v
  }
  x[h] <- lapply(x[h], f)
  x
}

h <- grep("var[123]", names(df_list[[1]]), value = TRUE)

df_list1 <- lapply(df_list, temp_1, h)
df_list2 <- df_list %>% map(temp_1, h)

identical(df_list1, df_list2)
#[1] TRUE
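
If the goal is to refer to columns through aliases (var1, var2, ...) that can be repointed later, one option that avoids eval(sym()) is a named character vector used as a lookup table. The sketch below is only an illustration: col_map, na_below and demo_list are hypothetical names, the real column names are assumed to be variable1 to variable3 as in the question's sample, and the threshold is the strict "less than 5" from the question text rather than the <= 5 used in temp_1.

```r
# Hypothetical alias-to-column mapping; edit the right-hand side if a column is renamed.
col_map <- c(var1 = "variable1", var2 = "variable2", var3 = "variable3")

# Same idea as temp_1, with the threshold exposed as an argument (hypothetical helper).
na_below <- function(x, cols, threshold = 5) {
  x[cols] <- lapply(x[cols], function(v) {
    is.na(v) <- which(v < threshold)  # mark values strictly below the threshold as NA
    v
  })
  x
}

# Small made-up list standing in for the real list of data frames.
demo_list <- list(
  Data1 = data.frame(variable1 = c(10, 15, 4), variable2 = c(36, 3, 24),
                     variable3 = c(56, 2, 1), variable4 = c(99, 56, 1)),
  Data2 = data.frame(variable1 = c(1, 7), variable2 = c(5, 2),
                     variable3 = c(9, 4), variable4 = c(0, 3))
)

# unname() drops the aliases so only the real column names are used for indexing.
out <- lapply(demo_list, na_below, cols = unname(col_map))

# An alias can still be used to look up a single real column.
out$Data1[[col_map[["var1"]]]]
```

With this layout, pointing var3 at a different column means changing one string in col_map; the function and the lapply() call stay the same.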

1 Comment

Hello, I am sorry, but I forgot to mention that I need to use the vars as macro variables, so that when I use eval they resolve to Variable1, Variable2 and Variable3. Can you help with that? The variable h that stores the names won't work the way you have written it in the code.
