I have a data frame, e.g.:

library(dplyr)     # %>%, select(), mutate()
library(lubridate) # today()

df_reprex <- data.frame(id = rep(paste0("S",round(runif(100, 1000000, 9999999),0)), each=10),
                        date = rep(seq.Date(today(), by=-7, length.out = 10), 100),
                        var1 = runif(1000, 10, 20),
                        var2 = runif(1000, 20, 50),
                        var3 = runif(1000, 2, 5),
                        var250 = runif(1000, 100, 200),
                        var1_baseline = rep(runif(100, 5, 10), each=10),
                        var2_baseline = rep(runif(100, 50, 80), each=10),
                        var3_baseline = rep(runif(100, 1, 3), each=10),
                        var250_baseline = rep(runif(100, 20, 70), each=10))

I want to write a function containing a for loop that for each row in the dataframe will subtract every "_baseline" column from the non-baseline column with the same name.

I have created a script that automatically creates a character string containing the code I would like to run:

df <- df_reprex

# get only numeric columns
df_num <- df %>% dplyr::select_if(., is.numeric)

# create a version with no baselines
df_nobaselines <- df_num %>% select(-contains("baseline"))

#extract names of non-baseline columns
numeric_cols <- names(df_nobaselines)

#initialise empty string  
mutatestring <- ""

#write loop to fill in string:
for (colname in numeric_cols) {
  
 mutatestring <- paste(mutatestring, ",", paste0(colname, "_change"), "=", colname, "-", paste0(colname, "_baseline")) 
 
 # df_num <- df_num %>%
 #   mutate(paste0(col, "_change") = col - paste0(col, "_baseline"))
  
}

mutatestring <- substr(mutatestring, 4, 9999999) # remove stuff at start (I know it's inefficient)
mutatestring2 <- paste("df %>% mutate(", mutatestring, ")") # add mutate call

but when I try to call "mutatestring2" it just prints the character string e.g.:

[1] "df %>% mutate( var1_change = var1 - var1_baseline , var2_change = var2 - var2_baseline , var3_change = var3 - var3_baseline , var250_change = var250 - var250_baseline )"

I thought that this part would be relatively easy and I'm sure I've missed something obvious, but I just can't get the text inside that string to run!

I've tried various slightly ridiculous methods but none of them return the desired output (i.e. the result returned by the character string if it was entered into the console as a command):

call(mutatestring2)
eval(mutatestring2)
parse(mutatestring2)
str2lang(mutatestring2)
mget(mutatestring2)

diff_func <- function() {mutatestring2}
diff_func1 <- function() {
  a <-mutatestring2
  return(a)}
diff_func2 <- function() {str2lang(mutatestring2)}
diff_func3 <- function() {eval(mutatestring2)}
diff_func4 <- function() {parse(mutatestring2)}
diff_func5 <- function() {call(mutatestring2)}

diff_func()
diff_func1()
diff_func2()
diff_func3()
diff_func4()
diff_func5()

I'm sure there must be a very straightforward way of doing this, but I just can't work it out!

How do I convert a character string to something that I can run or pass to a magrittr pipe?

You need to use the text parameter in parse, then eval the result. For example, you can do eval(parse(text = "print(5)")). However, it's important to note that this is normally a very bad idea, and there is usually a more sensible alternative. Commented Oct 23, 2020 at 11:55

1 Answer

You need to use the text parameter in parse, then eval the result. For example, you can do:

eval(parse(text = "print(5)"))
#> [1] 5

However, using eval(parse()) is normally a very bad idea, and there is usually a more sensible alternative.
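
Applied to a command string like the one in the question, this could look roughly as follows. This is only a minimal sketch: df_toy and cmd are made-up names, and base R's transform() stands in for the dplyr pipeline so that the example runs without extra packages.

```r
# Toy data frame; the names are hypothetical and only for illustration
df_toy <- data.frame(var1 = c(12, 15), var1_baseline = c(5, 7))

# A command stored as text, analogous to mutatestring2 in the question
cmd <- "transform(df_toy, var1_change = var1 - var1_baseline)"

# eval() on a character string simply returns the string unchanged
unevaluated <- eval(cmd)

# parse(text = ) turns the text into an expression that eval() can run
res_parse <- eval(parse(text = cmd))

# str2lang() (R >= 3.6.0) returns an unevaluated call, so it also needs eval()
res_lang <- eval(str2lang(cmd))
```

Here res_parse and res_lang both contain the new var1_change column, while unevaluated is still just the original character string, which is why the attempts in the question only printed the text.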

In your case you can do this without resorting to eval(parse()). For example, in base R you could subtract each "_baseline" column from its matching non-baseline column like this:

# names of the baseline columns, e.g. "var1_baseline"
baseline <- grep("_baseline$", names(df_reprex), value = TRUE)

# names of the matching non-baseline columns, e.g. "var1"
non_baseline <- sub("_baseline$", "", baseline)

# non-baseline minus baseline, stored in new "<name>_corrected" columns
df_new <- cbind(df_reprex, as.data.frame(setNames(mapply(
                  function(i, j) df_reprex[[i]] - df_reprex[[j]], 
                  non_baseline, baseline, SIMPLIFY = FALSE), 
                paste0(non_baseline, "_corrected"))))

Or if you want to keep the whole thing in a single pipe without storing intermediate variables, you could do:

# i = non-baseline column name, j = matching baseline column name
mapply(function(i, j) df_reprex[[i]] - df_reprex[[j]],
       sub("_baseline$", "", grep("_baseline$", names(df_reprex), value = TRUE)),
       grep("_baseline$", names(df_reprex), value = TRUE),
       SIMPLIFY = FALSE) %>%
  setNames(sub("_baseline$", "_corrected",
               grep("_baseline$", names(df_reprex), value = TRUE))) %>%
  as.data.frame() %>%
  {cbind(df_reprex, .)}
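
As a further option, the same subtraction can probably be written without mapply at all, because subtracting one data frame from another of the same shape works column by column. The sketch below uses a small made-up data frame (df_toy) and the "_change" suffix from the question; both are assumptions for illustration rather than part of the approach above.

```r
# Toy data; the column names are hypothetical and mirror the question's layout
df_toy <- data.frame(
  id = c("a", "b"),
  var1 = c(12, 15), var2 = c(30, 40),
  var1_baseline = c(5, 7), var2_baseline = c(50, 60)
)

# Baseline columns and their matching non-baseline columns, in the same order
baseline <- grep("_baseline$", names(df_toy), value = TRUE)
non_baseline <- sub("_baseline$", "", baseline)

# Data frame minus data frame subtracts column by column (matched by position),
# so both selections must list the columns in corresponding order
df_toy[paste0(non_baseline, "_change")] <- df_toy[non_baseline] - df_toy[baseline]
```

Because the columns are paired by position rather than by name, non_baseline is derived directly from baseline so that the two selections always line up.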