I have a data frame containing the specification for a set of regression models (regress_grid) with a column for different aspects of the model. I then use dplyr::rowwise() to estimate a model for each row of regress_grid using the analysis dataset (mtcars). This is adapted from the dataless grids approach in Tim Tiefenbach's blog post.

A minimal example of what I'm attempting is below:

library("tibble")
library("rlang")
library("dplyr")

# Regression specification for 2 models with different explanatory variables and samples, including columns with filter expressions (strat1 and strat2) specified in terms of variables in the analysis dataset.
regress_grid = tribble(
  ~strat1,          ~strat2,        ~term_labels,
  expr(carb != 1), expr(cyl != 4), c("wt","qsec") ,
  expr(carb != 1), TRUE,           c("wt") )
regress_grid


# Use rowwise to add a mod column containing the lm object.
regress_grid1 = regress_grid |>
  dplyr::rowwise() |>
  dplyr::mutate(mod = list(lm(stats::reformulate(termlabels = term_labels,
                                                               response = "mpg"),
                                            data = filter(mtcars, 
                                                          eval(strat1), 
                                                          eval(strat2)))))

I don't seem able to make this work when I wrap the code in a function and try and make it flexible enough to take an unknown number of columns of regress_grid containing filter expressions. I want to specify the columns in regress_grid containing the filter expressions as a list and use list(TRUE) as the default as a way to not filter any observations. The default successfully runs with no filtering, but if I add a list of columns from regress_grid containing filter expressions (strat1 and strat2) I don't seem able to find a way to make this work.


# Function 
regress_func = function(reg_grid, termlabels, data, filters = list(TRUE)){

  reg_grid = reg_grid |>
    dplyr::rowwise() |>
    dplyr::mutate(mod = list(rlang::inject(lm(stats::reformulate(termlabels = {{termlabels}},
                                                                     response = "mpg"),
                                                  data = filter(data, !!!filters)))))
  return(reg_grid)
}

# This works when no filter expressions are specified and the default list(TRUE) is used.
regress_grid2 = regress_func(reg_grid = regress_grid,
                  termlabels = term_labels,
                  data = mtcars)

# I can't work out how to specify the options for filtering within a function                  
regress_grid2 = regress_func(reg_grid = regress_grid,
                  termlabels = term_labels,
                  filters = list(strat1, strat2),
                  data = mtcars) 

In its current form this results in an Error: object 'strat1' not found. I have tried various combinations of eval, expr, enexpr and enquo but I seem to go round in circles. I have attempted to digest the rlang and metaprogramming documentation, but I can't seem to wrap my head around it.

Ideally, I wouldn't use ... here as I was planning to use these for something else in my real use case. I don't feel strongly whether I specify the columns containing filter expressions as list(strat1, strat2) or list("strat1", "strat2"). I have attempted to make my question specific but if there are other ways to approach this I would be very interested.

I have found several previous questions about dynamically specifying filter arguments here, here, here, and here but none quite answer my question.

1 Answer

You need a kind of double injection. Here's a helper function that takes a call to list(), such as list(a, b), and turns it into a list of the expressions !!a and !!b, so that those expressions can themselves be injected into the filter call.

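splicelist <- function(x) {
  # x is a quosure; its expression must be a call to list()
  stopifnot(rlang::quo_get_expr(x)[[1]] == as.name("list"))
  # Drop the `list` symbol and wrap each remaining argument in !!
  Map(function(x) bquote(!!.(x)), as.list(rlang::quo_get_expr(x))[-1])
}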
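
To see what the helper returns, here is a minimal sketch. It assumes rlang is installed, and show_filters is a hypothetical wrapper added purely for illustration: it captures its argument the same way the regression function does and hands the result back.

```r
library(rlang)

# Helper from the answer: turn the captured call list(a, b) into a list
# holding the unevaluated expressions !!a and !!b.
splicelist <- function(x) {
  stopifnot(rlang::quo_get_expr(x)[[1]] == as.name("list"))
  Map(function(x) bquote(!!.(x)), as.list(rlang::quo_get_expr(x))[-1])
}

# Hypothetical wrapper (not part of the original code): capture the
# argument without evaluating it and return the processed list.
show_filters <- function(filters = list(TRUE)) {
  splicelist(rlang::enquo(filters))
}

# strat1 and strat2 do not need to exist here, because nothing is evaluated.
res_cols <- show_filters(list(strat1, strat2))
res_default <- show_filters()

print(res_cols)
print(res_default)
```

Each element is an unevaluated call, so no column is looked up at this point. The lookup only happens later, inside mutate(), where the columns of the grid are visible.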

Then update your regression function to capture the unevaluated filters, process them, and inject them:

regress_func = function(reg_grid, termlabels, data, filters = list(TRUE)){
  filters <- splicelist(rlang::enquo(filters))
  reg_grid = reg_grid |>
    dplyr::rowwise() |>
    dplyr::mutate(mod = list(rlang::inject(lm(stats::reformulate(termlabels = {{termlabels}},
                                                                 response = "mpg"),
                                              data = filter(data, !!!filters)))))
  return(reg_grid)
}

This should then work with both of your examples.
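
As a quick check, the following sketch puts the pieces together and calls the function both ways. It reuses the grid from the question and the two functions from this answer unchanged. The object names no_filter and with_filter and the nobs() comparison are my additions for illustration, and the sketch assumes tibble, rlang and dplyr are installed.

```r
library(tibble)
library(rlang)
library(dplyr)

# Grid from the question: two models with row-specific filter expressions.
regress_grid <- tribble(
  ~strat1,         ~strat2,        ~term_labels,
  expr(carb != 1), expr(cyl != 4), c("wt", "qsec"),
  expr(carb != 1), TRUE,           c("wt")
)

# Helper and regression function as given in the answer.
splicelist <- function(x) {
  stopifnot(rlang::quo_get_expr(x)[[1]] == as.name("list"))
  Map(function(x) bquote(!!.(x)), as.list(rlang::quo_get_expr(x))[-1])
}

regress_func <- function(reg_grid, termlabels, data, filters = list(TRUE)) {
  filters <- splicelist(rlang::enquo(filters))
  reg_grid |>
    dplyr::rowwise() |>
    dplyr::mutate(mod = list(rlang::inject(lm(
      stats::reformulate(termlabels = {{ termlabels }}, response = "mpg"),
      data = filter(data, !!!filters)
    ))))
}

# Default: no filtering, every model is fitted on all rows of mtcars.
no_filter <- regress_func(regress_grid, termlabels = term_labels, data = mtcars)

# Row-specific filtering driven by the strat1 and strat2 columns.
with_filter <- regress_func(regress_grid, termlabels = term_labels,
                            data = mtcars, filters = list(strat1, strat2))

# Compare how many observations each fitted model used.
n_no_filter <- sapply(no_filter$mod, nobs)
n_with_filter <- sapply(with_filter$mod, nobs)
print(n_no_filter)
print(n_with_filter)
```

If the filters are applied per row, the observation counts in the filtered run should be smaller than in the default run and should differ between the two models.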

1 Comment

Thank you very much indeed. I would never have thought to combine bquote .() with !!.
