
I am running a glmnet model in R using the caret package and doing repeated nested cross-validation with the nestedcv package. I need to include a "custom" interaction of a single one of my categorical features (which has values 0/1) with all other features (some of which are numerical, others are categorical and coded as 0/1).

See an example below, without interactions:

# Load packages
library(caret)
library(nestedcv)
library(dplyr)  # provides %>% and select() used below

# Check out data
head(mtcars)

# Select features:
features <- mtcars %>%
  select(cyl, disp, vs, am) %>%
  data.matrix()

# Define outcome column:
outcome <- mtcars %>%
  select(mpg) %>%
  data.matrix()

# Set model parameters:
myControl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5)     

# Define tuning grid:
myGrid <- expand.grid(alpha = seq(0.1, 0.9, length = 10),
                      lambda = seq(0.1, 0.9, length = 10))
  
# Tuning both alpha and lambda:
set.seed(123, "L'Ecuyer-CMRG") # for reproducibility
model_ncv <- nestcv.train(
  x = features,
  y = outcome[, 1],
  method = "glmnet",
  outer_method = "cv",
  n_outer_folds = 5,
  trControl = myControl,
  tuneGrid = myGrid,
  metric = "RMSE"
)

Now say I want to run the same model with an added interaction of the variable "vs" with all other variables but no other interactions. How do I do that?

I am aware that the standard caret train() function lets you specify interactions through its formula interface (e.g., train(mpg ~ (cyl + disp + am) * vs, data = mtcars, method = "glmnet", trControl = myControl)), but nestcv.train() requires x and y to be specified separately.

I assume I can create new variables in my dataset that represent the interactions, but I am not sure how to go about this in R. For example, this tutorial shows that it is possible to simply multiply the interacting variables, but its example uses numeric/continuous variables only. Is it OK to just multiply every other feature by the 0s/1s that represent each category of "vs"?

I believe the model.matrix() function might help me here but because I don't know anything about design matrices I am afraid I would do it incorrectly.
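To make the two options concrete, here is a minimal sketch comparing hand-made product columns with what model.matrix() builds from the formula. It assumes every feature is stored as a numeric column (continuous or 0/1), as in mtcars; the names others, inter, features_manual and features_mm are made up for illustration. With factor columns, model.matrix() would expand the levels into dummy columns first, so the comparison would not be this direct.

```r
# Hand-made interactions: multiply each remaining feature by the 0/1 column vs
x <- as.matrix(mtcars[, c("cyl", "disp", "am", "vs")])
others <- c("cyl", "disp", "am")
inter <- x[, others] * x[, "vs"]            # each column is multiplied elementwise by vs
colnames(inter) <- paste0(others, ":vs")
features_manual <- cbind(x, inter)

# Same terms via the formula interface; [, -1] drops the "(Intercept)" column
features_mm <- model.matrix(~ (cyl + disp + am) * vs, data = mtcars)[, -1]

# Compare the numeric content of the two matrices, ignoring dimnames
same <- isTRUE(all.equal(unname(features_manual), unname(features_mm)))
same
```

If same is TRUE, multiplying by the 0/1 indicator and using the formula produce identical columns for this kind of data, so either route gives the model the same design matrix.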

Any help will be greatly appreciated.

  • You can create x from the formula using the model.matrix function. Commented Jun 11, 2024 at 17:40
  • Also, does nestcv.train add the intercept? You might need to add a column of ones to your x. Commented Jun 12, 2024 at 2:59
  • Thanks. I tried using the model.matrix function but I am not at all sure if I am doing it right: features <- data.matrix(data.frame(model.matrix(~ (cyl + disp + am)*vs, data = mtcars))[-1]). As for the intercept, when I leave it in (= omit the [-1] at the end), the model complains that “1 predictor(s) have var=0” and it returns the same result as without the intercept. Commented Jun 12, 2024 at 17:06
  • model.matrix returns a matrix. There is really no reason to convert to data.frame and then to matrix again. Also, you can exclude the intercept in the formula: model.matrix(~ (cyl + disp + am) * vs - 1, data = mtcars). Apparently, the intercept is handled by nestcv.train. So, no need to add it manually. Commented Jun 13, 2024 at 6:13
  • Thank you, that's a good point. Would you like to add your comment as an answer or should I? (Sorry, noob here) Commented Jun 21, 2024 at 10:52
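Putting the comments above together, a possible sketch of the full call looks like this. It reuses the settings from the question; features_int and model_ncv_int are names introduced here for illustration. It assumes all predictors are numeric, so "- 1" in the formula only removes the intercept column. With factor predictors, "- 1" would also change how the first factor is dummy-coded, and dropping the "(Intercept)" column by indexing may be the safer choice. Whether every downstream function accepts ":" in column names has not been checked here; make.names() could be applied to the column names if that turns out to be a problem.

```r
library(caret)
library(nestedcv)

# Design matrix: main effects plus the interaction of vs with every other feature,
# without an intercept column (per the comments, nestcv.train handles the intercept)
features_int <- model.matrix(~ (cyl + disp + am) * vs - 1, data = mtcars)
colnames(features_int)

# Inner resampling scheme and tuning grid, as in the question
myControl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)
myGrid <- expand.grid(alpha = seq(0.1, 0.9, length = 10),
                      lambda = seq(0.1, 0.9, length = 10))

set.seed(123, "L'Ecuyer-CMRG") # for reproducibility
model_ncv_int <- nestcv.train(
  x = features_int,
  y = mtcars$mpg,
  method = "glmnet",
  outer_method = "cv",
  n_outer_folds = 5,
  trControl = myControl,
  tuneGrid = myGrid,
  metric = "RMSE"
)
```

Only the x argument differs from the model without interactions, so the two fits can be compared directly.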
