Running Multiple Linear Regression Models in for-Loop

Question

The logic is similar to a content-based recommender. My data looks like this:

content	undesirable	desirable	user_1	user_10
1	3.00	2.77	0.11	NA
...
5000	2.50	2.11	NA	0.12

I need to fit a model with undesirable and desirable as the independent variables and each user as the dependent variable, so I need to fit the model 10 times and predict each user's NA values.

This is the code I hard-coded, but I wonder how to do this with a for loop. I searched for and tried several methods, but none of them worked for me.

The data frame is called 'test'.

Hard-coded version:

#fit model
fit_1 = lm(user_1 ~ undesirable + desirable, data = test)
...
fit_10 = lm(user_10 ~ undesirable + desirable, data = test)

#prediction
u_1_na = test[is.na(test$user_1), c('user_1', 'undesirable', 'desirable')]
result1 = predict(fit_1, newdata = u_1_na)
which(result1 == max(result1))
max(result1)
...
u_10_na = test[is.na(test$user_10), c('user_10', 'undesirable', 'desirable')]
result10 = predict(fit_10, newdata = u_10_na)
which(result10 == max(result10))
max(result10)

#write to csv file
#(write each user's maximum predicted value to a CSV file)

This is what I have tried so far (for loop):

mod_summaries <- list() 

for(i in 1:10) {                 
  
  predictors_i <- colnames(data)[1:10]   
  mod_summaries[[i - 1]] <- summary(     
    lm(predictors_i ~ ., test[ , c("undesirable", 'desirable')]))
  
}

Create the formulas as strings: formulas <- paste0("user_", 1:10, " ~ undesirable + desirable"). Then iterate over them to create the regressions: models <- lapply(formulas, \(x) lm(as.formula(x), data = test))
– Oliver, Commented Nov 14, 2022 at 15:32
R is a 1-indexed language, so there is no need to subtract 1 from i (i - 1), as 0-indexed languages like Python require. Simply use i.
– M.Viking, Commented Nov 14, 2022 at 15:42
@M.Viking "Like index 0 like Python require": err, this isn't really required in Python either, since virtually every iteration starts at 0, not at 1.
– Konrad Rudolph, Commented Nov 14, 2022 at 15:50
@Oliver, yes, this works for me, but I still need to use the models to predict the NA values for each user.
– bergpot, Commented Nov 14, 2022 at 16:39

M.Viking · Accepted Answer · 2022-11-14 16:26:02Z

1

An apply method:

mod_summaries_lapply <-
  lapply(
    colnames(mtcars),
    FUN = function(x)
      summary(lm(reformulate(".", response = x), data = mtcars))
  )

A for loop method to make a linear model for each column. The key is the reformulate() function, which creates the formula from strings. In the question, the formula is built from a string (predictors_i ~ .), which results in the error "invalid term in model formula". The string has to be turned into a formula first, for example with reformulate() or as.formula(), rather than being used directly. This example uses the mtcars dataset.

mod_summaries <- list() 
for(i in 1:11) {                 
  predictors_i <- colnames(mtcars)[i]   
  mod_summaries[[i]] <- summary(lm(reformulate(".", response = predictors_i), data=mtcars))
  #summary(lm(reformulate(". -1", response = predictors_i), data=mtcars))  # -1 to exclude intercept
  #summary(lm(as.formula(paste(predictors_i, "~ .")), data=mtcars)) # a "paste as formula" method
}
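
To also cover the prediction step asked about in the question, the same reformulate() idea can be applied to the user columns. The following is only a minimal sketch under stated assumptions: the data frame is synthetic (three made-up users instead of ten), and the names user_cols, results and best_per_user, as well as the CSV file name, are hypothetical.

```r
# Synthetic stand-in for the question's `test` data frame (all values are made up).
set.seed(42)
n <- 30
test <- data.frame(
  content     = seq_len(n),
  undesirable = runif(n, 1, 5),
  desirable   = runif(n, 1, 5)
)
for (u in c("user_1", "user_2", "user_3")) {
  rating <- 0.1 * test$desirable - 0.05 * test$undesirable + rnorm(n, sd = 0.02)
  rating[sample(n, 10)] <- NA          # items this user has not rated
  test[[u]] <- rating
}

user_cols <- grep("^user_", names(test), value = TRUE)
results <- vector("list", length(user_cols))

for (i in seq_along(user_cols)) {
  user <- user_cols[i]
  # Builds e.g. user_1 ~ undesirable + desirable from strings
  fit <- lm(reformulate(c("undesirable", "desirable"), response = user),
            data = test)               # lm() drops the NA rows by default
  unrated <- test[is.na(test[[user]]), ]
  pred <- predict(fit, newdata = unrated)
  best <- which.max(pred)              # position of the highest prediction
  results[[i]] <- data.frame(
    user      = user,
    content   = unrated$content[best],
    predicted = unname(pred[best])
  )
}

best_per_user <- do.call(rbind, results)
# write.csv(best_per_user, "best_per_user.csv", row.names = FALSE)  # hypothetical file name
```

Collecting one small data frame per user and binding them at the end keeps the loop body close to the hard-coded version in the question, while leaving a single table that can be written to CSV.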

edited Nov 14, 2022 at 16:26

answered Nov 14, 2022 at 15:58

M.Viking

Nico · Accepted Answer · 2022-11-14 15:41:14Z

0

You could use the function as.formula together with the paste function to create your formula. Here is an example:

formula_lm <- as.formula(
    paste(response_var, 
          paste(expl_var, collapse = " + "), 
          sep = " ~ "))

This assumes that you have more than one explanatory variable (joined in the inner paste with +). If you only have one, you can omit the second paste.

With the created formula, you can use the lm function like this:

lm(formula_lm, data)

Edit: the vector expl_var would in your case include the undesirable and desirable variable.
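
As an illustration, the paste/as.formula pattern above could be wrapped in a small helper. This is a sketch only: the helper name build_formula is hypothetical, and mtcars with the columns mpg, wt and hp stands in for the real data.

```r
# Hypothetical helper: builds `response ~ x1 + x2 + ...` from character input.
build_formula <- function(response_var, expl_var) {
  as.formula(
    paste(response_var,
          paste(expl_var, collapse = " + "),
          sep = " ~ "))
}

formula_lm <- build_formula("mpg", c("wt", "hp"))   # mpg ~ wt + hp
fit <- lm(formula_lm, data = mtcars)
```

With a single explanatory variable the inner paste simply returns that variable unchanged, so the helper can be used for either case.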

answered Nov 14, 2022 at 15:41

Nico

2 Comments

Konrad Rudolph Over a year ago

While this is possible and fairly common, it's actually quite convoluted. R has a conceptually simpler and much more elegant way of dynamically creating formulas, namely by interpolating a variable into an unevaluated expression, e.g.: eval(bquote(.(response_var) ~ .)).

Nico Over a year ago

Thanks Konrad, learned something new! I did not know that the function bquote existed, although I think that readability suffers a bit.
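
A minimal sketch of the bquote approach mentioned in the comment, assuming the response is supplied as a symbol (via as.name) rather than as a plain string, and using mtcars as stand-in data. The name formula_bq is hypothetical.

```r
# Convert the column name to a symbol so it is interpolated as a variable,
# not as a character constant.
response_var <- as.name("mpg")

# bquote() substitutes .(response_var); the bare `.` on the right-hand side
# is left untouched, so this yields the formula mpg ~ .
formula_bq <- eval(bquote(.(response_var) ~ .))

fit <- lm(formula_bq, data = mtcars)
```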

Limey · Accepted Answer · 2022-11-14 16:02:40Z

0

Avoid the loop. Make your data tidy. Something like:

library(tidyverse)

test %>%
  select(-content) %>%
  pivot_longer(
    starts_with("user"),
    names_to="user",
    values_to="value"
  ) %>%
  group_by(user) %>%
  group_map(
    function(.x, .y) {
      # after pivot_longer() the ratings are in `value`; `user` is the grouping column
      summary(lm(value ~ ., data=.x))
    }
  )

Untested code since your example is not reproducible.
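
The tidy approach can also return each user's highest predicted value among the unrated items. The following is a rough sketch, not a verified solution for the real data: it assumes dplyr and tidyr are installed, uses a synthetic data frame with two made-up users, and the names best and unrated are hypothetical. After pivot_longer() the ratings are in the value column, so that is the response.

```r
library(dplyr)
library(tidyr)

# Synthetic stand-in for `test` (all values are made up).
set.seed(1)
n <- 20
test <- data.frame(
  content     = seq_len(n),
  undesirable = runif(n, 1, 5),
  desirable   = runif(n, 1, 5)
)
for (u in c("user_1", "user_2")) {
  rating <- 0.1 * test$desirable - 0.05 * test$undesirable + rnorm(n, sd = 0.02)
  rating[sample(n, 6)] <- NA           # items this user has not rated
  test[[u]] <- rating
}

best <- test %>%
  pivot_longer(starts_with("user"), names_to = "user", values_to = "value") %>%
  group_by(user) %>%
  group_modify(function(.x, .y) {
    # .x holds one user's rows without the grouping column `user`
    fit <- lm(value ~ undesirable + desirable, data = .x)
    unrated <- .x[is.na(.x$value), ]
    unrated$predicted <- as.numeric(predict(fit, newdata = unrated))
    unrated[which.max(unrated$predicted), c("content", "predicted")]
  }) %>%
  ungroup()
```

group_modify() is used here instead of group_map() because it binds the per-user results back into a single data frame with the user column attached.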

answered Nov 14, 2022 at 16:02

Limey
