1

I would like to loop over several regressions that use different subsets of the data, but I can't get the loop to pick up the subset. For example:

dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) 
x.list <- list(dat$x1,dat$x2,dat$x3)  
dat1 <- dat[-9,] 

fit <- list()
for(i in 1:length(x.list)){ fit[[i]] <- summary(lm(y ~ x.list[[i]], data = dat))}         
for(i in 1:length(x.list)){ fit[[i]] <- summary(lm(y ~ x.list[[i]], data = dat1))}         

Is there a way to pass in "dat1" so that the predictor variables are subset accordingly? Thanks for any recommendations.

3 Answers

4

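I'm not sure it makes sense to copy your covariates into a new list like that: the vectors in x.list are copies taken from the full dat, so they keep all 10 rows no matter which data frame you pass to lm(). Here's a way to loop over the column names and build the formulas dynamically, so that each variable is looked up in the data argument: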

dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
dat1 <- dat[-9, ]
# x.list is not needed

# Keep the two sets of fits in separate lists so the second loop
# does not overwrite the results of the first
fit <- list()
fit1 <- list()
for (i in c("x1", "x2", "x3")) fit[[i]] <- summary(lm(reformulate(i, "y"), data = dat))
for (i in c("x1", "x2", "x3")) fit1[[i]] <- summary(lm(reformulate(i, "y"), data = dat1))
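As a quick check that each fit really uses the rows of the data frame it was given, here is a minimal sketch. The helper name fit_each and the seed are my own, purely for illustration, and I'm assuming the same column names as in the question:

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
dat  <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
dat1 <- dat[-9, ]

predictors <- c("x1", "x2", "x3")

# Hypothetical helper: fit one lm per predictor on whatever data frame is passed in
fit_each <- function(d) {
  fits <- lapply(predictors, function(p) lm(reformulate(p, response = "y"), data = d))
  setNames(fits, predictors)
}

fit_full <- fit_each(dat)   # models on all 10 rows
fit_sub  <- fit_each(dat1)  # models on the 9 rows left after dropping row 9

# Number of observations used by each model
sapply(fit_full, nobs)
sapply(fit_sub, nobs)
```

Because the formula only contains column names, lm() looks the variables up in the data argument, so the subset carries through without any extra bookkeeping.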

0

How about this?

dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
dat1 <- dat[-9, ]  # the subset from the question
mods <- lapply(list(y ~ x1, y ~ x2, y ~ x3), lm, data = dat1)

If you have lots of predictors, you can generate the formulas like this:

lapply(paste('y ~ ', 'x', 1:10, sep = ''), as.formula)
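The generated formulas can then be passed to lm in the same way. A rough sketch, assuming only the three predictors x1 to x3 from the question (the seed and the object names forms and mods are just for illustration):

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
dat  <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
dat1 <- dat[-9, ]

# Build y ~ x1, y ~ x2, y ~ x3 from strings
forms <- lapply(paste0("y ~ x", 1:3), as.formula)

# Fit every formula against the subset
mods <- lapply(forms, lm, data = dat1)
names(mods) <- paste0("x", 1:3)

# Intercept and slope of each model
lapply(mods, coef)
```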

If your data were in long format, it would be similarly simple: use lapply on a split data.frame.

dat <- data.frame(y = rnorm(30), x = rnorm(30), f = rep(1:3, each = 10))
# d is the data frame for one level of f; x inside the formula is its column x
lapply(split(dat, dat$f), function(d) lm(y ~ x, data = d))
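To compare the groups you will probably want the coefficients side by side. One possible way, sketched with names (by_group, coefs) and a seed that are my own choice:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
dat <- data.frame(y = rnorm(30), x = rnorm(30), f = rep(1:3, each = 10))

# One model per level of f
by_group <- lapply(split(dat, dat$f), function(d) lm(y ~ x, data = d))

# One row per group, one column per coefficient
coefs <- t(sapply(by_group, coef))
coefs
```

This works here because every model has the same two coefficients, so sapply can simplify the result to a matrix.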

0

Sorry for being late, but have you tried a data.table solution similar to the one in this question?

R data.table loop subset by factor and do lm()

I have applied the linked solution to a modified version of your data, which should illustrate how I understood your question:

library(data.table)

set.seed(1)

df <- data.frame(x1 = letters[1:3], 
                 x2 = sample(c("a","b","c"), 30, replace = TRUE),
                 x3 = sample(c(20:50), 30, replace = TRUE),   
                 y = sample(c(20:50), 30, replace = TRUE))
dt <- data.table(df,key="x1")

fits <- lapply(unique(dt$x1),
               function(z) lm(y ~ x2 + x3, data = dt[J(z), ], y = TRUE))

fit <-  dt[, lm(y ~ x2 + x3)]

# Using x1 as a "by" variable you get one model per value of x1
coef_tbl <- dt[, as.list(coef(lm(y ~ x2 + x3))), by=x1]
# coefficients
sapply(fits,coef)

anova_tbl = dt[, as.list(anova(lm(y ~ x2 + x3))), by=x1]
row_names = dt[, row.names(anova(lm(y ~ x2 + x3))), by=x1]
anova_tbl[, variable := row_names$V1]

This extends your approach to one model per group.
