I have a data frame with 1000 rows and 10 columns. The first column is my y variable and the remaining columns are x variables. I would like to fit 10 linear regressions on 10 different subsets of the data: rows 1:100 are the first subset, rows 101:200 the second, and so on. I would like to store the output of each linear model (the slope values) in a row of a new dataset. Is there an easy way to do this? I tried the following:

for (i in 1:10 ) {
  model_var[i] = lm(y[(100*(i-1)+1:100*i]~.,var) 
  # var is my dataframe that has all the data
  #model_var[i] will store linear models
}

But I got an error. It seems that R doesn't allow fitting a linear model to a subset of the data.

3 Answers

A slightly more elegant solution based on @nograpes's answer:

Make up some data:

set.seed(101)
var <- data.frame(y=1:1000,matrix(runif(10000),nrow=1000))

Create a splitting variable (alternatively, see ggplot2::cut_number):

cutvar <- (seq(nrow(var))-1) %/% 100

Split the data and use lapply:

mList <- lapply(split(var,cutvar),lm,formula=y~.)

If you just want the coefficients then

t(sapply(mList,coef))

should extract them for you.
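
To get from the list of models to the one-row-per-subset table the question asks for, the steps in this answer can be combined as in the sketch below. It reuses the simulated data from this answer; the names coef_df and slopes are illustrative and not part of the original answer.

```r
set.seed(101)
var <- data.frame(y = 1:1000, matrix(runif(10000), nrow = 1000))

# Block index 0..9: rows 1-100 get 0, rows 101-200 get 1, and so on
cutvar <- (seq(nrow(var)) - 1) %/% 100

# One lm fit per 100-row block
mList <- lapply(split(var, cutvar), lm, formula = y ~ .)

# One row per block, one column per coefficient
coef_df <- as.data.frame(t(sapply(mList, coef)))

# Drop the first column (the intercept) to keep only the slopes
slopes <- coef_df[, -1]
```

Note that the simulated matrix here has 10 columns, so each model has 10 slopes; with the asker's data (y plus 9 predictors) there would be 9.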

1 Comment

If you used by, it would do the splitting work for you: by(var,rep(1:10,each=100),lm,formula=y~.) or, using your cutvar, by(var,cutvar,lm,formula=y~.)

Another way to do this is to use rollapply from the zoo package.

Using slightly different data from Ben Bolker's, rollapply returns the coefficients of all 10 regressions in one call:

library(zoo)  # provides rollapply
set.seed(1)
var <- data.frame(matrix(runif(10000),nrow=1000))
colnames(var) <- c("y", paste0("x", 1:9))

Coef <- rollapply(var, 
          width = 100, by=100, 
          FUN = function(z) {
            coef(lm(y~., data=as.data.frame(z)))
          },
          by.column = FALSE, align = "right") 

round(Coef, 3) # and here's the coefficients corresponding to the 10 regressions
      (Intercept)     x1     x2     x3     x4     x5     x6     x7     x8     x9
 [1,]       0.416 -0.253  0.093 -0.047  0.039  0.081  0.053 -0.022  0.084  0.006
 [2,]       0.656  0.144 -0.209 -0.150 -0.066  0.084  0.018 -0.114 -0.016  0.073
 [3,]       0.311 -0.134  0.006  0.047  0.036  0.020  0.082  0.172  0.211 -0.090
 [4,]       0.720 -0.110  0.094 -0.058 -0.018 -0.256 -0.058  0.074 -0.042  0.010
 [5,]       0.510  0.052  0.019 -0.193 -0.045  0.114 -0.093  0.044  0.059  0.051
 [6,]       1.044 -0.037 -0.300 -0.180  0.148  0.018 -0.187 -0.128 -0.182 -0.154
 [7,]       0.558  0.027 -0.231 -0.074  0.065  0.192 -0.022 -0.105 -0.002  0.046
 [8,]       0.496  0.156 -0.129 -0.061  0.025  0.028 -0.010  0.097 -0.031 -0.090
 [9,]       0.435  0.140  0.138 -0.170 -0.085 -0.069 -0.077 -0.056  0.190  0.105
[10,]       0.282  0.078  0.014 -0.005  0.110  0.149  0.001  0.175 -0.017 -0.033

You need to subset both the y and the x variables. A simple way to do this would be to subset the var data.frame directly:

model_var <- list()
for (i in 1:10)
  # parentheses are needed because ":" binds more tightly than "*" and "+"
  model_var[[i]] <- lm(y~., var[(100*(i-1)+1):(100*i),])
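
The row index in the loop is easy to get wrong because ":" has higher precedence than "*" and "+" in R, so 100*(i-1)+1:100*i is parsed as 100*(i-1) + (1:100)*i. A minimal sketch of the loop with explicit parentheses, collecting the coefficients in one row per subset, might look like this. The simulated data and the names rows and coefs are assumptions for illustration.

```r
# Simulated stand-in for the asker's data: y plus nine predictors
set.seed(1)
var <- data.frame(matrix(runif(10000), nrow = 1000))
colnames(var) <- c("y", paste0("x", 1:9))

model_var <- vector("list", 10)   # pre-allocated list of lm fits
for (i in 1:10) {
  rows <- (100 * (i - 1) + 1):(100 * i)   # rows 1-100, 101-200, ...
  model_var[[i]] <- lm(y ~ ., data = var[rows, ])
}

# Stack the coefficient vectors: one row per subset
coefs <- do.call(rbind, lapply(model_var, coef))
```

Each element of model_var is a full lm object, so summary(model_var[[3]]) or predict(model_var[[3]], ...) still work on individual fits.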

3 Comments

I am getting the error "object 'model_var' not found". How do I store lm models in an array object?
model_var <- list(); for (i in 1:10) { model_var[[i]] = ... } (this is a list, which is probably the best way to store lm models). What do you mean exactly by "an array object"?
In my original question I refer to model_var[i] as an array object because I will be storing my lm models in it; that is what I meant by "an array object". I am using an array object because that lets me use a for loop.
