Linear model within a loop (loop used to subset data)

Question

I have (what I think should be) a straightforward problem in R, and can't seem to get it to work. Hope you can help.

I have a data frame e.g.:

x  y  z
1  2  a
2  3  a
3  4  a
4  5  b
5  6  b
6  7  b      etc...

And I'm fitting a linear model (y ~ x) for each z value subset (e.g. a, b...) and extracting the gradient.

It works when I select 'a' using a with statement as such:

coef(with(subset(data.frame, z == "a"), {lm(y ~ x)
}))[2]

But my problem is that I have more than 1000 unique values in the z column. So I tried to set up a loop (I know R users hate loops!) to do this for each value of z in turn and return the results in a data frame. The code is:

gradient.lm = NULL

unique.z <- as.matrix((unique(data.frame$z)))
count.z <- nrow(unique.z)

for (i in 1:count.z) {
  gradient.lm[i] = coef((with(subset(data.frame, z == [i]), {lm(y ~ x)
  })))[2]
}

But this is not working, and giving me the error code:

> for (i in 1:count.z) {
+   activity.lm[i] = coef((with(subset(data.frame, z == [i]), {lm(y ~ x)
Error: unexpected '[' in:
"for (i in 1:count.z) {
  activity.lm[i] = coef((with(subset(data.frame, z == ["
>   })))[2]
Error: unexpected '}' in "  }"
> }
Error: unexpected '}' in "}"

My guess was that it doesn't realise that there is an [i] within the with function.

I can't find a way of making this work, or think of another way of doing it. If you have any suggestions they would be hugely appreciated.

(And it's generally bad practice to name variables like functions (data.frame))
– Heroka, Commented Sep 9, 2015 at 13:49

AntoniosK · Accepted Answer · 2015-09-09 13:52:04Z

I'd strongly recommend a dplyr and broom package solution:

set.seed(44)

dt = data.frame(x = rnorm(40, 5, 5),
                y = rnorm(40, 3, 4),
                z = rep(c("a","b"), 20))

library(dplyr)
library(broom)

dt %>%
  group_by(z) %>%            # group by column z
  do(tidy(lm(y~x, data=.)))  # for each group create model using corresponding x and y values

# Source: local data frame [4 x 6]
# Groups: z [2]
# 
#        z        term    estimate std.error  statistic    p.value
#   (fctr)       (chr)       (dbl)     (dbl)      (dbl)      (dbl)
# 1      a (Intercept)  3.54448459 1.8162699  1.9515186 0.06673401
# 2      a           x -0.18140655 0.2260252 -0.8025944 0.43267918
# 3      b (Intercept)  1.69024601 1.1960922  1.4131402 0.17467413
# 4      b           x  0.02647677 0.1914492  0.1382966 0.89154143

From this tidy output you can extract any piece of information from the lm results that you want.
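For example, to keep only the gradient for each group, you could filter the tidied output down to the `x` rows. This is a sketch that assumes dplyr's `ungroup()`, `filter()` and `select()` are chained onto the pipeline above; the name `slopes` is purely illustrative.

```r
library(dplyr)
library(broom)

set.seed(44)
dt <- data.frame(x = rnorm(40, 5, 5),
                 y = rnorm(40, 3, 4),
                 z = rep(c("a", "b"), 20))

slopes <- dt %>%
  group_by(z) %>%
  do(tidy(lm(y ~ x, data = .))) %>%  # one row per coefficient per group
  ungroup() %>%
  filter(term == "x") %>%            # keep the slope rows only
  select(z, estimate)                # z value and its gradient

print(slopes)
```

The result has one row per z value, which scales to the 1000+ groups mentioned in the question without any explicit loop.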

Heroka · Accepted Answer · 2015-09-09 13:53:34Z

In base R, this gives you a named vector containing only the gradients you're apparently interested in:

gradient.lm <- unlist(lapply(split(df,df$z),function(chunk){
  return(coef(lm(y~x, data=chunk))[[2]])
}))
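As a quick check, here is a minimal sketch that applies the same split-and-fit idea to the six example rows from the question. The data frame name `df` and the use of `vapply` in place of `unlist(lapply(...))` are choices made for illustration, not part of the answer above.

```r
# Example rows from the question (y = x + 1 within each group)
df <- data.frame(
  x = 1:6,
  y = 2:7,
  z = c("a", "a", "a", "b", "b", "b")
)

# Fit y ~ x separately for each z value and keep only the slope
gradient.lm <- vapply(
  split(df, df$z),
  function(chunk) coef(lm(y ~ x, data = chunk))[["x"]],
  numeric(1)
)

print(gradient.lm)  # named by z value; both slopes are 1 up to rounding
```

`vapply` declares up front that each group must return a single number, so a group that fails to produce one slope raises an error instead of silently changing the shape of the result.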

answered Sep 9, 2015 at 13:53

1 Comment

George Koudis Over a year ago

Heroka, that seems to work perfect. Thank you so much!!

sunny · Accepted Answer · 2015-09-09 13:56:39Z

This seems to work:

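unique_z = as.character(unique(df$z))
# one slot per z value, named so each slope can be looked up by its z value
coef_vec = setNames(numeric(length(unique_z)), unique_z)

for (i in unique_z){
  # i is the z value itself (e.g. "a"), so coef_vec[i] assigns by name
  coef_vec[i] = 
    coef(
      with(
        subset(df, z==i), 
      {lm(y~x)}))[2]
}

print(coef_vec)

Clearly, coef_vec is named by z value, so each coefficient is matched up to its z value: coef_vec["a"] is the gradient for the subset where z == "a".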
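For reference, the parse error in the question comes from `z == [i]`: `[` has to follow the object being indexed, so the comparison needs something like `unique.z[i]`. Below is a minimal sketch of the original index-based loop with that change. It assumes the data frame is called `df` (a hypothetical name, chosen to avoid masking the `data.frame` function) and uses the example rows from the question.

```r
df <- data.frame(
  x = 1:6,
  y = 2:7,
  z = c("a", "a", "a", "b", "b", "b")
)

unique.z <- unique(as.character(df$z))
gradient.lm <- numeric(length(unique.z))   # preallocate one slot per z value

for (i in seq_along(unique.z)) {
  # Index the vector of z values with unique.z[i]; a bare [i] is a syntax error
  chunk <- df[df$z == unique.z[i], ]
  gradient.lm[i] <- coef(lm(y ~ x, data = chunk))[2]
}

# Collect the slopes next to their z values
result <- data.frame(z = unique.z, gradient = gradient.lm)
print(result)
```

Using `seq_along(unique.z)` rather than `1:count.z` also keeps the loop from running when there are no z values at all.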

edited Sep 9, 2015 at 13:56

answered Sep 9, 2015 at 13:52

2 Comments

Heroka Over a year ago

Your solution doesn't return or assign anything.

sunny Over a year ago

@Heroka you can easily do that no? I'll modify it to put it all in a vector if you like.
