
I have (what I think should be) a straightforward problem in R, and can't seem to get it to work. Hope you can help.

I have a data frame e.g.:

x  y  z
1  2  a
2  3  a
3  4  a
4  5  b
5  6  b
6  7  b      etc...

And I'm fitting a linear model (y ~ x) for each z value subset (e.g. a, b...) and extracting the gradient.

It works when I select 'a' using a with statement as such:

coef(with(subset(data.frame, z == "a"), {lm(y ~ x)
}))[2]

But my problem is that I have more than 1000 unique values in the z column. So I tried to set up a loop (I know R users hate loops!) to do this for each value of z in turn and return the result in a data frame. Code is:

gradient.lm = NULL

unique.z <- as.matrix((unique(data.frame$z)))
count.z <- nrow(unique.z)

for (i in 1:count.z) {
  gradient.lm[i] = coef((with(subset(data.frame, z == [i]), {lm(y ~ z)
  })))[2]
}

But this is not working, and giving me the error code:

> for (i in 1:count.z) {
+   activity.lm[i] = coef((with(subset(data.frame, z == [i]), {lm(y ~ x)
Error: unexpected '[' in:
"for (i in 1:count.z) {
  activity.lm[i] = coef((with(subset(data.frame, z == ["
>   })))[2]
Error: unexpected '}' in "  }"
> }
Error: unexpected '}' in "}"

My guess was that it doesn't realise that there is an [i] within the with function.

I can't find a way of making this work, or think of another way of doing it. If you have any suggestions they would be hugely appreciated.

  • use z == unique.z[i] instead of z == [i] Commented Sep 9, 2015 at 13:45
  • (And it's generally bad practice to name variables like functions (data.frame)) Commented Sep 9, 2015 at 13:49
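
Putting the two comments together, a minimal sketch of the corrected loop might look like the following. The data frame name `dat` and the toy values are assumptions for illustration only, and the formula is taken to be `y ~ x`, as described in the question's text.

```r
# Toy data (hypothetical values) with the same columns as in the question
dat <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                  y = c(2, 3, 4, 5, 7, 9),
                  z = c("a", "a", "a", "b", "b", "b"))

unique.z <- unique(as.character(dat$z))

# Pre-allocate a named numeric vector, one slope per z value
gradient.lm <- setNames(numeric(length(unique.z)), unique.z)

for (i in seq_along(unique.z)) {
  # Compare z against the i-th unique value, not against a bare [i]
  chunk <- dat[dat$z == unique.z[i], ]
  gradient.lm[i] <- coef(lm(y ~ x, data = chunk))[["x"]]
}

gradient.lm
```

For this toy data the slopes come out as 1 for a and 2 for b. Naming the data frame `dat` rather than `data.frame` also avoids shadowing the base function.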

3 Answers

2

I'd strongly recommend a dplyr and broom package solution:

set.seed(44)

dt = data.frame(x = rnorm(40, 5, 5),
                y = rnorm(40, 3, 4),
                z = rep(c("a","b"), 20))

library(dplyr)
library(broom)

dt %>%
  group_by(z) %>%            # group by column z
  do(tidy(lm(y~x, data=.)))  # for each group create model using corresponding x and y values

# Source: local data frame [4 x 6]
# Groups: z [2]
# 
#        z        term    estimate std.error  statistic    p.value
#   (fctr)       (chr)       (dbl)     (dbl)      (dbl)      (dbl)
# 1      a (Intercept)  3.54448459 1.8162699  1.9515186 0.06673401
# 2      a           x -0.18140655 0.2260252 -0.8025944 0.43267918
# 3      b (Intercept)  1.69024601 1.1960922  1.4131402 0.17467413
# 4      b           x  0.02647677 0.1914492  0.1382966 0.89154143

You can extract any piece of information from the lm output you want.
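
For instance, if only the gradient per group is needed, one possible sketch is to filter the tidied output down to the `x` term. The names `slopes` and `gradient` are assumptions introduced here for illustration; the data are the same simulated values as above.

```r
library(dplyr)
library(broom)

set.seed(44)
dt <- data.frame(x = rnorm(40, 5, 5),
                 y = rnorm(40, 3, 4),
                 z = rep(c("a", "b"), 20))

slopes <- dt %>%
  group_by(z) %>%
  do(tidy(lm(y ~ x, data = .))) %>%  # one model per z value
  ungroup() %>%
  filter(term == "x") %>%            # keep only the slope row of each model
  select(z, gradient = estimate)     # rename estimate for readability

slopes
```

This should leave one row per z value, with the slope in the `gradient` column.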


1

In base-R, getting you a named vector of only the gradients you're apparently interested in:

gradient.lm <- unlist(lapply(split(df,df$z),function(chunk){
  return(coef(lm(y~x, data=chunk))[[2]])
}))
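
Since the question asked for the result in a data frame, a possible variation on the same split-and-apply idea is sketched below. The toy values and the names `gradients` and `gradient` are assumptions for illustration; `vapply` is swapped in for `unlist(lapply(...))` so that each group is checked to return exactly one number.

```r
# Toy data (hypothetical values); `df` matches the name used in the answer above
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c(2, 3, 4, 5, 7, 9),
                 z = c("a", "a", "a", "b", "b", "b"))

# Fit one model per z value and keep only the slope on x
gradient.lm <- vapply(split(df, df$z),
                      function(chunk) coef(lm(y ~ x, data = chunk))[["x"]],
                      numeric(1))

# One row per z value
gradients <- data.frame(z = names(gradient.lm),
                        gradient = unname(gradient.lm))
gradients
```

With this toy data the `gradient` column holds 1 for a and 2 for b.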

1 Comment

Heroka, that seems to work perfect. Thank you so much!!
0

This seems to work:

unique_z = as.character(unique(df$z))
# named numeric vector: one slope per z value
coef_vec = setNames(numeric(length(unique_z)), unique_z)

for (i in unique_z){
  # i is the z value itself, so the result is stored by name
  coef_vec[i] = 
    coef(
      with(
        subset(df, z==i), 
      {lm(y~x)}))[2]
}

print(coef_vec)

Each element of coef_vec is named by its z value, so the coefficients stay matched up to their z values.

2 Comments

Your solution doesn't return or assign anything.
@Heroka you can easily do that no? I'll modify it to put it all in a vector if you like.
