I'm computing the model outputs for a linear regression for a dependent variable with 45 different id values. How can I use tidy (dplyr, apply, etc.) code to accomplish this?

I have a dataset with three variables, data = c(id, distance, actPct), such that id == 1:45; -10 <= distance <= 10; 0 <= actPct <= 1.

I need to run a regression, model0n, on each value of id, such that each model0n has its output in a new tibble/df. I have completed it for a single regression:

model01 <- data %>%
  filter(id == 1) %>%
  filter(distance < 1) %>%
  filter(distance > -4)
model01 <- lm(data = model01, actPct ~ distance)

Example Data

library(tidyverse)

set.seed(42)
# Build the columns as plain vectors so the tibble gets the intended names
data01 <- tibble(
  id = sample(1:45, 100, replace = TRUE),
  distance = sample(-4:4, 100, replace = TRUE),
  actPct = runif(100, min = 0, max = 1)
)

I expect a new tibble or dataframe that has model01:model45 so I can put all of the regression outputs into a single table.

  • apply is generally slow in my experience. Commented Jan 14, 2019 at 15:58
  • @NelsonGon, thank you, I had not experienced a slow apply function before. Is there something else you prefer? Everything I have read indicated apply was faster than a for loop, so I thought I would give it a shot. Commented Jan 15, 2019 at 16:55
  • Could you see if this question could help? It seems similar to yours: stackoverflow.com/questions/53968490/… Maybe use other members of the apply family but not apply (personal opinion). Commented Jan 15, 2019 at 16:57

1 Answer

You can use group_by, nest and mutate with map from the tidyverse to accomplish this:

library(tidyverse)

data01 %>%
  group_by(id) %>%
  nest() %>%
  mutate(models = map(data, ~ lm(actPct ~ distance, data = .x)))

# A tibble: 41 x 3
#       id data             models  
#    <int> <list>           <list>  
#  1    42 <tibble [3 x 2]> <S3: lm>
#  2    43 <tibble [4 x 2]> <S3: lm>
#  3    13 <tibble [2 x 2]> <S3: lm>
#  4    38 <tibble [4 x 2]> <S3: lm>
#  5    29 <tibble [2 x 2]> <S3: lm>
#  6    24 <tibble [5 x 2]> <S3: lm>
#  7    34 <tibble [5 x 2]> <S3: lm>
#  8     7 <tibble [3 x 2]> <S3: lm>
#  9    30 <tibble [2 x 2]> <S3: lm>
# 10    32 <tibble [2 x 2]> <S3: lm>
# ... with 31 more rows
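
The question also restricts distance to a window (distance > -4 and distance < 1) before fitting. A minimal sketch of the same pipeline with that filter applied first, assuming the window should be identical for every id; the name models_windowed is only illustrative:

```r
library(tidyverse)

# Same example data as in the Data section
set.seed(42)
data01 <- tibble(
  id = sample(1:45, 100, replace = TRUE),
  distance = sample(-4:4, 100, replace = TRUE),
  actPct = runif(100, min = 0, max = 1)
)

# Keep only the distance window from the question, then fit one model per id
models_windowed <- data01 %>%
  filter(distance > -4, distance < 1) %>%
  group_by(id) %>%
  nest() %>%
  mutate(models = map(data, ~ lm(actPct ~ distance, data = .x)))
```

Note that filtering leaves some ids with very few rows (or drops them entirely), so some of these fits may not be able to estimate a slope.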

See also the chapter in R for Data Science about many models: https://r4ds.had.co.nz/many-models.html
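
To get all of the regression outputs into a single table, as the question asks, one option is to tidy each model and unnest the result. This is a sketch that assumes the broom package is installed; coefs and coef_table are names I made up for illustration:

```r
library(tidyverse)
library(broom)

# Same example data as in the Data section
set.seed(42)
data01 <- tibble(
  id = sample(1:45, 100, replace = TRUE),
  distance = sample(-4:4, 100, replace = TRUE),
  actPct = runif(100, min = 0, max = 1)
)

coef_table <- data01 %>%
  group_by(id) %>%
  nest() %>%
  mutate(
    # One lm per id, stored in a list column
    models = map(data, ~ lm(actPct ~ distance, data = .x)),
    # broom::tidy() turns each lm into a data frame of coefficients
    coefs = map(models, tidy)
  ) %>%
  ungroup() %>%
  select(id, coefs) %>%
  # Expand the list column into ordinary rows, keyed by id
  unnest(coefs)
```

coef_table should then hold columns such as id, term, estimate, std.error, statistic and p.value. With this small example data, ids that have only one or two observations will likely produce NA or NaN entries, so treat those rows with care.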

Data

set.seed(42)
id <- sample(1:45, 100, replace = TRUE)
distance <- sample(-4:4, 100, replace = TRUE)
actPct <- runif(100, min = 0, max = 1)
data01 <- tibble(id = id, distance = distance, actPct = actPct)
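
Since the comments bring up the apply family, here is a rough base R equivalent for comparison, using split and lapply instead of nest and map. It is only a sketch, not a claim about speed; models and coef_matrix are illustrative names:

```r
# Same example data, built as a plain data.frame (no packages needed)
set.seed(42)
data01 <- data.frame(
  id = sample(1:45, 100, replace = TRUE),
  distance = sample(-4:4, 100, replace = TRUE),
  actPct = runif(100, min = 0, max = 1)
)

# Split into one data frame per id, then fit one lm on each piece
models <- lapply(
  split(data01, data01$id),
  function(d) lm(actPct ~ distance, data = d)
)

# Stack the coefficient vectors into a matrix, one row per id
coef_matrix <- do.call(rbind, lapply(models, coef))
```

The row names of coef_matrix are the id values, and the slope is NA wherever a group did not have enough distinct distance values to estimate it.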