Predict linear regression with multiple separate groups

Question

I would like to predict values from a linear regression from multiple groups in a single dataframe. I have found the following blogpost which ALMOST does everything I need: https://www.r-bloggers.com/2016/09/running-a-model-on-separate-groups/

However, I cannot combine this with the predict() function and its newdata argument. For one group, I use the following:

m <- lm(y ~ x, df)
new_df <- data.frame(x=c(5))
predict(m, new_df)

This gives me the predicted value for y at x=5.

How do I do this when I have multiple groups in my df? This is what I tried:

df %>%
    nest(-group) %>%
    mutate(fit = map(data, ~ lm(.$y ~ .$x)),
           results = map(fit, predict)) %>%
    unnest(results)

When I try to use results = map(fit, predict(new_df)), I only get an error. Is there a way to pass my value for x (in this case 5) into the code above?

Ideally, I would get a new data.frame with two columns, group and the predicted y-value.

This is a sample data.frame:

group   x   y
g1  1   2
g1  1.5 3
g1  2   4
g1  2.3 4.4
g1  3   6
g1  3.4 6.2
g1  4.11    7
g1  4.8 7.9
g1  5   8
g1  5.3 8.2
g2  2   5
g2  2.3 4
g2  4   2.2
g2  4.4 1.9
g2  7   0.3

EDIT:

Plotting the sample data using ggplot2, I get the following plot:

ggplot(df, aes(x,y,colour=group)) +
 geom_point() +
 stat_smooth(method="lm", se=FALSE)

Using the following code, I get the sought after predicted y-values:

predict(lm(y ~ x, df[df$group =="g1", ]), new_df)
       1 
8.180285 

predict(lm(y ~ x, df[df$group =="g2", ]), new_df)
       1 
1.732136

I would like to generate a new dataframe which should look something like this and contain the predicted y-value at x=5:

group   y_predict  
g1  8.180285  
g2  1.732136

As @Marcos Perez says below, this is a perfect case for splitting your dataframe into a list and applying an lm function across the list elements.
– Rich Pauloo, Commented Nov 30, 2020 at 17:01

G. Grothendieck · Accepted Answer · 2020-12-01 11:52:53Z

3

Using the input shown reproducibly in the Note at the end: since we only need the fitted values, we don't need nest and can just use mutate:

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(pred = fitted(lm(y ~ x))) %>%
  ungroup %>%
  select(group, pred)

giving:

# A tibble: 15 x 2
   group    pred
   <chr>   <dbl>
 1 g1     2.47  
 2 g1     3.19  
 3 g1     3.90  
 4 g1     4.33  
 5 g1     5.33  
 6 g1     5.90  
 7 g1     6.91  
 8 g1     7.89  
 9 g1     8.18  
10 g1     8.61  
11 g2     4.41  
12 g2     4.15  
13 g2     2.63  
14 g2     2.27  
15 g2    -0.0563

This could also be done like this:

library(dplyr)

df %>%
  mutate(pred = fitted(lm(y ~ x*group + 0, df))) %>%
  select(group, pred)

or like this using base R only:

transform(df, pred = fitted(lm(y ~ x*group + 0, df)))[c("group", "pred")]

or using lmList from nlme (which comes with R so it does not have to be installed):

library(dplyr)
library(nlme)

df %>%
  mutate(pred = fitted(lmList(y ~ x | group, df))) %>%
  select(group, pred)

or using lmList without dplyr:

library(nlme)

transform(df, pred = fitted(lmList(y ~ x | group, df)))[c("group", "pred")]

Note

Lines <- "
group   x   y
g1  1   2
g1  1.5 3
g1  2   4
g1  2.3 4.4
g1  3   6
g1  3.4 6.2
g1  4.11    7
g1  4.8 7.9
g1  5   8
g1  5.3 8.2
g2  2   5
g2  2.3 4
g2  4   2.2
g2  4.4 1.9
g2  7   0.3"
df <- read.table(text = Lines, header = TRUE)

Added

Regarding the comment, this code produces the prediction for x = 5 by group:

df %>%
  group_by(group) %>%
  summarize(pred = predict(lm(y ~ x), list(x = 5)), .groups = "drop") %>%
  select(group, pred)
## # A tibble: 2 x 2
##   group  pred
##   <chr> <dbl>
## 1 g1     8.18
## 2 g2     1.73
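
For completeness, the nest/map attempt from the question can probably be made to work as well. This is only a minimal sketch, assuming tidyr (>= 1.0) and purrr are installed; new_df and the y_predict column name are taken from the question. The two changes are fitting with lm(y ~ x, data = .x) instead of lm(.$y ~ .$x), so that predict can match x in newdata, and passing new_df inside the mapped function instead of calling predict(new_df) on its own:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data from the Note above
df <- read.table(text = "
group x y
g1 1 2
g1 1.5 3
g1 2 4
g1 2.3 4.4
g1 3 6
g1 3.4 6.2
g1 4.11 7
g1 4.8 7.9
g1 5 8
g1 5.3 8.2
g2 2 5
g2 2.3 4
g2 4 2.2
g2 4.4 1.9
g2 7 0.3", header = TRUE)

# The x value at which a prediction is wanted
new_df <- data.frame(x = 5)

res <- df %>%
  nest(data = c(x, y)) %>%                       # one row per group, data in a list column
  mutate(fit = map(data, ~ lm(y ~ x, data = .x)), # one model per group
         y_predict = map_dbl(fit, ~ unname(predict(.x, newdata = new_df)))) %>%
  select(group, y_predict)

res
```

The result should be a two-row tibble with the same predictions as the summarize version above (about 8.18 for g1 and 1.73 for g2).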

edited Dec 1, 2020 at 11:52

answered Nov 26, 2020 at 13:22

G. Grothendieck


3 Comments

Servus Over a year ago

I don't really get the output that you create. What are the values in "predict"? I would just like to get one prediction at x=5; i.e.: predict(lm(y ~ x, df[df$group =="g1", ]), new_df) This gives 8.180285 which is the value I am looking for.

Servus Over a year ago

I guess the values in "predict" are the predicted values for the "x" values in my df? But how do I get just one prediction for x=5, even if I don't have a value at x=5?

G. Grothendieck Over a year ago

See the section "Added", which has been added to the end of the answer.

Marcos Pérez · Accepted Answer · 2020-11-30 15:31:33Z

1

+50

This is a perfect case for the lapply function. Try this:

# d is the data frame for one group; x and y are its columns
linear_model <- function(d) lm(y ~ x, data = d)
m <- lapply(split(df,df$group),linear_model)

Now you have a list of linear models. Let's use it to predict the y-value at new_df for all models:

new_df <- data.frame(x=c(5))
my_predict <- function(m) predict(m,new_df)
sapply(m,my_predict)

Output:

#     g1.1     g2.1 
# 8.180285 1.732136

The output is a named numeric vector.
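
If a data.frame with a group column and a predicted-value column is preferred, as described at the end of the question, the named vector can be wrapped up in base R. This is a minimal sketch, not part of the original answer; the names models, result and y_predict are chosen here for illustration:

```r
# Sample data from the question
df <- read.table(text = "
group x y
g1 1 2
g1 1.5 3
g1 2 4
g1 2.3 4.4
g1 3 6
g1 3.4 6.2
g1 4.11 7
g1 4.8 7.9
g1 5 8
g1 5.3 8.2
g2 2 5
g2 2.3 4
g2 4 2.2
g2 4.4 1.9
g2 7 0.3", header = TRUE)

new_df <- data.frame(x = 5)

# One lm per group, stored in a list named by group
models <- lapply(split(df, df$group), function(d) lm(y ~ x, data = d))

# vapply guarantees exactly one numeric prediction per model;
# unname() drops the "1" row name that caused the ".1" suffix
result <- data.frame(
  group = names(models),
  y_predict = unname(vapply(models,
                            function(fit) unname(predict(fit, new_df)),
                            numeric(1)))
)

result
```

Because vapply expects a single number per model, this version only works when new_df has one row; for several x values the matrix returned by sapply would have to be reshaped instead.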

answered Nov 30, 2020 at 15:31

Marcos Pérez


3 Comments

Servus Over a year ago

Thank you! This is exactly what I wanted to do. Is there a way to get rid of the ".1" from the output?

Marcos Pérez Over a year ago

Sure, the ".1" comes from the row name of new_df. You could rename the row like this: new_df <- data.frame(x = c(5), row.names = c("")), but it's better not to, because when new_df has more than one row the output is a matrix whose column names are "g1" and "g2". Try: new_df <- data.frame(x = c(5:6)).

Servus Over a year ago

Is there also a way to get m in the form of a data.frame? as.data.frame(m) gives the following error: Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class ‘"lm"’ to a data.frame

Laurent Bergé · Accepted Answer · 2020-12-14 14:27:12Z

What you are describing is an estimation with varying intercepts and slopes.

With `lm`

You can do that directly using lm:

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

newdata = data.frame(x1 = 5, species = c("setosa", "versicolor", "virginica"))
res_1 = lm(y ~ species/x1, base)
newdata$y = predict(res_1, newdata)
newdata
#>   x1    species        y
#> 1  5     setosa 6.091450
#> 2  5 versicolor 7.865123
#> 3  5  virginica 8.414509

The shortcut species/x1 means species + species:x1, i.e. the factor variable plus the interaction between the factor and the variable. Thus there will be one intercept and one coefficient associated with x1 for each group (here species).

Then the predict method can be used as usual, which leads to the requested result. This is done without needing loops or lapply.
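
Applied to the sample data from the question rather than iris, the same idea might look like the sketch below. It assumes the columns are named group, x and y as in the question; newdata and y_predict are names chosen here for illustration. Since every group gets its own intercept and slope, the point predictions should match those from fitting each group separately:

```r
# Sample data from the question
df <- read.table(text = "
group x y
g1 1 2
g1 1.5 3
g1 2 4
g1 2.3 4.4
g1 3 6
g1 3.4 6.2
g1 4.11 7
g1 4.8 7.9
g1 5 8
g1 5.3 8.2
g2 2 5
g2 2.3 4
g2 4 2.2
g2 4.4 1.9
g2 7 0.3", header = TRUE)

# One row per group, all at x = 5
newdata <- data.frame(group = c("g1", "g2"), x = 5)

# group/x expands to group + group:x (one intercept and one slope per group)
fit <- lm(y ~ group/x, data = df)

newdata$y_predict <- unname(predict(fit, newdata))
newdata[c("group", "y_predict")]
```

Note that this single model pools the residual variance across groups, so standard errors and prediction intervals can differ from those of separate per-group fits even though the predicted values agree.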

Alternative method

An alternative is to use a specialized package to estimate this kind of model, for instance fixest. Since it is specialized in fixed-effects estimations, the run time will be substantially lower for large data sets.

library(fixest)

# Using variables with varying slopes
res_2 = feols(y ~ 1 | species[x1], base)
predict(res_2, newdata)
#>        1        2        3 
#> 6.091450 7.865123 8.414509

Some explanations:

Your group is the variable species here.
feols is the equivalent of lm, but you can define fixed-effects after the pipe (|).
species[x1] means species fixed-effects (i.e. one intercept per species) + x1 having one coefficient per species (varying slopes).
