Creating a linear regression model for each group in a column

Question

I refer to this answer: https://stackoverflow.com/a/65076441/14436230

I am trying to predict the "Education" value for 2019 using past values for each year, using lm(Education ~ poly(TIME,2)).

However, I have to apply this lm, wrapped in the function linear_model, to each "LOCATION". I was able to create a separate lm for each LOCATION in the list m.

Following the answer linked above, I was able to run my code up to my_predict. When I run sapply, I get this error: Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "list"

Can someone advise me on my mistake? I would really appreciate any help.


linear_model <- function(TIME) lm(Education ~ poly(TIME,2), data=table2)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(TIME) predict(m,new_df)

sapply(m,my_predict)   #error here

EDIT:

I am now able to predict education values for each "LOCATION" for 2020 and 2021 as shown below.

linear_model <- function(x) lm(Education ~ TIME, x)
m <- lapply(split(tableLinR,tableLinR$LOCATION),linear_model)
new_df <- data.frame(TIME=c(2020, 2021), row.names = c ("2020.Education", "2021.Education"))
my_predict <- function(x) predict(x,new_df)
result <- sapply(m,my_predict)

However, I actually wish to do this for more variables (e.g. Education, GDP, Hoursworked, PPI etc.), each predicted from TIME; these are the column headers of my table.

Can someone advise me on how to write a loop around my code that produces a data frame with the predicted values? I have struggled for many hours without success.

Why are you creating separate models for each location? Why not include them all in one model?
– user2974951, Commented Oct 29, 2021 at 10:47
@user2974951 I am not able to apply one linear regression model to the entire dataset due to repeated time values, but I researched and found out I can use a multi-value model? Not sure how it works though.
– user14436230, Commented Oct 29, 2021 at 11:10
That is not true, you can have repeated values... are you referring to repeated measurements?
– user2974951, Commented Oct 29, 2021 at 11:21
@user2974951 When I tried running one model previously on the entire dataset, I was unable to run the linear regression as each time value has more than one IV value, e.g. 2010 has more than one education value, one from each country.
– user14436230, Commented Oct 30, 2021 at 8:19

Cecilia López · Accepted Answer · 2021-11-02 09:04:37Z

0

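Your functions do not use their arguments. A function is usually written as function(x), and when it is called, x is replaced by the data you pass in. In your code, linear_model ignores its argument and fits every model on the whole of table2, and my_predict ignores its argument and calls predict on m, which is the whole list of models. That is why predict reports that it has no method for an object of class "list".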

For example, in the linear_model function you defined, if you were to use it alone you would write:

linear_model(data)

However, because you are using it inside lapply, this is a bit harder to see. lapply simply loops over the data frames returned by split(table2,table2$LOCATION) and applies linear_model to each of them.

The same applies to my_predict: sapply passes it one fitted model at a time.

Anyway, this should work for you:

linear_model <- function(x) lm(Education ~ TIME, x)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(x) predict(x,new_df)

sapply(m,my_predict)
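
To see the pattern run end to end, here is a minimal sketch on made-up data. The data frame toy and the locations "AAA" and "BBB" are hypothetical stand-ins for table2 and its real locations; the last two lines reproduce the error from the question by passing the whole list to predict.

```r
# Toy data: two hypothetical locations, each with an exact linear trend.
toy <- data.frame(
  LOCATION  = rep(c("AAA", "BBB"), each = 5),
  TIME      = rep(2014:2018, times = 2),
  Education = c(10, 12, 14, 16, 18,   # AAA: +2 per year
                30, 31, 32, 33, 34)   # BBB: +1 per year
)

# The function receives ONE location's data frame and fits on it.
linear_model <- function(x) lm(Education ~ TIME, data = x)

# split() returns a named list of data frames; lapply() fits one model each.
m <- lapply(split(toy, toy$LOCATION), linear_model)

new_df <- data.frame(TIME = 2019)

# The function receives ONE fitted model, not the whole list m.
my_predict <- function(x) predict(x, new_df)

pred_2019 <- sapply(m, my_predict)
print(pred_2019)

# Passing the whole list instead reproduces the error from the question.
err <- tryCatch(predict(m, new_df), error = function(e) conditionMessage(e))
print(err)
```

Note that the code in this answer fits a straight line, whereas the question used poly(TIME,2). The same pattern should also work with the quadratic formula, as long as every location has at least three distinct years.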

ANSWER TO THE EDIT

There are probably more efficient ways of looping the prediction, but here is my approach:

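library(reshape2)  # melt
library(tidyr)     # pivot_wider

pred_data <- list()

# The new data and the prediction function do not depend on i
new_df <- data.frame(TIME=c(2020, 2021), row.names = c("2020", "2021"))
my_predict <- function(x) predict(x,new_df)

for (i in 3:6){
   # x[[i]] extracts column i as a vector, for data frames and tibbles alike
   linear_model <- function(x) lm(x[[i]] ~ TIME, x)
   m <- lapply(split(tableLinR,tableLinR$LOCATION),linear_model)
   pred_data[[colnames(tableLinR)[i]]] <- sapply(m,my_predict)
}

# Stack the list of matrices into long format (L1 holds the variable name),
# then spread the variables back into columns
pred_data <- melt(pred_data)
pred_data <- as.data.frame(pivot_wider(pred_data, names_from = L1, values_from = value))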

First you create an empty list in which to save the outputs of the loop. In for (i in 3:6) you give the range of column indices you want predictions for. The result pred_data is a list (one matrix of predictions per column) that you can transform into a data frame in different ways. With melt (from reshape2) and pivot_wider (from tidyr) you obtain a format similar to your original data.
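
As one possible alternative, the same loop can be sketched in base R only, selecting columns by name instead of by position. This is an illustration under assumptions, not the approach above: the data frame toy, the locations "AAA" and "BBB", the helper predict_one and the vector outcomes are all hypothetical, and the values are made up.

```r
# Toy data with two hypothetical outcome columns; values are made up.
toy <- data.frame(
  LOCATION  = rep(c("AAA", "BBB"), each = 4),
  TIME      = rep(2016:2019, times = 2),
  Education = c(10, 12, 14, 16, 30, 31, 32, 33),
  GDP       = c(100, 110, 120, 130, 200, 195, 190, 185)
)

outcomes <- c("Education", "GDP")            # columns to predict
new_df   <- data.frame(TIME = c(2020, 2021))

# Fit one model for one outcome on one location's data, then predict.
# reformulate() builds the formula <outcome> ~ TIME from the column name.
predict_one <- function(d, outcome) {
  fit <- lm(reformulate("TIME", response = outcome), data = d)
  predict(fit, new_df)
}

# For each location, build a small data frame with one column per outcome.
per_location <- lapply(split(toy, toy$LOCATION), function(d) {
  preds <- sapply(outcomes, function(v) predict_one(d, v))
  data.frame(LOCATION = d$LOCATION[1], TIME = new_df$TIME, preds)
})

# Stack the per-location pieces into a single data frame.
pred_data <- do.call(rbind, per_location)
rownames(pred_data) <- NULL
print(pred_data)
```

Selecting by name avoids depending on column positions such as 3:6, which can shift if columns are added or reordered.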

edited Nov 2, 2021 at 9:04

answered Oct 29, 2021 at 10:56

Cecilia López


3 Comments

user14436230 Over a year ago

thank you so much! I am pretty new to R, and your explanation of how function(x) works makes sense.

user14436230 Over a year ago

hello! i edited my question as i now wish to apply this code to more than one independent variable! would u have any advice on how to do so?

Cecilia López Over a year ago

hi! I've added a way to loop your code, tell me if it works!

TarJae · Accepted Answer · 2021-10-29 10:20:04Z

0

Are you looking for such a solution?

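library(tidyverse)
library(broom)
# df stands for your own data frame (e.g. tableLinR); if no such object exists,
# R finds the function stats::df instead and mutate() fails
df %>% 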
  mutate(LOCATION = as_factor(LOCATION)) %>% 
  group_by(LOCATION) %>% 
  group_split() %>% 
  map_dfr(.f = function(df){
    lm(Education ~ TIME, data = df) %>% 
      glance() %>% 
      add_column(LOCATION = unique(df$LOCATION), .before=1)
  })

  LOCATION r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>        <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 AUT         0.367         0.261   4.88     3.47    0.112     1  -22.9  51.8  52.0    143.            6     8
2 BEL         0.0225       -0.173   3.90     0.115   0.748     1  -18.3  42.6  42.4     76.0           5     7
3 CZE         0.0843       -0.0683  3.22     0.552   0.485     1  -19.6  45.1  45.3     62.2           6     8
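
The output above summarises the fit per location but does not contain predictions. A minimal sketch of how the same grouped approach might produce them, assuming dplyr's group_modify() and a made-up data frame toy with hypothetical locations "AAA" and "BBB":

```r
library(dplyr)

# Toy data; locations and values are made up for illustration.
toy <- data.frame(
  LOCATION  = rep(c("AAA", "BBB"), each = 4),
  TIME      = rep(2016:2019, times = 2),
  Education = c(10, 12, 14, 16, 30, 31, 32, 33)
)

new_df <- data.frame(TIME = c(2020, 2021))

preds <- toy %>%
  group_by(LOCATION) %>%
  # .x is the data for one location, without the grouping column.
  group_modify(~ {
    fit <- lm(Education ~ TIME, data = .x)
    data.frame(TIME = new_df$TIME,
               Education = unname(predict(fit, new_df)))
  }) %>%
  ungroup()

print(preds)
```

The result has one row per location and year, with LOCATION added back as the first column by group_modify().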

answered Oct 29, 2021 at 10:20

TarJae


2 Comments

user14436230 Over a year ago

hi! i tried the code above but received an error: Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "function". I'm not sure why, as there is no function involved in the mutate.

TarJae Over a year ago

Use library(dplyr). If you still get the error, then use df %>% dplyr::mutate(....
