
I refer to this answer: https://stackoverflow.com/a/65076441/14436230

I am trying to predict the "Education" value for 2019 from the values in earlier years, using lm(Education ~ poly(TIME,2)).

However, I have to fit this model separately for each "LOCATION". I was able to create a separate lm for each LOCATION and store them in m.

Following the answer linked above, my code runs up to my_predict. When I run sapply, I get this error: Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "list"

Can someone advise me on my mistake? I will really appreciate any help.



linear_model <- function(TIME) lm(Education ~ poly(TIME,2), data=table2)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(TIME) predict(m,new_df)

sapply(m,my_predict)   #error here 

EDIT:

I am now able to predict education values for each "LOCATION" for 2020 and 2021 as shown below.

linear_model <- function(x) lm(Education ~ TIME, x)
m <- lapply(split(tableLinR,tableLinR$LOCATION),linear_model)
new_df <- data.frame(TIME=c(2020, 2021), row.names = c ("2020.Education", "2021.Education"))
my_predict <- function(x) predict(x,new_df)
result <- sapply(m,my_predict)


However, I actually wish to do this for more variables (e.g. Education, GDP, Hoursworked, PPI etc.), each of which is a separate column in my data.

Can someone advise me on how to write a loop around my code that produces a data frame of the predicted values? I have struggled with this for many hours without success.

  • Why are you creating separate models for each location? Why not include them all in one model? Commented Oct 29, 2021 at 10:47
  • @user2974951 I am not able to apply one linear regression model on the entire dataset due to repeated time values, but I researched and found out I can use multi-value model? Not sure how it works though Commented Oct 29, 2021 at 11:10
  • That is not true, you can have repeated values... are you referring to repeated measurements? Commented Oct 29, 2021 at 11:21
  • @user2974951 When I tried running one model previously on the entire dataset, I was unable to run the linear regression as each time value has more than one IV value e.g. 2010 has more than 1 education value from each country Commented Oct 30, 2021 at 8:19
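
The single-model route suggested in these comments can be sketched as follows. Repeated TIME values across countries are not an obstacle for lm(); adding LOCATION as a factor with a TIME interaction gives each country its own intercept and slope. The data frame `toy` and its values are made up for illustration; only the column names come from the question.

```r
# Made-up data (assumption, not the asker's data): two locations, same years in each
toy <- data.frame(
  LOCATION  = rep(c("AUT", "BEL"), each = 5),
  TIME      = rep(2010:2014, times = 2),
  Education = c(10, 11, 12, 13, 14, 30, 28, 26, 24, 22)
)

# One model for all locations; TIME * LOCATION fits a separate line per location
fit_all <- lm(Education ~ TIME * LOCATION, data = toy)

# Predict 2019 for every location in a single call
new_all  <- data.frame(TIME = 2019, LOCATION = unique(toy$LOCATION))
pred_all <- cbind(new_all, Education = predict(fit_all, newdata = new_all))
print(pred_all)
```

Whether one pooled model or separate per-location models is preferable depends on the analysis; with the full interaction the fitted lines match what separate models would give.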

2 Answers


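The problem is in how your functions use their arguments. A function is written as function(x), and when it is called, x is replaced by whatever you pass in. Your linear_model takes an argument called TIME but never uses it as data: it always fits the model on the whole of table2. Likewise, my_predict ignores its argument and calls predict on the entire list m, which is what triggers the error.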

For example, in the linear_model function you defined, if you were to use it alone you would write:

linear_model(data)

However, because you are using it inside lapply, this is a bit harder to see. lapply simply loops over the data frames returned by split(table2,table2$LOCATION) and applies linear_model to each one.

The same thing happens with my_predict.
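
To make the difference concrete, here is a minimal sketch contrasting a function that ignores its argument with one that uses it. The values in `table2` are made up for illustration, and the helper names `ignores_arg` and `uses_arg` are hypothetical.

```r
# Made-up data (assumption): two locations, five years each
table2 <- data.frame(
  LOCATION  = rep(c("AUT", "BEL"), each = 5),
  TIME      = rep(2010:2014, times = 2),
  Education = c(10, 11, 12, 13, 14, 30, 28, 26, 24, 22)
)
pieces <- split(table2, table2$LOCATION)

# Original pattern: the argument is never used, so every fit uses all of table2
ignores_arg <- function(TIME) lm(Education ~ TIME, data = table2)
bad <- lapply(pieces, ignores_arg)
same_fit <- isTRUE(all.equal(coef(bad$AUT), coef(bad$BEL)))

# Corrected pattern: the argument x is the per-location data frame
uses_arg <- function(x) lm(Education ~ TIME, data = x)
good <- lapply(pieces, uses_arg)
different_fit <- !isTRUE(all.equal(coef(good$AUT), coef(good$BEL)))

# predict() has no method for a plain list, which is the error from the question
err <- tryCatch(
  predict(good, data.frame(TIME = 2019)),
  error = function(e) conditionMessage(e)
)
print(c(same_fit = same_fit, different_fit = different_fit))
print(err)
```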

Anyway, this should work for you:

linear_model <- function(x) lm(Education ~ TIME, x)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(x) predict(x,new_df)

sapply(m,my_predict)  
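
Note that the code above fits a straight line, while the question used a quadratic term. If the quadratic fit is still wanted, the same pattern should carry over with poly(TIME, 2). A minimal sketch, assuming made-up data in `toy` (the names `quad_model` and `m_quad` are hypothetical):

```r
# Made-up data (assumption): AUT follows a parabola, BEL a straight line
toy <- data.frame(
  LOCATION  = rep(c("AUT", "BEL"), each = 6),
  TIME      = rep(2010:2015, times = 2),
  Education = c(20, 21, 24, 29, 36, 45,   # 20 + (TIME - 2010)^2
                50, 49, 48, 47, 46, 45)   # 50 - (TIME - 2010)
)

# The argument x is the per-location data frame, as in the corrected code
quad_model <- function(x) lm(Education ~ poly(TIME, 2), data = x)
m_quad <- lapply(split(toy, toy$LOCATION), quad_model)

# predict() is applied to each fitted model, not to the list as a whole
new_df <- data.frame(TIME = 2019)
pred_2019 <- sapply(m_quad, function(fit) unname(predict(fit, new_df)))
print(pred_2019)
```

Extrapolating a quadratic several years beyond the data can swing widely, so it is worth comparing against the linear predictions.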

ANSWER TO THE EDIT

There are probably more efficient ways of looping the prediction, but here is my approach:

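library(reshape2)  # melt
library(tidyr)     # pivot_wider

pred_data <- list()

for (i in 3:6){
   # fit one model per LOCATION, using column i as the response
   linear_model <- function(x) lm(x[[i]] ~ TIME, x)
   m <- lapply(split(tableLinR,tableLinR$LOCATION),linear_model)
   new_df <- data.frame(TIME=c(2020, 2021), row.names = c("2020", "2021"))
   my_predict <- function(x) predict(x,new_df)
   # store the predictions under the name of column i
   pred_data[[colnames(tableLinR)[i]]] <- sapply(m,my_predict)
}

# stack the list into long format, then spread to one column per variable
pred_data <- melt(pred_data)
pred_data <- as.data.frame(pivot_wider(pred_data, names_from = L1, values_from = value))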

First, create an empty list to hold the output of the loop. In for (i in 3:6), set the range of column indices you want predictions for. After the loop, pred_data is a list with one matrix of predictions per variable, which you can turn into a data frame in several ways. With melt (from reshape2) and pivot_wider (from tidyr) you get a format similar to your original data.
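
As one alternative, the same result can be built with base R only, looping over column names instead of column indices and constructing each formula with reformulate(). This is a sketch under assumptions: the values in `tableLinR` are made up, and `targets` and `pred_wide` are hypothetical names.

```r
# Made-up data (assumption): columns 3 onward are the variables to predict
tableLinR <- data.frame(
  LOCATION  = rep(c("AUT", "BEL"), each = 5),
  TIME      = rep(2010:2014, times = 2),
  Education = c(10, 11, 12, 13, 14, 30, 28, 26, 24, 22),
  GDP       = c(100, 102, 104, 106, 108, 200, 205, 210, 215, 220)
)
targets <- c("Education", "GDP")
new_df  <- data.frame(TIME = c(2020, 2021))

# For each location, fit one model per target and collect the predictions
per_location <- lapply(split(tableLinR, tableLinR$LOCATION), function(x) {
  preds <- sapply(targets, function(v) {
    fit <- lm(reformulate("TIME", response = v), data = x)
    unname(predict(fit, new_df))
  })
  data.frame(LOCATION = x$LOCATION[1], TIME = new_df$TIME, preds)
})

# One row per LOCATION/TIME pair, one column per predicted variable
pred_wide <- do.call(rbind, per_location)
rownames(pred_wide) <- NULL
print(pred_wide)
```

Using names rather than positions keeps the loop working if columns are reordered, at the cost of listing the targets explicitly.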


3 Comments

thank you so much! I am pretty new to R, and your explanation of how function(x) works makes sense.
hello! i edited my question as i now wish to apply this code to more than one independent variable! would u have any advice on how to do so?
hi! I've added a way to loop your code, tell me if it works!

Are you looking for such a solution?

library(tidyverse)
library(broom)
df %>% 
  mutate(LOCATION = as_factor(LOCATION)) %>% 
  group_by(LOCATION) %>% 
  group_split() %>% 
  map_dfr(.f = function(df){
    lm(Education ~ TIME, data = df) %>% 
      glance() %>% 
      add_column(LOCATION = unique(df$LOCATION), .before=1)
  })
  LOCATION r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>        <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 AUT         0.367         0.261   4.88     3.47    0.112     1  -22.9  51.8  52.0    143.            6     8
2 BEL         0.0225       -0.173   3.90     0.115   0.748     1  -18.3  42.6  42.4     76.0           5     7
3 CZE         0.0843       -0.0683  3.22     0.552   0.485     1  -19.6  45.1  45.3     62.2           6     8

2 Comments

hi! i tried the code above but received an error: Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "function". I'm not sure why, as there is no function involved in the mutate.
Use library(dplyr). If you still get an error, try df %>% dplyr::mutate(...).
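
The error message says mutate() received an object of class "function". That is consistent with `df` never having been assigned a data frame in the session: R then finds `stats::df`, the density function of the F distribution, and loading dplyr would not change that. A quick check, assuming a fresh session; piping `tableLinR` (the data frame name from the question) is only a guess at the intended input.

```r
# With no user-defined `df`, the name resolves to a function in the stats package
df_is_function <- is.function(stats::df)
print(df_is_function)

# Likely remedy (assumption): pipe your own data frame instead of `df`, e.g.
# tableLinR %>% mutate(LOCATION = as_factor(LOCATION)) %>% ...
```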
