R - How can I use the apply functions instead of iterating?

Question

Regress each dependent variable ( dep_var ) against independent variable ( ind_var )

I am trying to perform linear regressions for multiple dependent variables against a single independent variable, one at a time.

When an observation is missing (NA), the entire row should be left out of that particular regression.

So far I have done this by looping over each dependent-variable column.

fit = list()
for( i in 1 : 2 ) {
    fit[[i]] = lm( mydf$Ind_Var[ which( !is.na( mydf[  , (2+i) ] ) ) ] ~ na.omit( mydf[ , (2+i) ] ) )
}

How can I do this without involving other packages (let's restrict ourselves to functions like lm, the apply family, and do/do.call)?

Random Data

mydf = data.frame( 
"ID"    = rep( "A" , 25 ),
"Date"  = c( 1 : 25 ), 
"Dep_1" = c( 0.78670185, 0.15221561, NA, 0.85270392, 0.90057399, 0.75974473, 0.42026760, 0.64035871, 0.83012434, 0.04985492, 0.06619375, 0.36024745, 0.83969627, 0.45293842, 0.25272036, NA, 0.63783321, 0.42294695, 0.06726004, 0.14124547, 0.54590193, 0.99560087, 0.14255501, 0.41559977, 0.80120970) ,          
"Dep_2" = c( 0.736137983, 0.979317444, 0.901380500, 0.942325049, 0.420741297, NA, 0.243408607, 0.824064331, 0.462912557, NA, 0.710834065, 0.264922818, 0.797917063, 0.578866651, 0.955944058, 0.291149075, 0.437322581, 0.298153168, 0.579299049, 0.671718144, 0.545720702, 0.099175216, 0.808933227, 0.912825535, 0.417438973 ) ,          
"Ind_Var" = c( 75:51 )  )

My own attempt at converting it is:

apply( mydf[ ,-c(1:2) ] , 2 , function( x ) lm( mydf$Ind_Var[ which( !is.na( x ) ) ] ~ na.omit(x)  ) )

but this involves having mydf hardcoded.

I apologize if I have used any incorrect terms.

I only used foreach to create the list, but I have now edited it to a for loop for consistency.
– mathnoob, Commented Oct 19, 2017 at 4:16

Maurits Evers · Accepted Answer · 2017-10-19 04:33:45Z

What about the following approach?

# Specify the columns that contain your predictor variables
predIdx <- c(3, 4);

# lm(y ~ x), for x being a single predictor
lapply(predIdx, function(x) lm(mydf[, ncol(mydf)] ~ mydf[, x]))

Here I assume that the response is always in the last column of the dataframe. All you need to specify manually are the column indices that contain your predictors.

If you want to exclude the NAs manually, you could use complete.cases inside the lapply function. This shouldn't be necessary, though, because lm by default (na.action = na.omit) drops any row with a missing value in the model variables.
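To check the NA handling yourself, here is a minimal sketch. It uses a small made-up data frame called toy (not the data from the question) and assumes the default R setting options(na.action = "na.omit"). It compares the number of observations used in each fit and confirms that dropping rows by hand with complete.cases gives the same coefficients.

```r
# Toy data, made up for illustration (not the question's mydf)
toy <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  x1 = c(1, 2, NA, 4, 5, 6),     # one missing value
  x2 = c(2, NA, 6, 8, NA, 12)    # two missing values
)

# One regression per predictor; lm drops the incomplete rows itself
fits <- lapply(c("x1", "x2"), function(v) lm(toy$y ~ toy[[v]]))

# Number of observations actually used in each fit
n_used <- sapply(fits, nobs)
print(n_used)

# The same fit for x1 after removing incomplete rows by hand
keep   <- complete.cases(toy[, c("y", "x1")])
manual <- lm(toy$y[keep] ~ toy$x1[keep])

# Coefficient names differ between the two calls, so compare values only
same_coef <- all.equal(unname(coef(fits[[1]])), unname(coef(manual)))
print(same_coef)
```

Each fit only loses the rows that are incomplete for its own predictor, which is the behaviour asked for in the question.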

I'm not sure what you mean by "having mydf hardcoded". You can wrap the above code inside a function to make it more general: it then works for any dataframe df, with the predictors given in columns predIdx and the response given in column respIdx.

one_at_a_time_LM <- function(df, predIdx, respIdx) {
    lapply(predIdx, function(x) lm(df[, respIdx] ~ df[, x]))
}

one_at_a_time_LM(mydf, c(3, 4), 5);
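If you would rather refer to columns by name and get a named list back, one possible variant is sketched below. The helper name fit_each and the toy data are made up for illustration. It builds each formula with reformulate and passes data = df, so the printed calls show readable formulas such as y ~ x1 instead of df[, respIdx] ~ df[, x].

```r
# Hypothetical helper: fit response ~ predictor, one model per predictor name
fit_each <- function(df, predictors, response) {
  fits <- lapply(predictors, function(p) {
    # reformulate("x1", response = "y") builds the formula y ~ x1
    lm(reformulate(p, response = response), data = df)
  })
  setNames(fits, predictors)   # name each fit after its predictor
}

# Toy data, made up for illustration
toy <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  x1 = c(1, 2, NA, 4, 5, 6),
  x2 = c(2, NA, 6, 8, NA, 12)
)

fits <- fit_each(toy, c("x1", "x2"), "y")

# Slope of each model, returned as a vector named by predictor
slopes <- sapply(fits, function(f) coef(f)[[2]])
print(slopes)
```

With the data from the question, the equivalent call would be fit_each(mydf, c("Dep_1", "Dep_2"), "Ind_Var").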

edited Oct 19, 2017 at 4:33

answered Oct 19, 2017 at 4:20

2 Comments

mathnoob Over a year ago

In the code I used, I had lm( mydf$Ind_Var ... ). I wanted it to be more general, like just x. I guess a better way to put it would have been "manually specified", as you said. It was quite vague, but hopefully this makes sense.

Maurits Evers Over a year ago

I see. In that case, creating a general function like one_at_a_time_LM would be the way to go...
