
Regress each dependent variable (dep_var) against the independent variable (ind_var)

I am trying to perform linear regressions of multiple dependent variables against an independent variable, one at a time.

When there is a missing observation (NA), the entire row is excluded from that particular regression.

I have done this by looping through each dependent-variable column:

fit <- list()
for (i in 1:2) {
  # Keep only the rows where the i-th dependent column is not NA
  fit[[i]] <- lm(mydf$Ind_Var[which(!is.na(mydf[, 2 + i]))] ~ na.omit(mydf[, 2 + i]))
}
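Because `lm()` already drops incomplete rows on its own (its `na.action` defaults to `getOption("na.action")`, which is `na.omit` unless changed), the `which()`/`na.omit()` subsetting in the loop should not be strictly necessary. A minimal sketch, assuming a small made-up data frame `toy` in place of `mydf` (hypothetical values, same column layout), that checks both forms give the same coefficients:

```r
# Hypothetical toy data with the same layout as mydf (values are made up)
toy <- data.frame(
  ID      = rep("A", 6),
  Date    = 1:6,
  Dep_1   = c(0.2, 0.5, NA, 0.9, 0.4, 0.7),
  Dep_2   = c(0.8, NA, 0.3, 0.6, 0.1, 0.5),
  Ind_Var = c(10, 9, 8, 7, 6, 5)
)

fit_manual  <- list()
fit_default <- list()
for (i in 1:2) {
  x <- toy[, 2 + i]
  # Original approach: subset both sides to the non-NA rows by hand
  fit_manual[[i]] <- lm(toy$Ind_Var[which(!is.na(x))] ~ na.omit(x))
  # Simpler: let lm() drop the NA rows itself via its default na.action
  fit_default[[i]] <- lm(toy$Ind_Var ~ x)
}

# Coefficient names differ between the two forms, so compare unnamed values
for (i in 1:2) {
  print(all.equal(unname(coef(fit_manual[[i]])), unname(coef(fit_default[[i]]))))
}
```

Each fit only drops the rows that are missing in its own column, which matches the "entire row is excluded from that particular regression" requirement.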

Without involving other packages (let's restrict this to functions like lm, the apply family, and do/do.call), how can I do this?

Random Data

mydf <- data.frame(
  "ID"      = rep("A", 25),
  "Date"    = 1:25,
  "Dep_1"   = c(0.78670185, 0.15221561, NA, 0.85270392, 0.90057399, 0.75974473, 0.42026760, 0.64035871, 0.83012434, 0.04985492, 0.06619375, 0.36024745, 0.83969627, 0.45293842, 0.25272036, NA, 0.63783321, 0.42294695, 0.06726004, 0.14124547, 0.54590193, 0.99560087, 0.14255501, 0.41559977, 0.80120970),
  "Dep_2"   = c(0.736137983, 0.979317444, 0.901380500, 0.942325049, 0.420741297, NA, 0.243408607, 0.824064331, 0.462912557, NA, 0.710834065, 0.264922818, 0.797917063, 0.578866651, 0.955944058, 0.291149075, 0.437322581, 0.298153168, 0.579299049, 0.671718144, 0.545720702, 0.099175216, 0.808933227, 0.912825535, 0.417438973),
  "Ind_Var" = 75:51
)

My own attempt at converting the loop is:

apply(mydf[, -c(1:2)], 2, function(x) lm(mydf$Ind_Var[which(!is.na(x))] ~ na.omit(x)))

but this involves having mydf hardcoded.
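One way to avoid hardcoding `mydf` is to pass the data frame and the column names in as arguments and `lapply()` over the selected columns; `lapply()` over a data frame walks its columns and keeps their names. A sketch under that assumption, where `fit_each()`, its arguments and the toy data are hypothetical names chosen for illustration:

```r
# Hypothetical helper: regress column `ind_col` on each column in `dep_cols`
fit_each <- function(df, dep_cols, ind_col) {
  y <- df[[ind_col]]
  # lapply over a data frame iterates over its columns and keeps their names;
  # lm() drops the rows where y or x is NA, separately for each fit
  lapply(df[dep_cols], function(x) lm(y ~ x))
}

# Hypothetical toy data standing in for mydf (values are made up)
toy <- data.frame(
  ID      = rep("A", 6),
  Date    = 1:6,
  Dep_1   = c(0.2, 0.5, NA, 0.9, 0.4, 0.7),
  Dep_2   = c(0.8, NA, 0.3, 0.6, 0.1, 0.5),
  Ind_Var = c(10, 9, 8, 7, 6, 5)
)

fits <- fit_each(toy, c("Dep_1", "Dep_2"), "Ind_Var")
names(fits)  # "Dep_1" "Dep_2"
```

Using `lapply()` rather than `apply()` here also avoids the conversion of the data frame to a matrix that `apply()` performs.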

I apologize if I have used any incorrect terms.

  • foreach doesn't look like a base function. Commented Oct 19, 2017 at 4:05
  • I had only used foreach to create the list, but I have now changed it to a for loop for consistency. Commented Oct 19, 2017 at 4:16

1 Answer


What about the following approach?

# Specify the columns that contain your predictor variables
predIdx <- c(3, 4)

# lm(y ~ x), for x being a single predictor
lapply(predIdx, function(x) lm(mydf[, ncol(mydf)] ~ mydf[, x]))

Here I assume that the response is always in the last column of the dataframe. All you need to specify manually are the column indices that contain your predictors.
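The list returned by `lapply()` is unnamed, and each slope is labelled after the expression used in the formula. A sketch, assuming a small hypothetical `toy` data frame in place of `mydf` with the response in the last column (as the answer assumes), that names the fits after the predictor columns and collects intercepts and slopes into one matrix with `sapply()`:

```r
# Hypothetical toy data with the same layout as mydf (values are made up)
toy <- data.frame(
  ID      = rep("A", 6),
  Date    = 1:6,
  Dep_1   = c(0.2, 0.5, NA, 0.9, 0.4, 0.7),
  Dep_2   = c(0.8, NA, 0.3, 0.6, 0.1, 0.5),
  Ind_Var = c(10, 9, 8, 7, 6, 5)
)

predIdx <- c(3, 4)
fits <- lapply(predIdx, function(x) lm(toy[, ncol(toy)] ~ toy[, x]))

# Name each fit after its predictor column
names(fits) <- names(toy)[predIdx]

# sapply() simplifies to a 2 x 2 matrix: one column per fit,
# row 1 = intercept, row 2 = slope
coef_table <- sapply(fits, coef)
rownames(coef_table) <- c("(Intercept)", "slope")
coef_table
```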

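If you want to exclude the NAs manually, you could use complete.cases inside the lapply function. This shouldn't be necessary, though, because by default lm drops rows containing NAs (its na.action argument defaults to getOption("na.action"), which is na.omit unless you change it).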
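A minimal sketch of the `complete.cases` variant, assuming a hypothetical `toy` data frame in place of `mydf`. The key detail is to check only the response and the current predictor, not the whole data frame; otherwise a row missing in `Dep_2` would also be dropped from the `Dep_1` fit:

```r
# Hypothetical toy data with the same layout as mydf (values are made up)
toy <- data.frame(
  ID      = rep("A", 6),
  Date    = 1:6,
  Dep_1   = c(0.2, 0.5, NA, 0.9, 0.4, 0.7),
  Dep_2   = c(0.8, NA, 0.3, 0.6, 0.1, 0.5),
  Ind_Var = c(10, 9, 8, 7, 6, 5)
)

predIdx <- c(3, 4)
respIdx <- ncol(toy)

fits_cc <- lapply(predIdx, function(x) {
  # Keep only rows where both the response and this predictor are observed
  keep <- complete.cases(toy[, c(respIdx, x)])
  lm(toy[keep, respIdx] ~ toy[keep, x])
})

# Default NA handling, for comparison
fits_default <- lapply(predIdx, function(x) lm(toy[, respIdx] ~ toy[, x]))

sapply(fits_cc, nobs)  # 5 5
```

Both lists should contain the same coefficients; the explicit version only makes the row selection visible.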


I'm not sure what you mean by "having mydf hardcoded". You can wrap the above code in a function to make it more general: for any data frame df, with the predictors in columns predIdx and the response (Ind_Var in your data) in column respIdx.

one_at_a_time_LM <- function(df, predIdx, respIdx) {
    lapply(predIdx, function(x) lm(df[, respIdx] ~ df[, x]))
}

one_at_a_time_LM(mydf, c(3, 4), 5)
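With `lm(df[, respIdx] ~ df[, x])`, every slope in the output is labelled `df[, x]`. A variant of the same function, offered here as an alternative rather than the answer's code, builds each formula from the column names with `reformulate()` and passes `data = df`, so the coefficients carry the real variable names. `one_at_a_time_LM2` and `toy` are hypothetical names for this sketch:

```r
# Hypothetical variant of one_at_a_time_LM that keeps the column names
one_at_a_time_LM2 <- function(df, predIdx, respIdx) {
  resp  <- names(df)[respIdx]
  preds <- names(df)[predIdx]
  # reformulate("Dep_1", response = "Ind_Var") builds Ind_Var ~ Dep_1
  fits <- lapply(preds, function(p) lm(reformulate(p, response = resp), data = df))
  names(fits) <- preds
  fits
}

# Hypothetical toy data with the same layout as mydf (values are made up)
toy <- data.frame(
  ID      = rep("A", 6),
  Date    = 1:6,
  Dep_1   = c(0.2, 0.5, NA, 0.9, 0.4, 0.7),
  Dep_2   = c(0.8, NA, 0.3, 0.6, 0.1, 0.5),
  Ind_Var = c(10, 9, 8, 7, 6, 5)
)

fits <- one_at_a_time_LM2(toy, c(3, 4), 5)
names(coef(fits$Dep_1))  # "(Intercept)" "Dep_1"
```

NA handling is unchanged: `lm()` still drops incomplete rows per fit through its default `na.action`.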

2 Comments

In the code I used, I had lm(mydf$Ind_Var ...). I wanted it to be more general, like just x. I guess a better way to put it would have been "manually specify", as you did. It is quite vague, but hopefully this makes sense.
I see. In that case, creating a general function like one_at_a_time_LM would be the way to go...
