I am using stepwise linear regression to predict missing values. I can do this one variable at a time, but I have a very large data frame with over 50 variables and need to find a way to automate getting the fitted values for multiple variables. I am aware that many statisticians do not like stepwise procedures, but I would still like to implement them.

Below is the code I am using to do this on a variable by variable basis:

test <- data.frame(predict.lm(object = step(lm(dep_var1 ~ ind_var1 + ind_var2 + ind_var3, data = df1),direction = "both"), newdata = df1))

colnames(test) <- "dep_var1"

Below is example data:

df1 <- structure(list(dep_var1 = c(NA_real_, NA_real_, 
-2.09123267205066, 0.230793085482842, 2.37381389867166, -1.254476456844, 
0.803358768774937, -0.193694287225052, 1.4135048896131, -1.01027931169849, 
-0.353471151423884, -1.8471429353131, 0.846656684067891, -0.577619029380873, 
1.56174835187537, -0.180654842356546, 0.606702067578114, 0.63196118363776, 
-2.07546608269867, -1.6981663767802, -2.37523932992292, 0.76639616724562, 
2.79632224479538, -2.83455947605957, -1.33255484820427, 1.13620307003978, 
0.0748723253449958, -0.971846570370541, 0.833084653739389, 1.22652791855451, 
-1.41360170749287, 1.56830155870067, -1.12470646556145, -0.0187794024628569, 
-0.423859330845611, -0.712475730126666, -0.188195097884893, -0.925214646951187, 
2.34270511007552, -1.93278147868247, 0.327538505404795, 0.631163864457143, 
-2.85767723932405, 1.75496256076676, -1.42847227988351, 2.7512047410972, 
-1.15934991023766, -1.54975291965205, -0.11032054745398, 1.92751343170804, 
0.789613141824792, -0.917519738054573, -0.952544104866665, -3.24167052431999, 
-0.52210553650643, 0.18239691875455, -3.21027452658145, -0.827625012712401, 
-0.26672819041463, -1.94823563624677, 2.63505186730208, 0.0366011774775348, 
2.65569794154129, -2.12446625497985, -1.27360207957464, 0.448158096131414, 
-2.49661319932106, 1.02489387271096, -1.08099011979409, -0.364521583133239, 
1.84812022254912, -1.97231278697627, -0.548672808444616, -2.66885146325586, 
-2.23320660644535, -1.34044182986747, -0.988382288011769, -0.945936400194469, 
-0.374814294872094, 0.962918718857577, -2.26590978712601, -0.932063294009854, 
1.13878640351243, -0.472148199947895, 0.372002078593101, 1.00490709225994, 
-2.48452188170382, -0.250170527558021, 0.922254020376051, 3.13691655377035, 
0.0872528229244095, 1.48719103494955, -0.994742032242124, -1.73988494786043, 
0.424588121740004, -2.41510577689421, -1.5841259205017, 2.34360206782046, 
0.535053007004022, -0.795024729905373), dep_var2 = c(2.07303849961519, 
-1.02627125901242, 2.00209093064551, 2.33854031704522, -1.94170342751993, 
1.29711275552946, -1.1573914248646, 2.77266492930927, 1.52318282862803, 
2.50533399732185, 2.18247552424418, 1.57070140547483, -1.80780160813424, 
0.36791214355129, -2.49767760388436, 0.385602175407397, 0.11990775524449, 
-0.277242508402587, -1.45086031801734, 3.77402161660446, -1.24358503248032, 
-3.16519765000204, -0.58250906528939, 1.04464047101027, 0.173724227542418, 
-3.27068834263146, -1.12633556290261, 1.26357853218466, 0.314211534228324, 
-0.585398043962647, -0.897440667747893, -0.483528806014744, -0.583023502992864, 
-1.96040591216907, 0.996014489963131, 1.71087323572918, 0.623006241001743, 
2.11174786637826, 0.420870966700236, -0.318425846406272, -0.902348953954844, 
-1.56791408364248, 2.24200780236017, 1.04557599992065, 1.37600483352856, 
-2.86817745599522, -1.0387333666576, 1.07953682410029, 0.191775638252006, 
-1.48865614959846, -1.76195773849034, -0.298594272403301, 0.235042377873754, 
0.0403724174579101, -1.2327030772748, -0.509896189671339, 1.79187808213233, 
0.508896870272482, 1.87215238243187, 5.42089769981591, 1.05336781075391, 
1.96701365084408, -2.26904993911809, -1.32806705070234, 0.284169651292081, 
3.02750536394422, 1.55475894954328, -1.39469699223261, -0.647098215723534, 
-1.86470919954381, 0.132124712418362, 0.794947727046341, 0.765112914503222, 
1.0562579736073, 0.379018770290438, -0.911880644497877, 1.3675121350016, 
-0.899376872411081, -2.36095033247759, 1.59497346648275, -0.541751418443624, 
-1.34500493840032, 2.12015805342449, 2.77354184178997, -3.96370880146096, 
-0.0967628116821005, 1.97876659343358, -1.77845530622916, 1.16590928446694, 
-0.106112277520016, 1.19636132483196, 1.60566951317693, 2.09590452462496, 
0.214460090479266, -1.87019786463146, 1.64600594683429, 0.213332757178706, 
-2.17935397786443, 2.21635976782075, -0.392555892448031), dep_var3 = c(0.616700731082951, 
4.16279558260156, 1.10940530392079, -2.8569223582772, 0.402520816282224, 
-1.04411931764913, -0.609172559785609, -3.20807626475815, -2.08381934294098, 
-1.57712938280433, -1.44209052953985, -0.352794093438308, -0.608327907097134, 
-2.25597485701099, 2.19386899842515, 0.396416957807837, 1.33246847256144, 
-0.0762686733985066, 0.464588471846464, 3.94769110440112, 1.68318663058877, 
1.10935304551582, -2.71677518211804, 1.59362361780755, -1.62129130253971, 
-0.127118607974366, -0.417026737550066, -0.241262097212425, -1.52296844320382, 
-2.56829334841815, 0.799132956325209, 0.220522383259441, 2.37490948964111, 
4.15215150868392, -0.812992593809876, -0.173256232772018, 1.71074725747611, 
-1.0216605970604, -2.02721169453559, -4.09137683106018, 0.0474862298692908, 
3.31122428784435, -0.109026136376674, -3.46365644884461, -1.35460817015094, 
-0.899169317402685, 2.79440901022252, -0.794037627815716, 2.59917986374591, 
-2.14467166749864, 1.70019936889493, 0.721183948988304, -0.102388950793829, 
0.417677247084431, -1.01294623403926, 0.530290499693695, -0.678407609540795, 
1.36678775280302, 0.0970122249348387, 0.984762058542595, -3.21893736068827, 
-0.176771833178864, 1.46524980459238, 5.09545403085887, 1.46390691826153, 
-2.28175042941279, 1.17844832995436, -0.51656608642314, 0.915840406252925, 
1.8162815506279, -0.838763232984826, -1.78425071852195, -2.02035769534564, 
1.94260379368071, 4.03367533975736, -0.89328282008572, -2.73980411204667, 
-0.664566579870786, 1.2743809088601, 1.217725543838, 0.33860561843341, 
-1.7583845390752, -3.82437030519712, -4.1251791941278, 2.16768888784062, 
0.0208230680948219, -1.47964005154307, 0.0435783517650753, -3.94727089909519, 
-0.818173043130464, -3.4742303828308, -0.941225010967932, -0.979536393425847, 
-0.818834044969523, 0.795467907282362, -0.929285918331344, 0.668127671169617, 
-0.254668928895892, -2.13424401943605, -2.29388988629311), ind_var1 = c(0.458454397686833, 
-0.128440463741865, 0.363604764506242, -0.0693474758868018, 1.72259605847845, 
1.69526675465286, -1.623924222505, 0.15126566544286, -1.93552451013567, 
-2.58683178733901, -0.233912306362039, -2.47192439188638, 0.620795754754641, 
-0.992480709929954, 0.482192425484265, -2.61563698833568, 0.0128550866026035, 
0.392025740980614, -0.0473362942736612, -2.64909215232388, -1.47622293773269, 
3.16190990221028, 3.49243154151446, -0.272928040177153, -0.761411336416013, 
2.64997041637778, 0.577458182483536, -2.42929594600083, -0.267243349065099, 
0.722347497120074, 1.74884020954902, -0.0348288966586645, -1.52719161170932, 
-0.933148290337328, -0.490447995741133, 0.655322312303463, -2.52750457266348, 
0.668092340207411, 0.585782768355766, -0.359703526704027, 1.65001495114651, 
0.660363284824336, 0.0862383898649589, -0.365574191100425, -2.16177422896681, 
3.89053917972807, -0.142261253218103, 0.707021521565601, 0.0227116811915725, 
-0.454014719282556, 3.08453484473708, -1.06212270847072, -0.399418638058533, 
-0.262910611084249, 1.93593096630764, -0.725649177240837, -1.17309612984748, 
-0.373437242782234, -0.680948834115372, -4.13059660441355, -0.0409060052137248, 
0.989037314169956, 1.2259749106443, -0.66115377935577, -1.51318623204637, 
0.708828930872304, 2.34078004259392, 2.55044212723072, 0.141264088851028, 
2.17300161541665, 0.788684015013957, -2.80016454552875, 0.907606363872277, 
-2.53767303689764, 0.430023970340317, 0.972560430691479, -0.57115769920932, 
0.675371714699047, -0.819273676763145, -0.779254118891752, 1.13734662396304, 
-0.189212077733243, 1.62723080758521, -0.979259176936454, 1.14316624823637, 
2.91560630534064, 0.544678587889513, 0.104127307592218, 0.548266027482326, 
2.09782272529516, -0.405642732646619, -0.767523596762102, -0.101666159527356, 
0.478216111399646, 1.99281202677566, -2.226625310068, -0.971517903790143, 
0.460258073138533, -2.89835631489168, -1.02171119729811), ind_var2 = c(-0.056357182811544, 
1.74174805302751, 0.726184590489127, -0.776468741542423, -0.382713389335797, 
-2.04718702133114, 0.831366181579827, -0.213090131848065, 0.840865733882644, 
1.22835392560235, 0.157950531820239, 2.06119246289913, -0.956157941014712, 
-1.08971104497602, 0.326241704298168, 1.92200778034698, 0.688832722217709, 
-0.627922012586111, -1.19199346650355, 4.22350716099696, 0.641422750933785, 
-2.51080407306521, -2.48755232089754, 0.786465747299846, -1.75767028255026, 
-3.1809952588847, -1.16180005417099, 1.62222731815135, -0.36774662856744, 
-1.08013180924562, -0.792625832269249, 0.0354459155484843, 0.739265747174507, 
1.46933161619649, 0.665910133217599, 0.187823805723774, 2.56835385685832, 
-0.690151675677563, 0.698293566284355, -2.16814193217446, -1.49261328970516, 
0.676123306999542, -0.3939491038487, 0.448077244911608, 0.875734079074383, 
-2.86089580463621, 0.604268757076813, -1.64354489300732, 2.45923451123531, 
-1.68604842945783, -1.9184819589674, 0.139599937397156, 0.828244213896308, 
-1.75129154686091, -2.63929211963569, -0.543288071994073, -0.438679067953734, 
0.192090404456049, 0.758062917239584, 5.25351678020715, -0.277581138478905, 
0.119360139881858, 0.428014862847672, 2.2085245244809, 1.6315453284043, 
0.406134966449986, -1.95269069535625, -1.44363400477165, -0.773787305174728, 
-1.87725581196967, -0.173579458092002, 0.828185227827978, -0.753314550989367, 
2.55617987716488, 1.6298004240679, -2.21082666011452, -1.2473960162524, 
-2.36940584906052, 0.531174618968768, 2.62463381810192, -0.273642107149701, 
-0.932988862867355, -1.07788635500683, -0.674291949186377, -0.86325278256275, 
-2.40754111826735, -1.27808264400922, 0.177596193414942, -1.76242219594059, 
-1.03192825321543, -0.870426991870862, 0.907721012331873, -0.439384772692009, 
-1.73676155170012, -1.14685643668553, 0.355921250966228, 0.369132512048539, 
-1.03839194256396, 1.67059937513388, -1.32434182747233), ind_var3 = c(-1.1389104968572, 
-1.65852944320507, -1.45705577426981, 1.07794506870353, 0.719224058000476, 
-0.158461497822828, 0.705353993877171, 0.337767898018486, 0.117250430739658, 
-0.943398774117966, 0.0329809151250609, -0.568980218136715, 0.928266346136966, 
1.05631907220357, -0.0736055811494815, 0.196830300827318, -0.13576295582571, 
0.257537068142104, -0.137358419008261, -3.0554298580581, -0.533447743252316, 
1.12258694757551, 1.01687632724484, -1.79571198682012, 0.0148816879851791, 
0.82485066910626, 1.00423601009619, -1.07647074570615, 0.470091204928795, 
2.03233021484527, 0.0386841839290024, 0.593792838064128, -1.04728378442583, 
0.00874708446552375, -0.980903401411594, -1.00464434293468, -0.422762600910394, 
-0.42186665574121, 0.785678338823868, 0.452762774537635, 0.146780016995895, 
0.188940756286868, -0.510331441771421, 0.857829724013878, -1.14239581375406, 
1.70863954753159, -0.45918654843729, 0.0576603952242708, -1.27129923558338, 
2.02258278000593, 0.40380866400308, -0.654966856348495, 0.174065512343151, 
0.0275895676352105, 0.918865223950716, -0.584475829976857, -1.19524511596668, 
-0.487679955982114, -0.369099439891801, -2.99052050986791, 1.48199456815231, 
-0.982177118355558, 1.1861353538926, -1.08400989832084, -0.611798044606918, 
0.195029407984118, -0.933873607869469, 0.932982555282905, 0.749446947724109, 
0.309289116358974, 0.490082369957284, -0.479016122713183, 0.224163061951812, 
-1.55318448145768, -1.60841407694929, 0.0313841417028764, 0.529735266681235, 
0.487000304158991, 0.182326460494007, -1.00576805100532, -0.718578942204117, 
0.384314741454849, 0.633681783832062, 0.683973793799741, 0.200446142331914, 
0.376184166146214, -0.459051327415705, 0.352483771659012, 1.13367389882802, 
1.61456716867767, 0.113332066436203, 0.828244743171307, -0.302128248121384, 
-0.0394767029347994, 0.624579306812765, -0.613476676670482, -0.735579500581425, 
0.833063484439717, -0.353751888509078, 0.351207888901893)), class = "data.frame", row.names = c(NA, 
-100L))
  • Alternatively, you can check out the MICE package, which attempts to do this, but it's a more black-box approach. There is also the mi package. Commented Sep 2, 2020 at 11:42
  • Many thanks Cheung! I have tried MICE. It does a great job of retaining the distribution parameters, but is not so good at giving an accurate prediction on a per-sample basis (which is what is needed in this instance). I will check out mi. Commented Sep 3, 2020 at 9:47
  • You are indeed correct, your approach also allows more inference about how you get to the result. BTW, I found your question and the answer supplied a very useful alternative. Many thanks. Commented Sep 3, 2020 at 13:35

1 Answer

Are you trying to do something like this?

dep_cols <- grep('dep', names(df1), value = TRUE)
ind_cols <- grep('ind', names(df1), value = TRUE)

models <- lapply(dep_cols, function(x) step(lm(reformulate(ind_cols, x), 
                                    data = df1), direction = "both"))
new_data <- lapply(models, function(x) data.frame(value = 
                            predict.lm(object = x, newdata = df1)))

You can also combine the two lapply calls into one, but I have kept them separate for clarity.
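For reference, the single-call version might look roughly like the sketch below. It is only an illustration: `toy` is a small simulated stand-in for `df1` (its values are made up, not the question's data), `fitted_list` is a hypothetical name, and `trace = 0` is added only to silence the step() log.

```r
# Simulated stand-in for df1 (hypothetical values, not the question's data)
set.seed(1)
n <- 60
toy <- data.frame(ind_var1 = rnorm(n), ind_var2 = rnorm(n), ind_var3 = rnorm(n))
toy$dep_var1 <- 1 + 2 * toy$ind_var1 + rnorm(n)
toy$dep_var2 <- -1 + toy$ind_var2 - toy$ind_var3 + rnorm(n)
toy$dep_var1[1:2] <- NA  # missing responses that we want predictions for

dep_cols <- grep("^dep", names(toy), value = TRUE)
ind_cols <- grep("^ind", names(toy), value = TRUE)

# One pass per dependent variable: fit the full model, run stepwise
# selection, then predict for every row (including rows with NA responses)
fitted_list <- lapply(dep_cols, function(x) {
  full_fit <- lm(reformulate(ind_cols, response = x), data = toy)
  step_fit <- step(full_fit, direction = "both", trace = 0)
  predict(step_fit, newdata = toy)
})
names(fitted_list) <- dep_cols
```

Keeping the fitted models in their own list, as in the two-step version, is still handy if you want to inspect which predictors step() retained for each variable.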


3 Comments

That works; the only issue is that each output is returned as a separate list element. Is there any way each set of predictions can be output as a column so I can use it in a data frame?
Do you mean new_data <- do.call(cbind, new_data)?
Perfect! Thanks a lot Ronak
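
An alternative to building a list and binding it afterwards is to collect the predictions with vapply, which should give one column per dependent variable, already named. The sketch below also shows one possible way to use those predictions to fill only the missing values, which is the stated goal in the question. It rests on assumptions: `toy` is simulated stand-in data (not the question's `df1`), and `preds` and `filled` are hypothetical names.

```r
# Simulated stand-in for df1 (hypothetical values, not the question's data)
set.seed(42)
n <- 50
toy <- data.frame(ind_var1 = rnorm(n), ind_var2 = rnorm(n))
toy$dep_var1 <- 2 * toy$ind_var1 + rnorm(n)
toy$dep_var2 <- -toy$ind_var2 + rnorm(n)
toy$dep_var1[c(1, 2)] <- NA
toy$dep_var2[5] <- NA

dep_cols <- grep("^dep", names(toy), value = TRUE)
ind_cols <- grep("^ind", names(toy), value = TRUE)

# vapply returns a numeric matrix: one row per observation,
# one column per dependent variable (columns named after dep_cols)
preds <- vapply(dep_cols, function(x) {
  fit <- step(lm(reformulate(ind_cols, response = x), data = toy),
              direction = "both", trace = 0)
  unname(predict(fit, newdata = toy))
}, numeric(nrow(toy)))
preds <- as.data.frame(preds)

# Fill only the missing responses; observed values are left untouched
filled <- toy
for (x in dep_cols) {
  miss <- is.na(filled[[x]])
  filled[[x]][miss] <- preds[[x]][miss]
}
```

Note that this assumes the predictors themselves have no missing values; rows with NA predictors would get NA predictions.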
