
I'm using the first principal component from a PCA as an explanatory variable in a forecasting model that forecasts recursively using Kalman filtering. In other words, at each point in time the model updates and produces a new forecast based on the newly included observation. Since PCA uses every observation it is given, I also need to run the PCA recursively, using only the observations prior to the point in time that I am forecasting (otherwise the PCA result could reveal information about the future and help the model produce a more accurate forecast than it otherwise would). I think a loop might be the solution, but I am struggling with how to write the code.

As a more specific example, suppose I have the following data frame:

data <- as.data.frame(rbind(c(6,15,23),c(9,11,22), c(7,13,23), c(6,12,25),c(7,13,23)))
names(data) <- c("V1","V2","V3")

> data
  V1 V2 V3
1  6 15 23
2  9 11 22
3  7 13 23
4  6 12 25
5  7 13 23

At each observation date, I wish to run a PCA (function prcomp() from the stats package) on all observations up to and including that observation. So I first want to run PCA on the first two observations:

pca2 <- prcomp(data[1:2,], scale = TRUE)

Next, I want to run PCA with the first, second and third observations as input:

pca3 <- prcomp(data[1:3,], scale = TRUE)

Next, I want to run PCA with the first, second, third and fourth observations as input:

pca4 <- prcomp(data[1:4,], scale = TRUE)

and so on, until the last run of the PCA, which includes all observations in the data frame. For each of these "runs" of the PCA, I wish to extract the last value of the first principal component (PC1) (though for pca2 I use both the first and the second value) and merge these into a final data frame, where each monthly observation is the last PC1 value from the corresponding run.

The principal component outputs are:

> my_pca2 <- as.data.frame(pca2$x)
> my_pca2
        PC1           PC2
1 -1.224745 -5.551115e-17
2  1.224745  5.551115e-17

> my_pca3 <- as.data.frame(pca3$x)
> my_pca3
         PC1        PC2          PC3
1 -1.4172321 -0.2944338 6.106227e-16
2  1.8732448 -0.1215046 3.330669e-16
3 -0.4560127  0.4159384 4.163336e-16

> my_pca4 <- as.data.frame(pca4$x)
> my_pca4
          PC1         PC2          PC3
1 -1.03030993 -1.10154914  0.015457199
2  2.00769890  0.07649216  0.011670433
3  0.03301806 -0.24226508 -0.033461874
4 -1.01040702  1.26732205  0.006334242

So I want my final output to be a data frame that looks like this:

> final.output
          PC1
1 -1.224745
2  1.224745
3 -0.4560127
4 -1.01040702

Comment: yes, it looks a bit odd with the first two values, but please don't pay too much attention to that. My point is that I wish to build a data frame that consists of the last calculated value of the first principal component from each of the PCA runs.

I am thinking that a for loop might be the best solution here, but I have not been able to find any threads that bring me closer to a coding solution. How can I make the loop use an increasing share of the data frame in the calculations? Does anyone have any suggestions, tips or links? Any help on this is much appreciated!

2 Answers

I had a very similar approach.

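# One list slot per PCA run; the runs use rows 1:2, 1:3, ..., 1:nrow(data)
PCA <- vector("list", length = nrow(data) - 1)
for (i in 1:(nrow(data) - 1)) {
  # The first run keeps both of its scores; later runs keep only the newest row
  j <- if (i == 1) 1:2 else i + 1
  PCA[[i]] <- as.data.frame(prcomp(data[1:(1 + i), ], scale = TRUE)$x)[j, 1]
}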

unlist(PCA)
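
To get the data frame asked for in the question rather than a bare vector, the same loop could be wrapped in a small function. This is only a sketch under assumptions: `recursive_pc1` and its `min_rows` argument are hypothetical names (not part of the stats package), and it assumes no column is constant within any window, since scaling would otherwise fail. It spells the argument as `scale.`, the name documented for prcomp(); `scale = TRUE` reaches the same argument through partial matching.

```r
# Example data from the question
data <- as.data.frame(rbind(c(6, 15, 23), c(9, 11, 22), c(7, 13, 23),
                            c(6, 12, 25), c(7, 13, 23)))
names(data) <- c("V1", "V2", "V3")

# Hypothetical helper: run a PCA on rows 1..i for growing i and keep
# the newest PC1 score from each run
recursive_pc1 <- function(df, min_rows = 2) {
  n <- nrow(df)
  stopifnot(min_rows >= 2, n >= min_rows)
  pc1 <- rep(NA_real_, n)
  for (i in min_rows:n) {
    scores <- prcomp(df[1:i, , drop = FALSE], scale. = TRUE)$x
    # First window: keep all of its scores; afterwards only the newest one
    rows <- if (i == min_rows) 1:i else i
    pc1[rows] <- scores[rows, 1]
  }
  data.frame(PC1 = pc1)
}

final.output <- recursive_pc1(data)
```

With several blocks of explanatory variables, the same function could be called once per block (for example on different column subsets of `data`) and the resulting columns bound together.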

1 Comment

Short and efficient coding that solves the issue, spot on. Accepted this as answer to my question as it is so tidy and easy to incorporate into my script. Thanks!

You can use a for loop. It's maybe not the most efficient solution, but it will work.

First, you create an empty list to store your results:

all_results <- list()

Next, you iterate from 2 to the number of rows of data with a loop. For each iteration of the loop, run prcomp on data[1:i,]. You can directly create your PCA data frame and extract PC1 from it as a vector, then store it in the list at index i - 1:

for(i in 2:nrow(data))
{
  all_results[[i - 1]] <- as.data.frame(prcomp(data[1:i,], scale = TRUE)$x)$PC1
}

Now to extract all the results, you use lapply (list apply) to extract only the last element from each PC1 vector:

PC1 <- lapply(all_results, function(pca) pca[length(pca)] )

Now you convert these from a list of single elements to a vector:

PC1 <- do.call("c", PC1)

Finally, you want to stick the first value of the first analysis back on to the front of this vector:

PC1 <- c(all_results[[1]][1], PC1)
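
The steps above can also be condensed so that only the value that is actually needed is kept from each run. The following is a sketch, not a replacement for the step-by-step version: `last_pc1` and `first_run` are hypothetical names introduced here, and the example data from the question is assumed.

```r
# Example data from the question
data <- as.data.frame(rbind(c(6, 15, 23), c(9, 11, 22), c(7, 13, 23),
                            c(6, 12, 25), c(7, 13, 23)))
names(data) <- c("V1", "V2", "V3")

# Hypothetical helper: PC1 score of row i from a PCA on rows 1..i
last_pc1 <- function(i) {
  scores <- prcomp(data[1:i, ], scale. = TRUE)$x
  scores[i, "PC1"]
}

# PC1 scores of the very first run (rows 1:2); its first value is prepended
first_run <- prcomp(data[1:2, ], scale. = TRUE)$x[, "PC1"]

# vapply checks that every run returns exactly one number
PC1 <- unname(c(first_run[1], vapply(2:nrow(data), last_pc1, numeric(1))))
```

Using vapply instead of lapply followed by do.call("c", ...) returns a numeric vector directly and raises an error if any run yields something other than a single number.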

1 Comment

This solution also solves the issue, but I chose to go with the solution presented by Edward as I have several PCAs as explanatory variables, and his code is slightly quicker to modify into my existing script. But I like how you explain your coding step by step, really valuable as I am not so familiar with coding loops. Thank you!
