
I just started using R for statistical purposes and would appreciate any kind of help.

My task is to make calculations on one index and 20 stocks from the index. The data contains 22 columns (DATE, INDEX, S1 .... S20) and about 4000 rows (one row per day).

Firstly, I imported the .csv file, called it "dataset", and calculated log returns as follows, repeating this for every stock (S1 to S20) and for the INDEX:

n <- nrow(dataset)
S1 <- dataset$S1
S1_logret <- log(S1[2:n])-log(S1[1:(n-1)])
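As a small sketch (the price vector below is made up), the shift-and-subtract line can be written more compactly with base R's `diff()`, which returns successive differences, so `diff(log(S1))` should give the same log returns:

```r
# Hypothetical price vector standing in for dataset$S1
S1 <- c(100, 101.5, 99.8, 102.3, 103.0)
n  <- length(S1)

# Original approach: shift-and-subtract on the log prices
logret_manual <- log(S1[2:n]) - log(S1[1:(n - 1)])

# Equivalent shortcut: diff() takes successive differences of log(S1)
logret_diff <- diff(log(S1))

all.equal(logret_manual, logret_diff)  # TRUE
```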

Secondly, I stored the data in a data.frame:

logret_data <- data.frame(INDEX_logret, S1_logret, S2_logret, S3_logret, S4_logret, S5_logret, S6_logret, S7_logret, S8_logret, S9_logret, S10_logret, S11_logret, S12_logret, S13_logret, S14_logret, S15_logret, S16_logret, S17_logret, S18_logret, S19_logret, S20_logret)
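Rather than computing each `*_logret` vector separately and listing them all in `data.frame()`, one possible approach is to apply `diff(log(.))` to every price column with `lapply()`. This sketch assumes the CSV's first column is named `DATE` and every other column holds prices; the three-column `dataset` below is a hypothetical stand-in for the real 22-column file:

```r
# Hypothetical stand-in for the imported CSV: a DATE column followed by price
# columns (only INDEX, S1 and S2 here; the real file has INDEX and S1 ... S20)
dataset <- data.frame(
  DATE  = seq(as.Date("2017-01-02"), by = "day", length.out = 6),
  INDEX = c(100, 101, 100.5, 102, 103, 102.5),
  S1    = c(50, 51, 50.2, 52, 53.1, 52.8),
  S2    = c(20, 19.8, 20.3, 20.9, 21.0, 20.7)
)

# Every column except DATE holds prices
price_cols <- setdiff(names(dataset), "DATE")

# lapply() runs diff(log(.)) on each price column; as.data.frame() binds the results
logret_data <- as.data.frame(lapply(dataset[price_cols], function(p) diff(log(p))))

# Reproduce the question's naming scheme: INDEX_logret, S1_logret, ...
names(logret_data) <- paste0(price_cols, "_logret")

str(logret_data)  # 5 rows (one fewer than the prices), one column per series
```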

Then I regressed each stock (S1 to S20) on the index using the log returns:

S1_Reg1 <- lm(S1_logret~INDEX_logret)
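The 20 separate `lm()` calls can be generated in a loop. A minimal sketch with `lapply()`, assuming `logret_data` uses the column names from the question (`INDEX_logret`, `S1_logret`, ...); the data are simulated and only three stocks are used here:

```r
# Hypothetical log-return data: INDEX_logret plus three stocks (the real data has 20)
set.seed(1)
m <- 250
INDEX_logret <- rnorm(m, sd = 0.01)
logret_data <- data.frame(
  INDEX_logret = INDEX_logret,
  S1_logret = 0.8 * INDEX_logret + rnorm(m, sd = 0.01),
  S2_logret = 1.2 * INDEX_logret + rnorm(m, sd = 0.01),
  S3_logret = 1.0 * INDEX_logret + rnorm(m, sd = 0.01)
)

stock_cols <- setdiff(names(logret_data), "INDEX_logret")

# One lm() per stock; reformulate() builds e.g. S1_logret ~ INDEX_logret
models <- lapply(stock_cols, function(y) {
  lm(reformulate("INDEX_logret", response = y), data = logret_data)
})
names(models) <- stock_cols

# Collect intercepts and slopes (betas) into one table, one row per stock
coef_table <- t(sapply(models, coef))
coef_table
```

Keeping the fitted models in a named list means `summary(models$S1_logret)` and similar calls still work for individual stocks.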

I couldn't figure out how to write this more efficiently, using a function or loop instead of repeating the code for each stock.

In a further step, I have to run a cross-sectional regression for each day in a selected interval. Doing this manually is impossible, so I expect R offers a quick solution, but I am quite unsure how to do this part. I would also like to use some kind of loop for the previous calculations.

Yet I lack the necessary R coding knowledge. Any help to the point, or advice on literature or tutorials, is highly appreciated! Thank you!

1 Answer

You could provide all the separate dependent variables in a matrix to run your regressions. Something like this:

#example data (random, so your numbers will differ)
Y1 <- rnorm(100)
Y2 <- rnorm(100)
X  <- rnorm(100)
df <- data.frame(Y1, Y2, X)

#run all models at once: each column of the response matrix gets its own regression on X
lm(as.matrix(df[c('Y1', 'Y2')]) ~ X, data = df)

Out:

Call:
lm(formula = as.matrix(df[c("Y1", "Y2")]) ~ X, data = df)

Coefficients:
             Y1        Y2      
(Intercept)  -0.15490  -0.08384
X            -0.15026  -0.02471
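To use the estimated slopes later (for example as the explanatory variable in a second-stage regression), they can be pulled out of the multi-response fit. A sketch on the same kind of simulated data, which also checks that the slopes match separate `lm()` fits:

```r
# Simulated data in the same shape as the example above
set.seed(123)
Y1 <- rnorm(100)
Y2 <- rnorm(100)
X  <- rnorm(100)
df <- data.frame(Y1, Y2, X)

fit <- lm(as.matrix(df[c("Y1", "Y2")]) ~ X, data = df)

# coef() of a multi-response fit is a matrix: rows = terms, columns = responses
coef(fit)
betas <- coef(fit)["X", ]   # named vector of slopes: Y1, Y2

# The slopes equal those from fitting each response on its own
separate <- c(Y1 = coef(lm(Y1 ~ X, data = df))[["X"]],
              Y2 = coef(lm(Y2 ~ X, data = df))[["X"]])
all.equal(betas, separate)  # TRUE

# Residuals also come back as a matrix, one column per response
dim(residuals(fit))  # 100 2
```

With the question's data this would presumably be `lm(as.matrix(logret_data[paste0('S', 1:20, '_logret')]) ~ INDEX_logret, data = logret_data)`, assuming the column names used in the question.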

4 Comments

Glad I could help :).
This works perfectly! I am just wondering how I could run a cross-sectional regression in the same efficient way over a certain period of time (one regression per day), using the log returns of each stock (S1-S20) on a given day as the dependent variable and the previously calculated coefficients as the explanatory variable. I don't need the intercept; Y ~ X + 0 removes it. This kind of regression seems much more complicated to me. If the regression covers 50 days, I should end up with 50 estimated coefficients (one per day) and 50*20 = 1000 residuals.
This is a new question, so it should not be discussed in comments. Feel free to ask a new question (there is no limit to how many you can ask), and make sure you include an example that clarifies what you want to achieve, so that the people answering do not need to create their own examples as I did.
Thank you for your comment. I added a new question: stackoverflow.com/questions/46268146/… Maybe you could also take a look at it, since your answer to my previous question was really useful ;)
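The per-day cross-sectional regression described in the comments above (each day's 20 stock returns regressed on the 20 first-stage betas, without an intercept) can be sketched as follows. The betas and returns are simulated placeholders, and the `returns` matrix layout (rows = days, columns = stocks) is an assumption:

```r
# Hypothetical inputs: 20 first-stage betas and 50 days of stock log returns
set.seed(7)
n_stocks <- 20
n_days   <- 50
betas   <- setNames(runif(n_stocks, 0.5, 1.5), paste0("S", 1:n_stocks, "_logret"))
returns <- matrix(rnorm(n_days * n_stocks, sd = 0.02), nrow = n_days,
                  dimnames = list(NULL, names(betas)))   # rows = days, columns = stocks

# One no-intercept regression per day: that day's 20 returns on the 20 betas
daily_fits <- lapply(seq_len(n_days), function(day) lm(returns[day, ] ~ 0 + betas))

gamma     <- vapply(daily_fits, function(f) coef(f)[[1]], numeric(1))  # one slope per day
resid_mat <- t(sapply(daily_fits, residuals))                          # days x stocks

length(gamma)   # 50
dim(resid_mat)  # 50 20

# Because every day shares the same regressor, the multi-response trick from the
# answer also works: each column of t(returns) is one day
fit_all <- lm(t(returns) ~ 0 + betas)
all.equal(as.vector(coef(fit_all)), gamma)  # TRUE
```

This gives the 50 daily coefficients and 50 * 20 = 1000 residuals mentioned in the comment; to restrict the regression to a selected interval, subset the rows of `returns` first.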
