The code below is what I am trying to run, and it is taking a long time.

districts is a data frame of 39299 rows and 16 columns, and lm_data is a data frame of 59804 rows and 16 columns. I want to create a new variable in lm_data called tentativeStartDate that takes the value of districts$firstDay[j] when a couple of conditions are met. Is there a more efficient way to do this?

for (i in seq_len(nrow(lm_data))) {
  for (j in seq_len(nrow(districts))) {
    if (lm_data$DISTORGID[i] == districts$DISTORGID[j] && lm_data$gradeCode[i] == districts$gradeCode[j]) {
      lm_data$tentativeStartDate[i] <- districts$firstDay[j]
    }
  }
}

1 Answer

Not sure if this will work since I can't test it, but if it does work it should be much faster.

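# the two data frames have different numbers of rows, so comparing their columns
# with == would only pair row i with row i; look rows up by a combined key instead
key_lm <- paste(lm_data$DISTORGID, lm_data$gradeCode)
key_d <- paste(districts$DISTORGID, districts$gradeCode)

# get the indices: for each lm_data row, the first matching districts row (NA if none)
idx <- match(key_lm, key_d)
lm_data$tentativeStartDate <- districts$firstDay[idx]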
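
Since the real data isn't available to test against, here is a minimal sketch of a key-based lookup on made-up toy data. The column names come from the question; the values, the "_" separator, and the key_lm / key_d helper names are assumptions for illustration only. It relies on base R's match(), which returns the position of the first match, so it agrees with the double loop only if each DISTORGID / gradeCode pair appears once in districts (the loop keeps the last match instead).

```r
# Toy data (hypothetical values) standing in for the real data frames
districts <- data.frame(
  DISTORGID = c(101, 101, 202),
  gradeCode = c("K", "1", "K"),
  firstDay  = as.Date(c("2015-08-17", "2015-08-18", "2015-08-24")),
  stringsAsFactors = FALSE
)
lm_data <- data.frame(
  DISTORGID = c(202, 101, 101, 303),
  gradeCode = c("K", "1", "K", "K"),
  stringsAsFactors = FALSE
)

# One combined key per row in each data frame
key_lm <- paste(lm_data$DISTORGID, lm_data$gradeCode, sep = "_")
key_d  <- paste(districts$DISTORGID, districts$gradeCode, sep = "_")

# For each lm_data row: position of the first districts row with the same key,
# or NA when no district/grade combination matches
idx <- match(key_lm, key_d)

# Indexing with NA yields NA, so unmatched rows get a missing start date
lm_data$tentativeStartDate <- districts$firstDay[idx]
print(lm_data)
```

This does one vectorised lookup instead of roughly 59804 x 39299 comparisons, which is where the speedup should come from. Rows with no match end up as NA rather than being left untouched.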