I have a dataset of about 3 million rows. I created a small example shown below:
ex <- data.table(eoc = c(1,1,1,1,1,2,2,2,3,3), proc1 = c(63035,63020,92344,63035,27567,63020,1234,55678,61112,1236), trigger_cpt = c(63020,63020,63020,63020,63020,63020,63020,63020,61112,61112))
I have another dataset of 42 rows, but have produced a smaller example:
add_on <- data.table(primary = c(63020,61112), secondary=c(63035,63445))
I need to relabel certain rows on the "trigger_cpt" column (grouped by eoc) if the trigger_cpt value happens to be one of the values in the primary column of the dataset and if there is a proc1 value that is the secondary value in the add_on dataset. If it meets the criteria, then trigger_cpt should be relabeled to the secondary code.
I was first typing everything manually,
ex[,trigger_new := if(any(trigger_cpt == '63020' & proc1 == '63035')) 63035 else trigger_cpt, eoc]
and then decided to do a for loop
for(i in 1:nrow(add_on)){
ex[,trigger_new2 := if(any(trigger_cpt == add_on[i,1] & proc1 == add_on[i,2])) add_on[i,2] else trigger_cpt, eoc]
}
however, now that I'm trying this code on my 3 million row dataset, it's been taking a long time to run. I'm not sure if there's a better approach or if there's any modifications I could make to my current code?
any help would be greatly appreciated!
expected output:
ex_final <- data.table(eoc = c(1,1,1,1,1,2,2,2,3,3), proc1 = c(63035,63020,92344,63035,27567,63020,1234,55678,61112,1236), trigger_cpt = c(63035,63035,63035,63035,63035,63020,63020,63020,61112,61112))
add_on. then you may need to create a column 'eoc' in 'add_on' with those values and then do the join by 'eoc' along with other columns