I have two data tables as follows:
set.seed(1)
landing_fees <- data.table(`date`= c(20160101,20160101,20160101,
20160102,20160102,20160102,20160103,20160103,20160103),
airport=c("LAX","PHX","HNL"), period_a_fee=sample(100:200,9),
period_b_fee=sample(100:200,9), period_c_fee=sample(100:200,9))
routes <- data.table(origin=c(rep("LAX",4),rep("PHX",4)),
destination=c("PHX","HNL","PHX","HNL","LAX","HNL","LAX","HNL"),
period=rep(c("period_a","period_a","period_b","period_c"),2))
Which output as follows:
> landing_fees
date airport period_a_fee period_b_fee period_c_fee
1: 20160101 LAX 126 106 138
2: 20160101 PHX 137 120 177
3: 20160101 HNL 156 117 192
4: 20160102 LAX 189 167 120
5: 20160102 PHX 119 137 163
6: 20160102 HNL 186 173 112
7: 20160103 LAX 197 147 125
8: 20160103 PHX 162 197 136
9: 20160103 HNL 158 192 101
> routes
origin destination period
1: LAX PHX period_a
2: LAX HNL period_a
3: LAX PHX period_b
4: LAX HNL period_c
5: PHX LAX period_a
6: PHX HNL period_a
7: PHX LAX period_b
8: PHX HNL period_c
The real datasets are huge with millions of origin/destination combinations and years worth of dates. I would like to add a new column to the routes table for each unique date in landing_fees$date and programatically name the column with the date and _cost (ie 20160101_cost). Then in each new column, sum the fee for origin and the fee for the destination (using matching time period) as listed in the landing_fees table. For instance, in the routes table, a new column routes$20160101_cost would be created and the value of row 1 would be: LAX period_a fee of 126 (from landing_fees table) plus PHX period_a fee of 137 (from landing_fees table) equals 263. Same calculation would be made for each origin/destination/period combination for each date.
Final output would look like this:
origin destination period 20160101_cost 20160102_cost 20160103_cost
1: LAX PHX period_a 263 308 359
2: LAX HNL period_a 282 375 355
3: LAX PHX period_b 226 304 344
4: LAX HNL period_c 330 232 226
5: PHX LAX period_a 263 308 359
6: PHX HNL period_a 293 305 320
7: PHX LAX period_b 226 304 344
8: PHX HNL period_c 369 275 237
Since the datasets can be fairly large, I am hoping to use efficient data.table script. I am using data.table version 1.9.6 on Windows 7 machine.