I have a data.table with several yearly "Performance" columns (Perf_2016, Perf_2017, Perf_2018) and a column named "ExpPerf_2019". I want to add a logical column, FLAG, that marks a row for manual review if either of these conditions holds:
- Any of the Performance columns contains a negative value.
- ExpPerf_2019 differs from any of the Performance columns by more than 50%.
A mock data.table similar to the one I have:
library(data.table)
dt <- data.table(Id           = c("N23", "N34", "N11", "N65", "N55", "N78", "N88"),
                 Name         = c("ABCD", "ACBD", "ACCD", "ADBN", "ADDD", "DBCA", "CBDA"),
                 Type         = c("T", "B", "B", "T", "T", "B", "B"),
                 Sold         = c(500, 300, 350, 500, 350, 400, 450),
                 Baseline     = c(2000, 2100, 2000, 1500, 1890, 1900, 2000),
                 Perf_2016    = c(-200, 420, 800, 900, -10, 75, 400),
                 Perf_2017    = c(500, 300, -20, 700, 50, 80, 370),
                 Perf_2018    = c(1000, 400, 600, 800, 40, 500, 300),
                 ExpPerf_2019 = c(1500, 380, 500, 850, 30, 400, 350))
dt
    Id Name Type Sold Baseline Perf_2016 Perf_2017 Perf_2018 ExpPerf_2019
1: N23 ABCD    T  500     2000      -200       500      1000         1500
2: N34 ACBD    B  300     2100       420       300       400          380
3: N11 ACCD    B  350     2000       800       -20       600          500
4: N65 ADBN    T  500     1500       900       700       800          850
5: N55 ADDD    T  350     1890       -10        50        40           30
6: N78 DBCA    B  400     1900        75        80       500          400
7: N88 CBDA    B  450     2000       400       370       300          350
For this data.table, the desired output adds the FLAG column as follows:
    Id Name Type Sold Baseline Perf_2016 Perf_2017 Perf_2018 ExpPerf_2019  FLAG
1: N23 ABCD    T  500     2000      -200       500      1000         1500  TRUE
2: N34 ACBD    B  300     2100       420       300       400          380 FALSE
3: N11 ACCD    B  350     2000       800       -20       600          500  TRUE
4: N65 ADBN    T  500     1500       900       700       800          850 FALSE
5: N55 ADDD    T  350     1890       -10        50        40           30  TRUE
6: N78 DBCA    B  400     1900        75        80       500          400  TRUE
7: N88 CBDA    B  450     2000       400       370       300          350 FALSE
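A minimal sketch of one way to compute FLAG, under stated assumptions. The question does not say what "more than 50%" is measured against, so this assumes the absolute difference must exceed half of the absolute Performance value for that year. The `^Perf_` name pattern for selecting the year columns and the `tol` variable are illustrative choices, not part of the original. The two conditions are combined with OR, which is consistent with the desired output (N78 is flagged without having any negative value).

```r
library(data.table)

# Mock data from the question, using the column names shown in the printed output
dt <- data.table(
  Id           = c("N23", "N34", "N11", "N65", "N55", "N78", "N88"),
  Name         = c("ABCD", "ACBD", "ACCD", "ADBN", "ADDD", "DBCA", "CBDA"),
  Type         = c("T", "B", "B", "T", "T", "B", "B"),
  Sold         = c(500, 300, 350, 500, 350, 400, 450),
  Baseline     = c(2000, 2100, 2000, 1500, 1890, 1900, 2000),
  Perf_2016    = c(-200, 420, 800, 900, -10, 75, 400),
  Perf_2017    = c(500, 300, -20, 700, 50, 80, 370),
  Perf_2018    = c(1000, 400, 600, 800, 40, 500, 300),
  ExpPerf_2019 = c(1500, 380, 500, 850, 30, 400, 350)
)

# Yearly Performance columns, selected by an assumed naming pattern
perf_cols <- grep("^Perf_", names(dt), value = TRUE)
tol <- 0.5  # hypothetical name for the 50% threshold

dt[, FLAG := {
  perf <- as.matrix(.SD)  # one row per Id, one column per year
  # Condition 1: at least one negative Performance value in the row
  has_negative <- rowSums(perf < 0) > 0
  # Condition 2 (assumed definition): ExpPerf_2019 is more than tol * |Perf|
  # away from at least one year's Performance value.
  # ExpPerf_2019 has one value per row and is recycled down each matrix
  # column, so every row is compared with its own expected value.
  too_far <- rowSums(abs(ExpPerf_2019 - perf) > tol * abs(perf)) > 0
  has_negative | too_far
}, .SDcols = perf_cols]

print(dt)
```

For this mock data the sketch reproduces the FLAG column of the desired output. If the 50% should instead be measured relative to ExpPerf_2019, replace `tol * abs(perf)` with `tol * abs(ExpPerf_2019)`; on this particular data both readings give the same result, so real data would be needed to tell them apart.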