I need to apply some logic across the rows of a dataframe and create a new dataframe containing only the rows that satisfy that logic.
The input dataframe is shown below.
+------------+-------------+-----+-----+-----+-----+
|   NUM_ID   |      E      |SG1_V|SG2_V|SG3_V|SG4_V|
+------------+-------------+-----+-----+-----+-----+
|XXXXX01     |1570167499000|     |     |89.0 |     |
|XXXXX01     |1570167502000|     |88.0 |     |     |
|XXXXX01     |1570167503000|     |99.0 |     |     |
|XXXXX01     |1570179810000|81.0 |81.0 |81.0 |81.0 |
|XXXXX01     |1570179811000|92.0 |     |95.0 |     |
|XXXXX01     |1570179833000|     |     |88.0 |     |
|XXXXX02     |1570179840000|     |81.0 |     |81.0 |
|XXXXX02     |1570179841000|81.0 |     |81.0 |81.0 |
|XXXXX02     |1570179841000|     |     |     |     |
|XXXXX02     |1570179842000|81.0 |     |     |     |
|XXXXX02     |1570179843000|87.0 |98.0 |97.0 |88.0 |
|XXXXX02     |1570179849000|     |     |     |     |
|XXXXX03     |1570179850000|     |     |     |     |
|XXXXX03     |1570179852000|88.0 |     |     |     |
|XXXXX03     |1570179857000|     |     |     |88.0 |
|XXXXX03     |1570179858000|     |     |     |88.0 |
+------------+-------------+-----+-----+-----+-----+
For each NUM_ID, I have to check every SG_V column and keep the rows where the difference between the value and the previous value in that column is greater than 10. A row counts once, whether the difference exceeds 10 in a single SG_V column or in several SG_V columns of that row.
The expected output below should make this clear.
+------------+-------------+------------+-----+------------+-----+------------+-----+------------+-----+
|   NUM_ID   |      E      |PREVIOUS_SG1|SG1_V|PREVIOUS_SG2|SG2_V|PREVIOUS_SG3|SG3_V|PREVIOUS_SG4|SG4_V|
+------------+-------------+------------+-----+------------+-----+------------+-----+------------+-----+
|XXXXX01     |1570167503000|            |     |88.0        |99.0 |            |     |            |     |
|XXXXX01     |1570179811000|81.0        |92.0 |            |     |81.0        |95.0 |            |     |
|XXXXX02     |1570179843000|            |     |81.0        |98.0 |81.0        |97.0 |            |     |
+------------+-------------+------------+-----+------------+-----+------------+-----+------------+-----+
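To pin down the rule, here is a minimal plain-Python sketch of the logic as it can be read from the expected output. It is not a Spark solution, only an illustration of the comparison. It rests on two assumptions: "previous" means the last non-empty value in the same column for the same NUM_ID, and only increases greater than 10 count (the drop from 99.0 to 81.0 in SG2_V does not appear in the expected output). The function name `rows_with_jumps` and the dict-based row layout are hypothetical.

```python
SG_COLS = ["SG1_V", "SG2_V", "SG3_V", "SG4_V"]

def rows_with_jumps(rows, threshold=10.0):
    """Keep rows where any SG column rose by more than `threshold`
    over the last non-empty value of that column for the same NUM_ID.
    Assumption: empty cells are represented as None."""
    last_seen = {}  # (NUM_ID, column) -> last non-empty value
    result = []
    # sorted() is stable, so rows sharing the same E keep their input order
    for row in sorted(rows, key=lambda r: (r["NUM_ID"], r["E"])):
        out = {"NUM_ID": row["NUM_ID"], "E": row["E"]}
        hit = False
        for i, col in enumerate(SG_COLS, start=1):
            cur = row.get(col)
            if cur is None:
                continue  # empty cell: nothing to compare, keep last_seen
            prev = last_seen.get((row["NUM_ID"], col))
            if prev is not None and cur - prev > threshold:
                out[f"PREVIOUS_SG{i}"] = prev
                out[col] = cur
                hit = True
            last_seen[(row["NUM_ID"], col)] = cur
        if hit:
            result.append(out)  # one output row, however many columns matched
    return result

# Sample input from the table above (None = empty cell)
data = [
    ("XXXXX01", 1570167499000, None, None, 89.0, None),
    ("XXXXX01", 1570167502000, None, 88.0, None, None),
    ("XXXXX01", 1570167503000, None, 99.0, None, None),
    ("XXXXX01", 1570179810000, 81.0, 81.0, 81.0, 81.0),
    ("XXXXX01", 1570179811000, 92.0, None, 95.0, None),
    ("XXXXX01", 1570179833000, None, None, 88.0, None),
    ("XXXXX02", 1570179840000, None, 81.0, None, 81.0),
    ("XXXXX02", 1570179841000, 81.0, None, 81.0, 81.0),
    ("XXXXX02", 1570179841000, None, None, None, None),
    ("XXXXX02", 1570179842000, 81.0, None, None, None),
    ("XXXXX02", 1570179843000, 87.0, 98.0, 97.0, 88.0),
    ("XXXXX02", 1570179849000, None, None, None, None),
    ("XXXXX03", 1570179850000, None, None, None, None),
    ("XXXXX03", 1570179852000, 88.0, None, None, None),
    ("XXXXX03", 1570179857000, None, None, None, 88.0),
    ("XXXXX03", 1570179858000, None, None, None, 88.0),
]
rows = [dict(zip(["NUM_ID", "E"] + SG_COLS, t)) for t in data]

result = rows_with_jumps(rows)
for r in result:
    print(r)
```

With the sample input this keeps the same three rows as the expected output table. In a dataframe the same idea would presumably be a per-NUM_ID window ordered by E that carries the last non-empty value forward, but that part is not shown here.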
Thanks in advance! Any leads are appreciated.