My input dataframe is:
Date Client Score
2020-10-26 1 NULL
2020-10-27 1 NULL
2020-10-28 1 3
2020-10-29 1 6
2020-10-30 1 NULL
2020-10-31 1 NULL
2020-11-01 1 NULL
2020-11-02 1 NULL
2020-11-03 1 NULL
2020-11-04 1 NULL
2020-11-05 1 NULL
2020-11-06 1 NULL
2020-11-07 1 NULL
2020-11-08 1 NULL
2020-11-09 1 25
2020-10-26 2 NULL
2020-10-27 2 NULL
2020-10-28 2 NULL
2020-10-29 2 28
2020-10-30 2 NULL
2020-10-31 2 NULL
2020-11-01 2 NULL
2020-11-02 2 NULL
2020-11-03 2 NULL
2020-11-04 2 NULL
2020-11-05 2 1
2020-11-06 2 NULL
2020-11-07 2 NULL
2020-11-08 2 NULL
2020-11-09 2 NULL
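For reproducibility, the input above can be built like this (a sketch; I'm assuming the value column is named Score, matching the output below, and that Date is a real date type):

    from datetime import date, timedelta
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Non-null Score values per (Client, Date); every other day is NULL.
    scores = {
        (1, date(2020, 10, 28)): 3,
        (1, date(2020, 10, 29)): 6,
        (1, date(2020, 11, 9)): 25,
        (2, date(2020, 10, 29)): 28,
        (2, date(2020, 11, 5)): 1,
    }
    days = [date(2020, 10, 26) + timedelta(days=i) for i in range(15)]
    rows = [(d, c, scores.get((c, d))) for c in (1, 2) for d in days]
    df = spark.createDataFrame(rows, "Date date, Client int, Score int")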
I want to calculate the number of nulls between two non-null values, for each client, as a new column in PySpark. I tried rangeBetween etc., but I couldn't get it to work. The requested output example is below:
Date Client Score Until_non_null_value
2020-10-26 1 NULL 2 -> First null Score value: 2 days away from the next non-null value (3).
2020-10-27 1 NULL NULL -> Not the first null in the Score column, so the result column is null.
2020-10-28 1 3 NULL
2020-10-29 1 6 NULL
2020-10-30 1 NULL 10 -> First null after a non-null value (6): 10 days away from the next non-null value (25).
2020-10-31 1 NULL NULL
2020-11-01 1 NULL NULL
2020-11-02 1 NULL NULL
2020-11-03 1 NULL NULL
2020-11-04 1 NULL NULL
2020-11-05 1 NULL NULL
2020-11-06 1 NULL NULL
2020-11-07 1 NULL NULL
2020-11-08 1 NULL NULL
2020-11-09 1 25 NULL
2020-10-26 2 NULL 3
2020-10-27 2 NULL NULL
2020-10-28 2 NULL NULL
2020-10-29 2 28 NULL
2020-10-30 2 NULL 6
2020-10-31 2 NULL NULL
2020-11-01 2 NULL NULL
2020-11-02 2 NULL NULL
2020-11-03 2 NULL NULL
2020-11-04 2 NULL NULL
2020-11-05 2 1 NULL
2020-11-06 2 NULL NULL
2020-11-07 2 NULL NULL
2020-11-08 2 NULL NULL
2020-11-09 2 NULL NULL
Could you please help me with this?
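Edit: for context, a window-function sketch of the direction I have in mind (assuming the input column is really named Score; treat it as a starting point, not a verified solution). The idea is that count("Score") ignores nulls, so a running count assigns every null row to the group of the most recent non-null value, and the run length is just the number of nulls in that group.

    from pyspark.sql import functions as F
    from pyspark.sql import Window

    w_order = Window.partitionBy("Client").orderBy("Date")
    w_running = w_order.rowsBetween(Window.unboundedPreceding, Window.currentRow)
    w_after = w_order.rowsBetween(1, Window.unboundedFollowing)

    result = (
        df
        # Running count of non-null Scores: each null row inherits the
        # group id of the latest non-null row; leading nulls get group 0.
        .withColumn("grp", F.count("Score").over(w_running))
        # A null row opens a run if it is the client's first row or the
        # previous row's Score was non-null.
        .withColumn(
            "first_null",
            F.col("Score").isNull()
            & F.coalesce(F.lag(F.col("Score").isNotNull()).over(w_order), F.lit(True)),
        )
        # A non-null value still follows, i.e. the null run is closed.
        .withColumn("closed", F.count("Score").over(w_after) > 0)
        # Length of the null run = number of nulls in the current group.
        .withColumn(
            "run_len",
            F.sum(F.col("Score").isNull().cast("int"))
             .over(Window.partitionBy("Client", "grp")),
        )
        # Only the first null of a closed run gets the count.
        .withColumn(
            "Until_non_null_value",
            F.when(F.col("first_null") & F.col("closed"), F.col("run_len")),
        )
        .drop("grp", "first_null", "closed", "run_len")
    )

Trailing nulls with no later non-null value (client 2 from 2020-11-06 on) stay NULL, because "closed" is false for them. I'm not sure this handles every edge case, so corrections are welcome.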