I have a requirement to fill in all values (categories) of a column for every code. For example, as shown in the tables below, I want a way to fill in the missing 'UNSEEN' and 'ASSIGNED' categories for code HL_14108.
// In spark-shell these implicits are already in scope; in a standalone
// application, import them from your SparkSession first.
import spark.implicits._

val df = Seq(
  ("HL_13203", "DELIVERED", 3226),
  ("HL_13203", "UNSEEN", 249),
  ("HL_13203", "UNDELIVERED", 210),
  ("HL_13203", "ASSIGNED", 2),
  ("HL_14108", "DELIVERED", 3083),
  ("HL_14108", "UNDELIVERED", 164),
  ("HL_14108", "PICKED", 1)
).toDF("code", "status", "count")
Input:
+--------+-----------+-----+
| code| status|count|
+--------+-----------+-----+
|HL_13203| DELIVERED| 3226|
|HL_13203| UNSEEN| 249|
|HL_13203|UNDELIVERED| 210|
|HL_13203| ASSIGNED| 2|
|HL_14108| DELIVERED| 3083|
|HL_14108|UNDELIVERED| 164|
|HL_14108| PICKED| 1|
+--------+-----------+-----+
Expected output:
+--------+-----------+-----+
| code| status|count|
+--------+-----------+-----+
|HL_13203| DELIVERED| 3226|
|HL_13203| UNSEEN| 249|
|HL_13203|UNDELIVERED| 210|
|HL_13203| ASSIGNED| 2|
|HL_13203| PICKED| 0|
|HL_14108| DELIVERED| 3083|
|HL_14108|UNDELIVERED| 164|
|HL_14108| PICKED| 1|
|HL_14108| UNSEEN| 0|
|HL_14108| ASSIGNED| 0|
+--------+-----------+-----+
I want to add the missing category rows (with a count of 0) for each code. What would be the correct approach to do this in Apache Spark?
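A minimal sketch of one possible approach, assuming the df defined above: build the full grid of (code, status) pairs by cross-joining the distinct codes with the distinct statuses, left-join the original counts back onto that grid, and replace the resulting nulls with 0. Note that crossJoin requires Spark 2.1 or later.

// Every (code, status) combination that should appear in the output.
val allPairs = df.select("code").distinct.crossJoin(df.select("status").distinct)

// Left-join the known counts; combinations absent from df get a null
// count, which na.fill then replaces with 0.
val filled = allPairs
  .join(df, Seq("code", "status"), "left")
  .na.fill(0L, Seq("count"))
  .orderBy("code", "status")

filled.show()

Up to row ordering, this should produce the expected output: PICKED filled with 0 for HL_13203, and UNSEEN and ASSIGNED filled with 0 for HL_14108.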