I have a dataframe df, which contains below data:
**customers** **product** **Val_id**
1 A 1
2 B X
3 C
4 D Z
i have been provided 2 rules, which are as below:
**rule_id** **rule_name** **product value** **priority**
123 ABC A,B 1
456 DEF A,B,D 2
Requirement is to apply these rules on dataframe df in priority order, customers who have passed rule 1, should not be considered for rule 2 and in final dataframe add two more columns rule_id and rule_name, i have written below code to achieve it:
val rule_name = when(col("product").isin("A","B"), "ABC").otherwise(when(col("product").isin("A","B","D"), "DEF").otherwise(""))
val rule_id = when(col("product").isin("A","B"), "123").otherwise(when(col("product").isin("A","B","D"), "456").otherwise(""))
val df1 = df_customers.withColumn("rule_name" , rule_name).withColumn("rule_id" , rule_id)
df1.show()
Final output looks like below:
**customers** **product** **Val_id** **rule_name** **rule_id**
1 A 1 ABC 123
2 B X ABC 123
3 C
4 D Z DEF 456
Is there any better way to achieve it, adding both columns by just going though entire dataset once instead of going through entire dataset twice?