I have a requirement to filter a PySpark DataFrame where the user passes the filter condition directly as a string parameter. For example:
Sample Input data: df_input
|dim1|dim2| byvar|value1|value2|
| 101| 201|MTD0001| 1| 10|
| 201| 202|MTD0002| 2| 12|
| 301| 302|MTD0003| 3| 13|
| 401| 402|MTD0004| 5| 19|
Ex 1: filter_str = "dim2 = '201'"
I filter the data with: df_input = df_input.filter(filter_str)
Output (**this works as expected**):
|dim1|dim2| byvar|value1|value2|
| 101| 201|MTD0001| 1| 10|
But with multiple filter conditions I get an error and cannot filter. These are the scenarios that fail:
Valid scenario 1:
filter_str = "dim1 = '101' and dim2 in '['302', '402']'"
df_input = df_input.filter(filter_str)
This raises an error.
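When `DataFrame.filter` receives a string, Spark parses it as a SQL expression, so the `IN` operator needs SQL syntax: a parenthesized, comma-separated list of quoted literals rather than a Python-style list wrapped in extra quotes. A minimal sketch of the corrected string for the first scenario, assuming the columns are compared against string literals as in Ex 1:

```python
# The string handed to DataFrame.filter() is parsed as a Spark SQL expression,
# so IN takes a parenthesized, comma-separated list of quoted literals,
# not a Python list like ['302', '402'] wrapped in extra quotes.
filter_str = "dim1 = '101' and dim2 in ('302', '402')"

# With a SparkSession and the df_input above, this would be applied as:
#   df_input = df_input.filter(filter_str)
print(filter_str)  # dim1 = '101' and dim2 in ('302', '402')
```

On the sample data this particular condition matches no rows, because the only row with dim1 = 101 has dim2 = 201, so an empty result there is expected rather than an error.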
Valid scenario 2:
value_list = ['302', '402']
filter_str = "dim1 = '101' or dim2 in '(value_list)'"
df_input = df_input.filter(filter_str)
This raises an error.
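When the values arrive as a Python list, Spark never sees the Python variable: the text `(value_list)` inside the string is just characters. The list has to be rendered into SQL `IN` syntax before the filter string is built. A minimal sketch, using a hypothetical helper `sql_in_list` (not part of PySpark) that assumes the values contain no single quotes or backslashes and rejects them otherwise instead of escaping:

```python
def sql_in_list(values):
    """Render a Python list as a Spark SQL IN list, e.g. ('302', '402').

    Hypothetical helper: each value is converted to str and wrapped in single
    quotes. Values containing a quote or backslash are rejected rather than
    escaped, and an empty list is rejected, to keep the sketch simple.
    """
    if not values:
        raise ValueError("IN list must contain at least one value")
    quoted = []
    for v in values:
        s = str(v)
        if "'" in s or "\\" in s:
            raise ValueError(f"unsupported character in value: {s!r}")
        quoted.append(f"'{s}'")
    return "(" + ", ".join(quoted) + ")"


value_list = ['302', '402']
# Interpolate the rendered list into the expression; writing "(value_list)"
# inside the string would never substitute the Python variable.
filter_str = f"dim1 = '101' or dim2 in {sql_in_list(value_list)}"
print(filter_str)  # dim1 = '101' or dim2 in ('302', '402')
```

On the sample data, `df_input.filter(filter_str)` should keep the rows with dim1 = 101, 301 and 401. If the caller can pass the list itself instead of a finished string, the Column API avoids manual quoting altogether: `df_input.filter((F.col("dim1") == "101") | F.col("dim2").isin(value_list))`, with `from pyspark.sql import functions as F`.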
Could you please help me get scenarios 1 and 2 working, i.e. how should I modify the filter step when I receive filter_str strings like the ones in the examples above?