I have a PySpark dataframe like this:
| A | B |
|---|---|
| 1 | abc_value |
| 2 | abc_value |
| 3 | some_other_value |
| 4 | anything_else |
I have a mapping dictionary:
d = {
"abc": "X",
"some_other": "Y",
"anything": "Z"
}
I need to create a new column C in my original DataFrame, so that the result looks like this:
| A | B | C |
|---|---|---|
| 1 | abc_value | X |
| 2 | abc_value | X |
| 3 | some_other_value | Y |
| 4 | anything_else | Z |
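I tried mapping like this (f is pyspark.sql.functions and chain is itertools.chain):
mapping_expr = f.create_map([f.lit(x) for x in chain(*d.items())])
and then applying it with withColumn. However, that only does exact matching, and I need partial (regex) matching: as you can see, each dictionary key is only the beginning of the value in column B.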
How can I accomplish this?
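
To make the goal concrete, here is a rough sketch of one direction that might work: turning the dictionary into a Spark SQL CASE expression in which every key becomes an RLIKE prefix pattern. The helper name build_case_expr is hypothetical (it is not part of PySpark), and the sketch assumes that keys and values contain only letters, digits and underscores, so no regex or SQL escaping is needed. The block itself is plain Python and only builds the expression string.

```python
import re

# Mapping from the question, with every value written as a string.
d = {"abc": "X", "some_other": "Y", "anything": "Z"}


def build_case_expr(mapping, column):
    """Return a Spark SQL CASE expression that maps `column` by key prefix.

    Hypothetical helper. Keys and values are restricted to letters, digits
    and underscores so that no regex or SQL escaping is needed.
    """
    branches = []
    for key, value in mapping.items():
        if not re.fullmatch(r"\w+", key) or not re.fullmatch(r"\w+", value):
            raise ValueError(f"unsupported key or value: {key!r}, {value!r}")
        # RLIKE does a partial regex match; '^' anchors it to the start.
        branches.append(f"WHEN `{column}` RLIKE '^{key}' THEN '{value}'")
    # Rows that match no key get NULL, because there is no ELSE branch.
    return "CASE " + " ".join(branches) + " END"


case_expr = build_case_expr(d, "B")
print(case_expr)

# With PySpark available, the expression could be applied like this:
# df = df.withColumn("C", f.expr(case_expr))
```

Branches are checked in dictionary order and the first match wins, so if one key is a prefix of another, the longer key would have to come first.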