I have a list List1 = ["BD", "BZ", "UB", "DB"].
I need to replace specific substrings in a string column, as shown below, using regexp_replace.
PySpark DataFrame column value:
BD_AAAZ_D3002_BZ1_UB_DEV
Expected output:
BZ_AAAZ_D3002_BZ1_DB_DEV
Given the following DataFrame:
df.show(truncate=False)
+------------------------+
|col_1                   |
+------------------------+
|BD_AAAZ_D3002_BZ1_UB_DEV|
+------------------------+
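You can get the desired output with regexp_replace from pyspark.sql.functions. The lookbehind and lookahead make sure BD and UB are replaced only when no letter sits directly before or after them: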
from pyspark.sql import functions
# Replace BD with BZ, then UB with DB, skipping matches that touch another letter
df = df.select(functions.regexp_replace('col_1', '(?<![a-zA-Z])(BD)(?![a-zA-Z])', 'BZ').alias("col_1"))
df = df.select(functions.regexp_replace('col_1', '(?<![a-zA-Z])(UB)(?![a-zA-Z])', 'DB').alias("col_1"))
df.show(truncate=False)
+------------------------+
|col_1                   |
+------------------------+
|BZ_AAAZ_D3002_BZ1_DB_DEV|
+------------------------+
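If the replacements should be driven by List1 rather than hard-coded, the same idea can be sketched as a loop. This is only a sketch under one assumption: that List1 is meant to be read as consecutive (old, new) pairs, i.e. BD -> BZ and UB -> DB, which is inferred from the expected output and not stated in the question. The helper names token_pattern, replace_tokens and replace_tokens_local are made up for illustration.

```python
import re
from functools import reduce

# Assumption: List1 is read as consecutive (old, new) pairs.
list1 = ["BD", "BZ", "UB", "DB"]
pairs = list(zip(list1[0::2], list1[1::2]))  # [("BD", "BZ"), ("UB", "DB")]


def token_pattern(token):
    """Build a regex matching `token` only when no letter is directly before or after it."""
    return r"(?<![a-zA-Z])" + re.escape(token) + r"(?![a-zA-Z])"


def replace_tokens(df, col_name, pairs):
    """Chain one regexp_replace per (old, new) pair over a single DataFrame column."""
    # Imported lazily so the pattern helpers can be tried without a Spark install.
    from pyspark.sql import functions as F

    expr = reduce(
        lambda c, p: F.regexp_replace(c, token_pattern(p[0]), p[1]),
        pairs,
        F.col(col_name),
    )
    return df.withColumn(col_name, expr)


def replace_tokens_local(value, pairs):
    """Plain-Python equivalent, handy for checking the patterns on a sample string."""
    for old, new in pairs:
        value = re.sub(token_pattern(old), new, value)
    return value
```

Usage would be df = replace_tokens(df, "col_1", pairs). Two things to keep in mind: the replacements run one after another, so a pair whose new value equals a later pair's old value would be replaced twice, and digits and underscores do not count as letters here, so a value such as BD1 would also become BZ1.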