In PySpark, using regexp_replace, how to replace a set of characters in column values with others?

Question

I have a list List1=["BD","BZ","UB","DB"].

I need to replace specific substrings in a string, as shown below, using regexp_replace.

PySpark DataFrame column value:

BD_AAAZ_D3002_BZ1_UB_DEV

Expected output:

BZ_AAAZ_D3002_BZ1_DB_DEV

How do you know that UB has to be replaced with DB and not with some other value of the list, for example BZ? It should be a dictionary rather than a list.
– NNM, Commented Jan 4, 2022 at 15:04
If you can convert the list to a dict, you should be able to use this solution: stackoverflow.com/questions/50231310/…
– NNM, Commented Jan 4, 2022 at 15:28
Please provide enough code so others can better understand or reproduce the problem.
– Community Bot, Commented Jan 12, 2022 at 9:13
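A minimal sketch of NNM's dict suggestion, assuming List1 holds (old, new) pairs in order, so that BD becomes BZ and UB becomes DB. It uses plain Python `re` (not Spark) only to check the token-boundary pattern locally; the names `replacements` and `replace_tokens` are illustrative, not part of any library. Spark's regexp_replace uses Java regex, but these simple lookarounds behave the same way for plain-letter keys.

```python
import re

List1 = ["BD", "BZ", "UB", "DB"]

# Assumption: List1 holds (old, new) pairs in order, i.e. BD -> BZ and UB -> DB.
replacements = dict(zip(List1[0::2], List1[1::2]))


def replace_tokens(value, mapping):
    """Replace each key only where it is not directly preceded or followed by a letter."""
    for old, new in mapping.items():
        pattern = rf"(?<![a-zA-Z]){re.escape(old)}(?![a-zA-Z])"
        value = re.sub(pattern, new, value)
    return value


print(replacements)
print(replace_tokens("BD_AAAZ_D3002_BZ1_UB_DEV", replacements))
```

The replacements run one after another, so if a replacement value were also a key it would be rewritten again. That does not happen here because BZ and DB are not keys.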

BoomBoxBoy · Accepted Answer · 2022-01-05 15:41:46Z


Given the following DataFrame:

df.show(truncate=False)
+------------------------+
|col_1                   |
+------------------------+
|BD_AAAZ_D3002_BZ1_UB_DEV|
+------------------------+

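You can use the functions in pyspark.sql.functions to get the desired result:

from pyspark.sql import functions

# Replace "BD" with "BZ" only where it is not preceded or followed by a letter
df = df.select(functions.regexp_replace('col_1', '(?<![a-zA-Z])(BD)(?![a-zA-Z]+)', 'BZ').alias("col_1"))
# Same lookaround guard for "UB" -> "DB"; tokens such as BZ1 and DEV are left untouched
# Note: select() keeps only col_1; use withColumn() instead to keep other columns
df = df.select(functions.regexp_replace('col_1', '(?<![a-zA-Z])(UB)(?![a-zA-Z]+)', 'DB').alias("col_1"))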

df.show(truncate=False)
+------------------------+
|col_1                   |
+------------------------+
|BZ_AAAZ_D3002_BZ1_DB_DEV|
+------------------------+

edited Jan 5, 2022 at 15:41

answered Jan 4, 2022 at 15:46

BoomBoxBoy



2 Comments

vinay reddy Over a year ago

Hi BrendanA, thank you for your answer. "BD_AAAZ_D3002_BZ1_UB_DEV" is a single string in which I want to replace BD with BZ and UB with DB. Could you please help me here?

BoomBoxBoy Over a year ago

Yep, I just edited my answer to fit your use case.
