
I have a list: List1 = ["BD", "BZ", "UB", "DB"]

I need to replace specific substrings in a string column, as shown below, using regexp_replace.

PySpark DataFrame column value:

BD_AAAZ_D3002_BZ1_UB_DEV

Expected output:

BZ_AAAZ_D3002_BZ1_DB_DEV

  • How do you know that UB has to be replaced with DB and not with some other value from the list, such as BZ? This should be a dictionary rather than a list. Commented Jan 4, 2022 at 15:04
  • If you can convert the list to a dict, you should be able to use this solution: stackoverflow.com/questions/50231310/… Commented Jan 4, 2022 at 15:28
  • Please provide enough code so others can better understand or reproduce the problem. Commented Jan 12, 2022 at 9:13
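
As the comments point out, a flat list does not say which value replaces which. A minimal sketch, assuming List1 alternates between a search value and its replacement (an assumption that happens to match the expected output, not something the question states), converts it to a dictionary:

```python
# Assumption: the list alternates "search value, replacement, search value, ...".
List1 = ["BD", "BZ", "UB", "DB"]

# Pair the even-indexed items (search values) with the odd-indexed items (replacements).
replacements = dict(zip(List1[::2], List1[1::2]))

print(replacements)  # {'BD': 'BZ', 'UB': 'DB'}
```

If the list is laid out differently, the mapping has to be written out by hand instead.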

1 Answer


Given the following dataframe

df.show(truncate=False)
+------------------------+
|col_1                   |
+------------------------+
|BD_AAAZ_D3002_BZ1_UB_DEV|
+------------------------+

You can use regexp_replace from pyspark.sql.functions to get the desired result:

from pyspark.sql import functions

# The lookarounds match BD / UB only when no letter directly precedes or follows them.
df = df.select(functions.regexp_replace('col_1', '(?<![a-zA-Z])(BD)(?![a-zA-Z])', 'BZ').alias("col_1"))
df = df.select(functions.regexp_replace('col_1', '(?<![a-zA-Z])(UB)(?![a-zA-Z])', 'DB').alias("col_1"))

df.show(truncate=False)
+------------------------+
|col_1                   |
+------------------------+
|BZ_AAAZ_D3002_BZ1_DB_DEV|
+------------------------+
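
The two chained calls above can be generalised to any mapping. The following is a sketch rather than part of the original answer: REPLACEMENTS, token_pattern, and replace_tokens are hypothetical names, the mapping is assumed to contain only plain alphabetic tokens, and df is assumed to be the dataframe shown above. Python's re module is used only to sanity-check the pattern locally; Spark evaluates it with Java's regex engine, which accepts the same lookaround syntax for this pattern.

```python
import re
from functools import reduce

# Hypothetical mapping of token -> replacement (assumed from the expected output).
REPLACEMENTS = {"BD": "BZ", "UB": "DB"}


def token_pattern(token):
    """Match `token` only when it is not directly preceded or followed by a letter."""
    return rf"(?<![a-zA-Z]){token}(?![a-zA-Z])"


def replace_tokens(df, column, mapping):
    """Chain one regexp_replace call per mapping entry over `column`."""
    # Imported here so token_pattern can be tried without a Spark installation.
    from pyspark.sql import functions as F

    expr = reduce(
        lambda col, item: F.regexp_replace(col, token_pattern(item[0]), item[1]),
        mapping.items(),
        F.col(column),
    )
    return df.withColumn(column, expr)


# Local check of the pattern with Python's re module (no Spark needed).
sample = "BD_AAAZ_D3002_BZ1_UB_DEV"
for token, replacement in REPLACEMENTS.items():
    sample = re.sub(token_pattern(token), replacement, sample)
print(sample)  # BZ_AAAZ_D3002_BZ1_DB_DEV
```

With Spark available, the call would be df = replace_tokens(df, "col_1", REPLACEMENTS). The replacements are applied one after another, so the output of an earlier rule can be matched by a later one; the order of the dictionary entries matters if the keys and values overlap.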

2 Comments

Hi BrendanA, thank you for your answer. "BD_AAAZ_D3002_BZ1_UB_DEV" is a single string in which I want to replace BD with BZ and UB with DB. Could you please help me here?
Yep, just edited my answer to fit your use case.
