1

I want to use regexp_replace() in PySpark to convert all question marks and back slashes in my data frame to null values. This is the code I used:

question = "?"
empty_str = "\\\"\\\""

for column in df.columns:
     df = df.withColumn(column, regexp_replace(column, question, None)
     df = df.withColumn(column, regexp_replace(column, empty_str, None)

However, when I use this code all the values in my dataframe turn into null values - not just the question marks and back slashes. Is there a way I can change my code to fix this?

0

2 Answers 2

1

With regexp_replace you cannot replace values to null, you will need another method, e.g. replace

from pyspark.sql import functions as F
df = spark.createDataFrame([("?",), ("\\",), ("b",)], ["col_name"])
df.show()
# +--------+
# |col_name|
# +--------+
# |       ?|
# |       \|
# |       b|
# +--------+

pattern = r"^[?\\]+$"
df = df.withColumn("col_name", F.regexp_replace("col_name", pattern, "")) \
       .replace("", None, "col_name")
df.show()
# +--------+
# |col_name|
# +--------+
# |    null|
# |    null|
# |       b|
# +--------+

In your attempt, every value changed to null, because you incorrectly provided None to the replacement argument, instead of str. Only str is accepted, according to the documentation.

pyspark.sql.functions.regexp_replace(str: ColumnOrName, pattern: str, replacement: str) → pyspark.sql.column.Column

Sign up to request clarification or add additional context in comments.

Comments

0

it is working like this, ou have to use \\ to replace backslash with null and ? to replace Question mark with null

>>> df.show(truncate=False)
+-------------------------------------------------+
|_c0                                              |
+-------------------------------------------------+
|"{""id"\":""e5?2f247c-f46c-4021-bc62-e28e56db1ad8|
+-------------------------------------------------+

>>> df.withColumn("_c0",regexp_replace('_c0','\\\\','')).withColumn("_c0",regexp_replace('_c0','\?','')).show(truncate=False)
+-----------------------------------------------+
|_c0                                            |
+-----------------------------------------------+
|"{""id"":""e52f247c-f46c-4021-bc62-e28e56db1ad8|
+-----------------------------------------------+

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.