3

There is a column named batch in a dataframe. It has values like '9%', '$5', etc.

I need to use regexp_replace in a way that removes the special characters from the values above and keeps just the numeric part.

For example, 9% and $5 should become 9 and 5 respectively, in the same column.

3 Answers

7
import org.apache.spark.sql.functions.{col, regexp_replace}
df.withColumn("batch", regexp_replace(col("batch"), "[^0-9]+", ""))
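Spark's regexp_replace takes a Java regular expression, so the pattern is written without /.../ delimiters; a slash in the pattern would be matched literally. A minimal sketch to check the pattern in plain Scala, without a Spark session, is below. It assumes String.replaceAll as a stand-in for what regexp_replace does to a single value (both use java.util.regex), and the object and helper names are hypothetical.

```scala
// Plain-Scala check of the pattern; no Spark session is needed.
object DigitsOnly {
  // The same pattern string that would be passed to regexp_replace.
  val nonDigits: String = "[^0-9]+"

  // Hypothetical helper: removes every run of non-digit characters.
  def clean(value: String): String = value.replaceAll(nonDigits, "")

  // Same replacement, but with slash delimiters left in the pattern.
  // The slashes are literal here, so values without "/" are unchanged.
  def cleanWithSlashes(value: String): String =
    value.replaceAll("/[^0-9]+/", "")
}
```

With this, DigitsOnly.clean("9%") gives "9" and DigitsOnly.clean("$5") gives "5", while DigitsOnly.cleanWithSlashes("9%") returns "9%" untouched.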

8 Comments

This creates a column batch with only digits as its value. Everything apart from digits is replaced with an empty string.
Sorry for the delay in responding. I tried your code and it worked fine. Thanks a lot.
What if we want to do this for all columns in a single command?
@dileepvarma regexp_replace can be used in select... df.select(regexp_replace(col("col1"), "[^0-9]+", ""), regexp_replace(col("col2"), "[^0-9]+", ""))
val cols = df.columns.map(x => regexp_replace(col(x), "[^0-9]+", ""))... df.select(cols:_*)
2

You can use this regex:

\W+

\W - matches any non-word character (equal to [^a-zA-Z0-9_])
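One caveat worth checking before choosing between the two patterns: \W+ leaves letters and underscores in place, whereas [^0-9]+ leaves only digits. For the sample values '9%' and '$5' the results are the same, but they differ as soon as a value contains letters. A small plain-Scala sketch of the difference follows; the object and method names are hypothetical, and String.replaceAll is assumed as a stand-in for regexp_replace on a single value.

```scala
// Compares the two patterns suggested in the answers on single values.
object PatternComparison {
  // \W+ removes runs of non-word characters; letters, digits and "_" stay.
  def stripNonWord(value: String): String = value.replaceAll("\\W+", "")

  // [^0-9]+ removes runs of anything that is not a digit.
  def stripNonDigits(value: String): String = value.replaceAll("[^0-9]+", "")
}
```

For example, PatternComparison.stripNonWord("b_9%") gives "b_9", while PatternComparison.stripNonDigits("b_9%") gives "9". Both return "5" for "$5".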


1

What have you tried so far?

select regexp_replace("'$5','9%'","[^0-9A-Za-z]","")
