I have a PySpark DataFrame column that contains a mix of values: some are strings and some are numbers, as shown below.

Source_ids
abc_123
1234.0
345
abc_cad
K-123
540.0
456.0

I want to remove the decimal part wherever it appears, so the result should be:

Source_ids
abc_123
1234
345
abc_cad
K-123
540
456

We cannot cast this column to long type because it also contains text. How can I achieve this?

1 Answer

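Use regexp_replace to replace the pattern \..*$ with the empty string ''.

This pattern matches the first dot and everything after it, up to the end of the string, so the decimal part is removed. Values without a dot are left unchanged.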

from pyspark.sql.functions import regexp_replace
# Raw string so Python passes the backslash through to the regex unchanged
df.withColumn("Source_ids", regexp_replace("Source_ids", r"\..*$", "")).show()
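To see what the pattern does without a Spark session, here is a minimal sketch using Python's built-in re module on the sample values from the question. It also tries a stricter pattern, ^(\d+)\.0+$, which is my own assumption and not part of the answer above: it only strips the fraction when the whole value is digits followed by ".0". The "v1.2_beta" value is a made-up example to show the difference.

```python
import re

# Sample values from the question
source_ids = ["abc_123", "1234.0", "345", "abc_cad", "K-123", "540.0", "456.0"]

# Pattern from the answer: the first dot and everything after it
broad = re.compile(r"\..*$")

# Hypothetical stricter pattern (an assumption, not from the answer):
# only rewrite values that are all digits followed by ".0", ".00", ...
strict = re.compile(r"^(\d+)\.0+$")

broad_result = [broad.sub("", s) for s in source_ids]
strict_result = [strict.sub(r"\1", s) for s in source_ids]

print(broad_result)
print(strict_result)

# A made-up text value containing a dot shows where the two differ
print(broad.sub("", "v1.2_beta"))       # truncated at the dot
print(strict.sub(r"\1", "v1.2_beta"))   # left unchanged
```

On the data in the question both patterns give the same result. If the column may contain text values with dots in them, the stricter pattern is probably the safer choice. In Spark the replacement string uses Java syntax for the captured group, so the call would look like regexp_replace("Source_ids", r"^(\d+)\.0+$", "$1").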