Remove decimal value from pyspark column

Question

I have a PySpark dataframe column that contains a mix of values: some are strings and some are numbers, like below:

Source_ids
abc_123
1234.0
345
abc_cad
K-123
540.0
456.0

I want to remove the decimal part wherever it occurs, so the result should be:

Source_ids
abc_123
1234
345
abc_cad
K-123
540
456

We cannot cast this column to long type because it also contains text. How can I achieve this?

Surya · Accepted Answer · 2020-10-08 02:40:17Z


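Use regexp_replace to replace the pattern \..*$ with the empty string ''.

This pattern matches the first dot and everything after it up to the end of the string, so both the dot and the decimal part are removed. Values without a dot are left unchanged. Note that any value containing a dot is truncated at that dot, not only numeric ones.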

from pyspark.sql.functions import regexp_replace

# Raw string so the backslash reaches the regex engine unchanged.
df.withColumn("Source_ids", regexp_replace("Source_ids", r"\..*$", "")).show()
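To see what the pattern does without starting a Spark session, here is a minimal sketch that runs it through Python's built-in re module on the sample values from the question. Spark uses Java regular expressions, but these simple patterns should behave the same way in both engines. The STRICT pattern is not part of the answer; it is a hypothetical alternative that only strips a trailing ".0" from all-digit values, in case some text ids could contain a dot.

```python
import re

# Sample values from the question.
source_ids = ["abc_123", "1234.0", "345", "abc_cad", "K-123", "540.0", "456.0"]

# Pattern from the answer: the first dot and everything after it.
BROAD = r"\..*$"

# Hypothetical stricter pattern (an assumption, not from the answer):
# matches only all-digit values followed by ".0", ".00", and so on.
STRICT = r"^(\d+)\.0+$"

broad_result = [re.sub(BROAD, "", s) for s in source_ids]
# Python writes the group reference as \1; in Spark's regexp_replace
# the replacement string would be "$1" instead.
strict_result = [re.sub(STRICT, r"\1", s) for s in source_ids]

print(broad_result)
print(strict_result)

# The two patterns differ on a text value that contains a dot.
print(re.sub(BROAD, "", "abc.def"))
print(re.sub(STRICT, r"\1", "abc.def"))
```

On the question's data both patterns give the same result, so the choice only matters if dotted text values or non-zero decimals can appear in the column.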
