How to split column in Spark Dataframe to multiple columns

Question

In my case, how can I split a StringType column with the format '1-1235.0 2-1248.0 3-7895.2' into another column of ArrayType containing ['1','2','3']?

Raphael Roth · Accepted Answer · 2019-08-18 19:21:05Z

1

This is relatively simple with a UDF:

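import org.apache.spark.sql.functions.udf
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq("1-1235.0 2-1248.0 3-7895.2").toDF("input")

// Split on spaces, then keep the integer before the '-' in each token
val extractFirst = udf((s: String) => s.split(" ").map(_.split('-')(0).toInt))

df.withColumn("newCol", extractFirst($"input"))
  .show()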

gives

+--------------------+---------+
|               input|   newCol|
+--------------------+---------+
|1-1235.0 2-1248.0...|[1, 2, 3]|
+--------------------+---------+

I could not find an easy solution using Spark's built-in functions (other than combining split with explode and then re-aggregating).
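For reference, the split / explode / re-aggregate route mentioned above could look roughly like the sketch below. It is not from the answer itself. The `rowId` column, the local SparkSession, and the use of `posexplode` with `sort_array` to restore token order are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, monotonically_increasing_id, posexplode, sort_array, split, struct, substring_index}

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("split-explode-sketch").getOrCreate()
import spark.implicits._

// rowId is a hypothetical helper column so identical input strings are not merged.
val df = Seq("1-1235.0 2-1248.0 3-7895.2").toDF("input")
  .withColumn("rowId", monotonically_increasing_id())

// One row per token, keeping the token position so the order can be restored later.
val exploded = df.select(
  $"rowId", $"input",
  posexplode(split($"input", " ")).as(Seq("pos", "token"))
)

// Keep the part before the first '-', then collect back into one array per row.
val result = exploded
  .withColumn("id", substring_index($"token", "-", 1))
  .groupBy($"rowId", $"input")
  .agg(sort_array(collect_list(struct($"pos", $"id"))).as("pairs"))
  .select($"input", $"pairs.id".as("newCol"))

result.show(false)
```

Sorting the collected (pos, id) structs is there because collect_list on its own does not guarantee element order after a shuffle.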

answered Aug 18, 2019 at 19:21

Raphael Roth


David Vrba · Accepted Answer · 2019-08-19 04:30:03Z

1

You can split the string into an array using the split function, and then transform the array using the higher-order function TRANSFORM (available since Spark 2.4) together with substring_index:

import org.apache.spark.sql.functions.{split, expr}

val df = Seq("1-1235.0 2-1248.0 3-7895.2").toDF("stringCol")

df.withColumn("array", split($"stringCol", " "))
  .withColumn("result", expr("TRANSFORM(array, x -> substring_index(x, '-', 1))"))

Note that this is a native approach; no UDF is involved.
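If an array of integers is wanted (as the UDF answer produces) rather than an array of strings, a cast can be added inside the lambda. The following is a minimal sketch under that assumption. The CAST, the single-expression form, and the local SparkSession are additions for illustration, not part of the answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("transform-sketch").getOrCreate()
import spark.implicits._

val df = Seq("1-1235.0 2-1248.0 3-7895.2").toDF("stringCol")

// Split on spaces, keep the text before the first '-', and cast each element to INT.
val result = df.withColumn(
  "result",
  expr("TRANSFORM(split(stringCol, ' '), x -> CAST(substring_index(x, '-', 1) AS INT))")
)

result.show(false)
```

Doing the split inside the expression avoids the intermediate array column; keeping it as a separate withColumn step, as in the answer, works the same way.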

edited Aug 19, 2019 at 4:30

answered Aug 18, 2019 at 19:51

David Vrba


2 Comments

Ged Over a year ago

Incorrect? What if val df = Seq("1-1235.0 55-1248.0 3-7895.2").toDF("stringCol"), i.e. a value greater than 9?

David Vrba Over a year ago

@thebluephantom Thanks for pointing out multi-digit values. I edited the answer, replacing substring with substring_index.
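To illustrate the point raised in these comments, the sketch below compares a fixed-width substring with substring_index on the multi-digit input. The earlier version of the answer is not shown in the thread, so `substring(x, 1, 1)` is only an assumed stand-in for it. The column names `fixedWidth` and `beforeDash` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("substring-vs-index").getOrCreate()
import spark.implicits._

val df = Seq("1-1235.0 55-1248.0 3-7895.2").toDF("stringCol")

val compared = df.select(
  // Fixed width: always takes exactly one character, so "55" is truncated to "5".
  expr("TRANSFORM(split(stringCol, ' '), x -> substring(x, 1, 1))").as("fixedWidth"),
  // Delimiter based: takes everything before the first '-', whatever its length.
  expr("TRANSFORM(split(stringCol, ' '), x -> substring_index(x, '-', 1))").as("beforeDash")
)

compared.show(false)
```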
