6

I am trying to 'lift' the fields of a struct to the top level in a dataframe, as illustrated by this example:

case class A(a1: String, a2: String)
case class B(b1: String, b2: A)

val df = Seq(B("X",A("Y","Z"))).toDF

df.show    
+---+-----+
| b1|   b2|
+---+-----+
|  X|[Y,Z]|
+---+-----+

df.printSchema
root
 |-- b1: string (nullable = true)
 |-- b2: struct (nullable = true)
 |    |-- a1: string (nullable = true)
 |    |-- a2: string (nullable = true)

val lifted = df.withColumn("a1", $"b2.a1").withColumn("a2", $"b2.a2").drop("b2")

lifted.show
+---+---+---+
| b1| a1| a2|
+---+---+---+
|  X|  Y|  Z|
+---+---+---+

lifted.printSchema
 root
 |-- b1: string (nullable = true)
 |-- a1: string (nullable = true)
 |-- a2: string (nullable = true)

This works. I would like to create a little utility method which does this for me, probably through pimping DataFrame to enable something like df.lift("b2").

To do this, I think I want a way of obtaining a list of all fields within a Struct. E.g. given "b2" as input, return ["a1","a2"]. How do I do this?

2 Answers 2

10

If I understand your question correctly, you want to be able to list the nested fields of column b2.

So you would need to filter on b2, access the StructType of b2 and then map the names of the columns from within the fields (StructField):

import org.apache.spark.sql.types.StructType

val nested_fields = df.schema
                   .filter(c => c.name == "b2")
                   .flatMap(_.dataType.asInstanceOf[StructType].fields)
                   .map(_.name)

// nested_fields: Seq[String] = List(a1, a2)
Sign up to request clarification or add additional context in comments.

2 Comments

You can skip the filter by using df.schema("b2").
pyspark version if anyone needs it: next(col for col in df.schema if col.name==colName).dataType.fieldNames()
2

Actually you can use ".fieldNames.toList".

val nested_fields = df.schema("b2").fieldNames.toList

It returns a list of String. If you want a list of columns make a map.

I hope it helps.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.