I have a dataframe with the following schema:

root
 |-- docnumber: string (nullable = true)
 |-- event: struct (nullable = false)
 |    |-- data: struct (nullable = true)
 |    |    |-- codevent: int (nullable = true)

I need to add a field inside event.data so that the schema looks like this:

root
 |-- docnumber: string (nullable = true)
 |-- event: struct (nullable = false)
 |    |-- data: struct (nullable = true)
 |    |    |-- codevent: int (nullable = true)
 |    |    |-- needtoaddit: int (nullable = true)

I tried:

  • dataframe.withColumn("event.data.needtoaddit", lit("added"))

    but it adds a new top-level column literally named event.data.needtoaddit.

  • dataframe.withColumn(
      "event",
      struct(
        $"event.*",
        struct(
          lit("added")
            .as("needtoaddit")
        ).as("data")
      )
    )

    but the resulting event struct then contains two fields named data, so event.data becomes ambiguous and I have a problem again.
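To see why neither attempt works, here is a minimal sketch, assuming a local SparkSession and a one-row sample frame shaped like the schema above (the app name and sample values are illustrative). withColumn treats its first argument as a literal top-level column name, so the dots are not read as a path; and event.* already expands to the existing data field, so the rebuilt struct ends up with two fields named data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().master("local[*]").appName("nested-field-demo").getOrCreate()

// Sample frame matching the question's shape (values are made up).
val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )

// Attempt 1: the dotted string is used as-is for a new top-level column name.
val attempt1 = df.withColumn("event.data.needtoaddit", lit("added"))
println(attempt1.columns.mkString(", "))
// docnumber, event, event.data.needtoaddit

// Attempt 2: "event.*" already yields `data`, so the new struct has two `data` fields.
val attempt2 = df.withColumn(
  "event",
  struct(col("event.*"), struct(lit("added").as("needtoaddit")).as("data"))
)
val eventFields = attempt2.schema("event").dataType.asInstanceOf[StructType].fieldNames
println(eventFields.mkString(", "))
// data, data
```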

How can I make it work?

2 Answers

Spark 3.1+

To add a field inside a nested struct column, use withField on the outer struct; it accepts a dotted path to the nested field:

col("event").withField("data.needtoaddit", lit("added"))

Input:

import org.apache.spark.sql.functions.{col, lit, struct}

val df = spark.createDataFrame(Seq(("1", 2)))
    .select(
        col("_1").as("docnumber"),
        struct(struct(col("_2").as("codevent")).as("data")).as("event")
    )
df.printSchema()
// root
//  |-- docnumber: string (nullable = true)
//  |-- event: struct (nullable = false)
//  |    |-- data: struct (nullable = false)
//  |    |    |-- codevent: integer (nullable = false)

Script:

// The dotted path "data.needtoaddit" adds the field inside event.data;
// every other field of event is kept as-is.
val df2 = df.withColumn(
    "event",
    col("event").withField("data.needtoaddit", lit("added"))
)

df2.printSchema()
// root
//  |-- docnumber: string (nullable = true)
//  |-- event: struct (nullable = false)
//  |    |-- data: struct (nullable = false)
//  |    |    |-- codevent: integer (nullable = false)
//  |    |    |-- needtoaddit: string (nullable = false)
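Because withField also accepts a dotted path, the same update can be written in two ways. A minimal sketch, assuming a local SparkSession and the same one-row sample frame (the app name and the dataFields helper are illustrative): both forms should produce the same data struct; the second one simply spells out the inner update and writes it back into event.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().master("local[*]").appName("withfield-demo").getOrCreate()

// Same one-row sample frame as in the answer.
val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )

// Form 1: the dotted path tells withField to descend into event.data.
val viaPath = df.withColumn("event", col("event").withField("data.needtoaddit", lit("added")))

// Form 2: update the inner struct first, then replace event.data with it.
val viaNested = df.withColumn(
  "event",
  col("event").withField("data", col("event.data").withField("needtoaddit", lit("added")))
)

// Hypothetical helper: list the field names of event.data in a frame.
def dataFields(d: DataFrame): Seq[String] =
  d.schema("event").dataType.asInstanceOf[StructType]("data")
    .dataType.asInstanceOf[StructType].fieldNames.toSeq

println(dataFields(viaPath).mkString(", "))   // codevent, needtoaddit
println(dataFields(viaNested).mkString(", ")) // codevent, needtoaddit
```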
You're kind of close. Try this code:

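import org.apache.spark.sql.functions.{lit, struct}
import spark.implicits._ // enables the $"..." column syntax

// Rebuilds `event` with a new `data` struct: the existing data fields plus needtoaddit.
// Note: fields of `event` other than `data` are not carried over.
val df2 = df.withColumn(
    "event", 
    struct(
        struct(
            $"event.data.*", 
            lit("added").as("needtoaddit")
        ).as("data")
    )
)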

3 Comments

After testing more carefully, it doesn't work: the existing event data was lost when I ran this code.
If I use "event", struct($"event.*", struct($"event.data.*", ... it doesn't work either; I only need to add one column to data.
You should not include $"event.*": it brings back the existing data field, which then clashes with the new data column.
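If event has fields besides data, rebuilding it with only struct(struct($"event.data.*", ...).as("data")) drops them, and adding $"event.*" re-introduces the old data next to the new one. Without withField (before Spark 3.1), one way around both problems is to rebuild event from its own schema and replace only data. A sketch, assuming a local SparkSession; the extra field source and its value are hypothetical, added only to show that sibling fields survive:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().master("local[*]").appName("nested-field-rebuild").getOrCreate()

// Hypothetical input: `event` carries a sibling field `source` next to `data`.
val df = spark.createDataFrame(Seq(("1", 2, "web")))
  .select(
    col("_1").as("docnumber"),
    struct(
      struct(col("_2").as("codevent")).as("data"),
      col("_3").as("source")
    ).as("event")
  )

// Rebuild `event` field by field, in its original order:
// `data` gets the extra field, every other field is copied through unchanged.
val eventType = df.schema("event").dataType.asInstanceOf[StructType]
val rebuiltFields: Seq[Column] = eventType.fieldNames.toSeq.map {
  case "data" =>
    struct(col("event.data.*"), lit("added").as("needtoaddit")).as("data")
  case other =>
    col("event").getField(other).as(other)
}

val df2 = df.withColumn("event", struct(rebuiltFields: _*))
df2.printSchema()
```

Reading the field list from the schema, rather than hard-coding it, keeps the rebuild correct if event gains fields later.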
