I have a dataframe (df) of the following form:

+-----+--------+
|id   |items   |
+-----+--------+
|   0 |  item1 |
|   1 |  item2 |
+-----+--------+

Here the first column, id, is an Int, and the second column, items, is a struct. Let's say each item looks like this:

     item1
        |-a
        |-b
        |-c
        |-d
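To reproduce this input, here is a minimal sketch that builds such a DataFrame. The case classes `Item` and `Record`, the object `SampleData`, the String field type and the field values are all hypothetical; the question only shows the field names a, b, c and d.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical struct type: field names from the question, String type assumed
case class Item(a: String, b: String, c: String, d: String)
// Hypothetical row type: id is an Int, items is the struct column
case class Record(id: Int, items: Item)

object SampleData {
  def sampleDf(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Hypothetical values, chosen to match the desired output table
    Seq(
      Record(0, Item("a", "b", "c", "d")),
      Record(1, Item("a", "b", "c", "d"))
    ).toDF()
  }
}
```

Calling `SampleData.sampleDf(spark).printSchema()` should then show `items` as a struct with fields a, b, c and d.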

I want the resulting table to look like this:

+-----+-------+
|id   |col2   |
+-----+-------+
|   0 |  a    |
|   0 |  b    |
|   0 |  c    |
|   0 |  d    |
|   1 |  a    |
|   1 |  b    |
|   1 |  c    |
|   1 |  d    |
+-----+-------+

I want to expand the struct so that each of its fields gets its own row, for every id.
How can I do this?

1 Answer

This piece of code may solve your problem:

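import spark.implicits._ // provides .toDF on an RDD of tuples

df.rdd.flatMap { row =>
  val id = row.getInt(0)
  // Spark hands array columns back as a Seq, not an Array
  val arrayOfString = row.getAs[Seq[String]](1)
  arrayOfString.map(value => (id, value))
}.toDF("id", "col2")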

Note: this code is not tested!
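The code above reads `items` as an array, but the question says it is a struct. Here is a minimal sketch of the same RDD approach that reads the struct instead, assuming column 0 is the Int id and column 1 is the struct. The object name `StructExplodeRdd` and the method name `explodeStructRdd` are hypothetical. In Spark, a struct column inside a `Row` comes back as a nested `Row`, which `getStruct` returns.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object StructExplodeRdd {
  // Sketch: one output row per field of the struct column at index 1.
  // Assumes column 0 is the Int id and column 1 is the struct "items".
  def explodeStructRdd(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df.rdd.flatMap { row =>
      val id = row.getInt(0)
      // A struct column is returned as a nested Row, not as an array
      val items: Row = row.getStruct(1)
      // Emit (id, value) for every struct field; null fields stay null
      (0 until items.length).map(i => (id, Option(items.get(i)).map(_.toString).orNull))
    }.toDF("id", "col2")
  }
}
```

Converting each field with `toString` lets structs with mixed field types still fit in a single String column `col2`.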

2 Comments

In this case the type of column "items" is struct. How can it be cast to Array[String]?
Then remove the Array[String] part and extract the values you want from the struct yourself.
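Instead of casting, one option is to stay in the DataFrame API: build an array from the struct's fields with `array` and then `explode` it into one row per element. This is a sketch, assuming every field in `items` has the same type (`array` needs a common element type); the object name `StructExplodeDf` and method name `explodeStruct` are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode}
import org.apache.spark.sql.types.StructType

object StructExplodeDf {
  // Sketch: turn the struct's fields into an array column, then explode it.
  // Assumes all fields of "items" share one type.
  def explodeStruct(df: DataFrame): DataFrame = {
    // Read the field names from the schema so nothing is hard-coded
    val fieldNames = df.schema("items").dataType.asInstanceOf[StructType].fieldNames
    // One column reference per struct field: items.a, items.b, ...
    val fieldCols = fieldNames.map(f => col(s"items.$f"))
    // explode emits one output row per array element, keeping id alongside
    df.select(col("id"), explode(array(fieldCols: _*)).as("col2"))
  }
}
```

If `col2` should hold the field names (a, b, c, d) rather than their values, replacing `col(s"items.$f")` with `lit(f)` (imported from `org.apache.spark.sql.functions`) would explode the names instead.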
