How to select a PySpark column and append it as new rows in the data frame?

Question

I have a JSON file and want to run some ETL tasks on it. I want to extract a column and append its values as new rows in the data frame. For example, if I have a data frame like this:

-----------------------------------------------------------------
|name    |    last    |                  father                 |
-----------------------------------------------------------------
| daniel |  allardice | {'name': 'george', 'last': 'allardice'} |
-----------------------------------------------------------------

I want to turn it into:

----------------------------
|    name    |    last     |
----------------------------
|   daniel   |  allardice  |
----------------------------
|   george   |  allardice  |
----------------------------

How can I do this with a UDF in PySpark?

dassum · Accepted Answer · 2019-12-29 08:28:15Z


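Try the code below. It selects the top-level columns and the nested father fields separately, then unions the two results, so no UDF is needed: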

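from pyspark.sql import functions as F

# Rows for the top-level person
df_1 = df.select("name", "last")

# Rows for the father, pulled out of the nested column and renamed to match
df_2 = df.select(F.col("father").getItem("name").alias("name"), F.col("father")["last"].alias("last"))

# Stack both sets of rows; union matches columns by position
result = df_1.union(df_2)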


1 Comment

Daniel Over a year ago

Thanks. What can I do if, instead of JSON, the value is a Row() type?
