
I have two Spark dataframes.

Dataframe 1:

Location    Date        Date_part   Sector      units
USA         7/1/2021    7/1/2021    Cars        200
IND         7/1/2021    7/1/2021    Scooters    180
COL         7/1/2021    7/1/2021    Trucks      100

Dataframe 2:

Location    Date    Brands  units   values
UK          null    brand1  400     120
AUS         null    brand2  450     230
CAN         null    brand3  150     34

After doing unionByName, I got:

Location    Date        Date_part   Sector      Brands  units   values
USA         7/1/2021    7/1/2021    Cars        null    200     null
IND         7/1/2021    7/1/2021    Scooters    null    180     null
COL         7/1/2021    7/1/2021    Trucks      null    100     null
UK          null        null        null        brand1  400     120
AUS         null        null        null        brand2  450     230
CAN         null        null        null        brand3  150     34

But my expected dataframe is:

Location    Date        Date_part   Sector      Brands  units   values
USA         7/1/2021    7/1/2021    Cars        null    200     null
IND         7/1/2021    7/1/2021    Scooters    null    180     null
COL         7/1/2021    7/1/2021    Trucks      null    100     null
UK          null        7/1/2021    null        brand1  400     120
AUS         null        7/1/2021    null        brand2  450     230
CAN         null        7/1/2021    null        brand3  150     34

I need the Date_part column to take dataframe 1's value for all rows. I tried this code:

df_result = df_final.select(df_1['date_part'], df_final["*"])

But this creates an extra date_part column. How do I achieve my expected dataframe?
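For reproducibility, here is a minimal sketch of how the two dataframes and the union can be built. df_1 and df_final are the names used in the code above; df_2 is just a placeholder name, and allowMissingColumns requires Spark 3.1+:

from pyspark.sql import SparkSession, types as T

spark = SparkSession.builder.getOrCreate()

df_1 = spark.createDataFrame(
    [("USA", "7/1/2021", "7/1/2021", "Cars", 200),
     ("IND", "7/1/2021", "7/1/2021", "Scooters", 180),
     ("COL", "7/1/2021", "7/1/2021", "Trucks", 100)],
    ["Location", "Date", "Date_part", "Sector", "units"],
)

# Date is all null here, so an explicit schema is needed (type inference would fail).
schema_2 = T.StructType([
    T.StructField("Location", T.StringType()),
    T.StructField("Date", T.StringType()),
    T.StructField("Brands", T.StringType()),
    T.StructField("units", T.LongType()),
    T.StructField("values", T.LongType()),
])
df_2 = spark.createDataFrame(
    [("UK", None, "brand1", 400, 120),
     ("AUS", None, "brand2", 450, 230),
     ("CAN", None, "brand3", 150, 34)],
    schema_2,
)

# Columns present in only one dataframe are added to the other and filled with null.
df_final = df_1.unionByName(df_2, allowMissingColumns=True)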

  • Is Date_part always the same? Commented Aug 19, 2021 at 10:39
  • Yes, it will be the same in dataframe 1. Commented Aug 19, 2021 at 10:40
  • This should resolve your query: stackoverflow.com/a/30045284/12843137 Commented Aug 19, 2021 at 10:59

1 Answer


Assuming Date_part is the same for the whole dataframe, there are several ways to do this. Here is one:

from pyspark.sql import functions as F

# Grab the constant Date_part value from any row where it is not null
# (i.e. a row that originally came from dataframe 1).
missing_date = df_result.where(F.col("Date_part").isNotNull()).first()["Date_part"]

# Overwrite the column with that literal value for every row.
df_result = df_result.withColumn("Date_part", F.lit(missing_date))
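Applied to the sample data in the question, this should backfill 7/1/2021 into Date_part for the rows that came from the second dataframe. A quick check (assuming df_result is the unioned dataframe):

# Sanity check: the UK, AUS and CAN rows should now show 7/1/2021 in Date_part.
df_result.select("Location", "Date_part", "Brands", "units").show()

Pulling the constant out with first() and writing it back with lit() avoids a join; it relies on Date_part really being the same for every row of dataframe 1, as confirmed in the comments.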
