1

I am trying to join two pyspark dataframes as below based on "Year" and "invoice" columns. But if "Year" is missing in df1, then I need to join just based on ""invoice" alone

df1:

Year    invoice    Status   Item
2020    262        YES      bag
2019    252        YES      ball
2018    240        YES      pen
2017    228        YES      ink
2016    216        NO       headphone
2015    213        NO       bicycle
        198        NO       ribbon
        175        YES      phone
    

df2:

Year    invoice
2020    262
2016    216
2014    175
2013    198
2019    252

Expected output:

Year    invoice    Status   Item
2020    262        YES      bag
2016    216        NO       headphone
2014    175        YES      phone
2013    198        NO       ribbon
2019    252        YES      ball

I am able to join df1 and df2 as below (only based on Year and invoice" column. If year is missing in df1, I need to add the logic of joining two columns based on invoice alone.

df_results = df1.join(df2, on=['Year', 'invoice'], how='left') \
                .drop(df2.Year) \
                .drop(df2.invoice)

Please let me know how to join if "Year" is not available in the df1, and dataframes should be joined based on "invoice" alone. Thanks.

1 Answer 1

2

I don't have your code to test this, but I would try to add a condition to the join operation:

cond = ((df1.Year == df2.Year) | df1.Year.isNull()) & (df1.invoice == df2.invoice)
df_results = df1.join(df2, on=cond, how='left') \
                .drop(df2.Year) \
                .drop(df2.invoice)
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks @mck Strangely df1.Year == df2.Year is not working for me. They are of integer type. When I manually check, both the columns show the same year.
After joining I get null values for "Status" and "Item" columns. But the values are present in the original df1.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.