
I need to join two dataframes in PySpark. One dataframe, df1, looks like:

city  user_count_city  meeting_session
NYC    100             5
LA     200             10 
....

The other dataframe, df2, looks like:

total_user_count  total_meeting_sessions
1000               100

I need to calculate user_percentage and meeting_session_percentage, so I need something like a left join:

df1 left join df2

How can I join the two dataframes when they do not have a common key?

I took a look at the solution in this post: Joining two dataframes without a common column. But that is not the same as my case.

Expected results

city  user_count_city  meeting_session  total_user_count  total_meeting_sessions
NYC   100              5                1000              100
LA    200              10               1000              100
....
  • I think if you add monotonicallyIncreasingId and CROSS JOIN on both dataframes, would this serve your purpose? Commented Feb 27, 2021 at 3:29
  • Not even a condition needs to be specified - stackoverflow.com/questions/42158598/… Commented Feb 27, 2021 at 3:30
  • What are you trying to achieve if you don't have any common column? If the first df has 5 rows and the second has 2 rows, what output would you expect? Commented Feb 27, 2021 at 4:02

1 Answer


You are looking for a cross join:

result = df1.crossJoin(df2)

