I am new to PySpark. I am looking to convert the Spark SQL below to the DataFrame API:
sql("SELECT
t.transaction_category_id,
sum(t.transaction_amount) AS sum_amount,
count(DISTINCT t.user_id) AS num_users
FROM transactions t
JOIN users u USING (user_id)
WHERE t.is_blocked = False
AND u.is_active = 1
GROUP BY t.transaction_category_id
ORDER BY sum_amount DESC").show()
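Here is what I think the DataFrame API equivalent looks like. This is a rough sketch assuming transactions and users are tables in the catalog; I am not sure it is idiomatic:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
transactions = spark.table("transactions")
users = spark.table("users")

result = (
    transactions.where(F.col("is_blocked") == False)            # t.is_blocked = FALSE
    .join(users.where(F.col("is_active") == 1), on="user_id")   # JOIN ... USING (user_id)
    .groupBy("transaction_category_id")
    .agg(
        F.sum("transaction_amount").alias("sum_amount"),
        F.countDistinct("user_id").alias("num_users"),
    )
    .orderBy(F.col("sum_amount").desc())
)
result.show()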
The tables are uneven in size: transactions is a large table, while users is much smaller. Can I apply a broadcast join and/or salting here?
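From what I have read, broadcasting the smaller, filtered users table would look something like this (building on the DataFrames above, and assuming the filtered users table fits in executor memory):

from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

active_users = users.where(F.col("is_active") == 1)

result = (
    transactions.where(F.col("is_blocked") == False)
    .join(broadcast(active_users), on="user_id")   # hint: ship active_users to every executor
    .groupBy("transaction_category_id")
    .agg(
        F.sum("transaction_amount").alias("sum_amount"),
        F.countDistinct("user_id").alias("num_users"),
    )
    .orderBy(F.col("sum_amount").desc())
)

My understanding is that a broadcast join avoids shuffling the large transactions table at all, so salting (which targets skewed keys during a shuffle) would only matter if users is too big to broadcast. Is that correct?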