The drop method accepts either a single Column expression or one or more string column names. That's why it works for drop(df2.col1) but raises an exception for drop(df2.col1, df2.col2).
Take these two dataframes as an example:
df1 = spark.createDataFrame([(1, 1), (2, 2)], ["col1", "col2"])
df2 = spark.createDataFrame([(5, 3, "ok"), (2, 2, "ko")], ["col1", "col2", "status"])
You can drop the duplicates columns like this:
- Using a list of column names as the join condition
df1.join(df2, ["col1", "col2"], "left").show()
#+----+----+------+
#|col1|col2|status|
#+----+----+------+
#|   1|   1|  null|
#|   2|   2|    ko|
#+----+----+------+
- Using a select expression
df1.join(df2, (df1["col1"] == df2["col1"]) & (df1["col2"] == df2["col2"]), "left") \
    .select(
        df1["*"],
        *[df2[c] for c in df2.columns if c not in ["col1", "col2"]]
    ).show()
#+----+----+------+
#|col1|col2|status|
#+----+----+------+
#|   1|   1|  null|
#|   2|   2|    ko|
#+----+----+------+