I am struggling to get the CROSS JOIN of two DataFrames. I am using Spark 2.0. How can I implement a CROSS JOIN between two DataFrames?
Edit:
val df=df.join(df_t1, df("Col1")===df_t1("col")).join(df2,joinType=="cross join").where(df("col2")===df2("col2"))
Call join with the other dataframe without using a join condition.
Have a look at the following example. Given a first dataframe of people:
+---+------+-------+------+
| id| name| mail|idArea|
+---+------+-------+------+
| 1| Jack|[email protected]| 1|
| 2|Valery|[email protected]| 1|
| 3| Karl|[email protected]| 2|
| 4| Nick|[email protected]| 2|
| 5| Luke|[email protected]| 3|
| 6| Marek|[email protected]| 3|
+---+------+-------+------+
and a second dataframe of areas:
+------+--------------+
|idArea| areaName|
+------+--------------+
| 1|Amministration|
| 2| Public|
| 3| Store|
+------+--------------+
the cross join is simply given by:
val cross = people.join(area)
+---+------+-------+------+------+--------------+
| id| name| mail|idArea|idArea| areaName|
+---+------+-------+------+------+--------------+
| 1| Jack|[email protected]| 1| 1|Amministration|
| 1| Jack|[email protected]| 1| 3| Store|
| 1| Jack|[email protected]| 1| 2| Public|
| 2|Valery|[email protected]| 1| 1|Amministration|
| 2|Valery|[email protected]| 1| 3| Store|
| 2|Valery|[email protected]| 1| 2| Public|
| 3| Karl|[email protected]| 2| 1|Amministration|
| 3| Karl|[email protected]| 2| 2| Public|
| 3| Karl|[email protected]| 2| 3| Store|
| 4| Nick|[email protected]| 2| 3| Store|
| 4| Nick|[email protected]| 2| 2| Public|
| 4| Nick|[email protected]| 2| 1|Amministration|
| 5| Luke|[email protected]| 3| 2| Public|
| 5| Luke|[email protected]| 3| 3| Store|
| 5| Luke|[email protected]| 3| 1|Amministration|
| 6| Marek|[email protected]| 3| 1|Amministration|
| 6| Marek|[email protected]| 3| 2| Public|
| 6| Marek|[email protected]| 3| 3| Store|
+---+------+-------+------+------+--------------+
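Applied to the chain in the question, a minimal sketch could look like the following. It assumes Spark 2.1 or later, where DataFrames have an explicit crossJoin method; the three small DataFrames are hypothetical stand-ins for the question's df, df_t1 and df2, and the local master is only there to make the example runnable.

```scala
import org.apache.spark.sql.SparkSession

object CrossJoinChainSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for illustration.
    val spark = SparkSession.builder
      .appName("cross_join_chain_sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the question's df, df_t1 and df2.
    val df    = Seq(("a", 1), ("b", 2)).toDF("Col1", "col2")
    val df_t1 = Seq("a", "b").toDF("col")
    val df2   = Seq((1, "x"), (2, "y"), (3, "z")).toDF("col2", "value")

    // Inner join first, then an explicit Cartesian product, then the filter.
    // The result gets its own name: `val df = df.join(...)` would refer to itself.
    val result = df
      .join(df_t1, df("Col1") === df_t1("col"))
      .crossJoin(df2) // assumes Spark 2.1+
      .where(df("col2") === df2("col2"))

    result.show()
    spark.stop()
  }
}
```

Note that a cross join followed by an equality filter returns the same rows as an inner join on that condition, so if the filter is always applied, passing the condition directly to join is usually the simpler choice.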
Use crossJoin for cross joining (the method is available from Spark 2.1).
You might have to enable crossJoin in the Spark confs. Example:
from pyspark.sql import SparkSession

# Parentheses let the builder chain span several lines
spark = (
    SparkSession.builder
    .appName("distance_matrix")
    .config("spark.sql.crossJoin.enabled", "true")
    .getOrCreate()
)
and use something like this:
df1.join(df2, <condition>)
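The session configuration above is Python; since the rest of the thread uses Scala, here is a rough Scala equivalent. It is a sketch under a few assumptions: a local master for illustration, and a trimmed-down version of the people and areas data from the earlier answer (the mail column is left out). On Spark 2.x the flag is what allows a join without a condition; as far as I know, newer releases enable it by default.

```scala
import org.apache.spark.sql.SparkSession

object CrossJoinConfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("cross_join_conf_sketch")
      .master("local[*]") // assumption: local run for illustration
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreate()
    import spark.implicits._

    // The same flag can also be set on an already running session.
    spark.conf.set("spark.sql.crossJoin.enabled", "true")

    // Trimmed-down sample data (hypothetical subset of the tables above).
    val people = Seq((1, "Jack", 1), (2, "Valery", 1)).toDF("id", "name", "idArea")
    val area = Seq((1, "Amministration"), (2, "Public"), (3, "Store"))
      .toDF("idArea", "areaName")

    // A join without a condition is a Cartesian product.
    val cross = people.join(area)
    println(cross.count()) // 2 people x 3 areas = 6 rows

    spark.stop()
  }
}
```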
If the areas data is small you can do it by explode without shuffling:
// Imports needed outside spark-shell; `spark` is the active SparkSession
import org.apache.spark.sql.functions.{array, explode, lit, struct}
import spark.implicits._
val df1 = Seq(
(1,"Jack","[email protected]",1),
(2,"Valery","[email protected]",1),
(3,"Karl","[email protected]",2),
(4,"Nick","[email protected]",2),
(5,"Luke","[email protected]",3),
(6,"Marek","[email protected]",3)
).toDF("id","name","mail","idArea")
val arr = array(
Seq(
(1,"Amministration"),
(2,"Public"),
(3,"Store")
)
.map(r => struct(lit(r._1).as("idArea"), lit(r._2).as("areaName"))):_*
)
val cross = df1
.withColumn("d", explode(arr))
.withColumn("idArea", $"d.idArea")
.withColumn("areaName", $"d.areaName")
.drop("d")
df1.show
cross.show
Output
+---+------+-------+------+
| id| name| mail|idArea|
+---+------+-------+------+
| 1| Jack|[email protected]| 1|
| 2|Valery|[email protected]| 1|
| 3| Karl|[email protected]| 2|
| 4| Nick|[email protected]| 2|
| 5| Luke|[email protected]| 3|
| 6| Marek|[email protected]| 3|
+---+------+-------+------+
+---+------+-------+------+--------------+
| id| name| mail|idArea| areaName|
+---+------+-------+------+--------------+
| 1| Jack|[email protected]| 1|Amministration|
| 1| Jack|[email protected]| 2| Public|
| 1| Jack|[email protected]| 3| Store|
| 2|Valery|[email protected]| 1|Amministration|
| 2|Valery|[email protected]| 2| Public|
| 2|Valery|[email protected]| 3| Store|
| 3| Karl|[email protected]| 1|Amministration|
| 3| Karl|[email protected]| 2| Public|
| 3| Karl|[email protected]| 3| Store|
| 4| Nick|[email protected]| 1|Amministration|
| 4| Nick|[email protected]| 2| Public|
| 4| Nick|[email protected]| 3| Store|
| 5| Luke|[email protected]| 1|Amministration|
| 5| Luke|[email protected]| 2| Public|
| 5| Luke|[email protected]| 3| Store|
| 6| Marek|[email protected]| 1|Amministration|
| 6| Marek|[email protected]| 2| Public|
| 6| Marek|[email protected]| 3| Store|
+---+------+-------+------+--------------+