I have two tables with the same schema:
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val champs = List(StructField("nom", StringType, true),
                  StructField("heure ", StringType, true),
                  StructField("velo", StringType, true),
                  StructField("action", StringType, true))
val schema = StructType(champs)
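For context, the two tables are registered as temp tables before querying. A minimal sketch of that setup, assuming the sqlContext API the rest of this post uses; the sample rows are hypothetical placeholders for my real data:

import org.apache.spark.sql.Row

// Hypothetical sample rows just to make the sketch runnable;
// the real data is loaded from elsewhere.
val prisesRDD = sc.parallelize(Seq(Row("alice", "08:00", "v1", "prise")))
val rendusRDD = sc.parallelize(Seq(Row("bob", "09:15", "v1", "rendu")))

val prises = sqlContext.createDataFrame(prisesRDD, schema)
val rendus = sqlContext.createDataFrame(rendusRDD, schema)

// Make the DataFrames visible to sqlContext.sql under these names
prises.registerTempTable("prises")
rendus.registerTempTable("rendus")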
I'm trying to join them with a classic SQL query in Spark SQL:
SELECT DISTINCT p.nom, p.velo, p.action, p.heure, r.action, r.heure
FROM prises AS p,
     rendus AS r
WHERE p.velo == r.velo
But I get an error:
Name: org.apache.spark.sql.AnalysisException
Message: cannot resolve '`p.heure`' given input columns: [heure , heure , velo, velo, action, nom, action, nom]; line 2 pos 41;
Is this kind of query possible in Spark?
I see a lot of pages where people use the DataFrame join method instead. Is that the only way?
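For reference, my understanding of that DataFrame join approach is roughly this (a sketch only; prises and rendus are the DataFrames behind the temp tables, and the column names are copied verbatim from champs above):

// Sketch of the DataFrame-side equivalent of the SQL above.
// Column names are taken exactly from `champs`, including "heure ".
val joined = prises
  .join(rendus, prises("velo") === rendus("velo"))
  .select(prises("nom"), prises("velo"), prises("action"),
          prises("heure "), rendus("action"), rendus("heure "))
  .distinct()

joined.show()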
EDIT 1
val requete = s"""
SELECT DISTINCT p.nom, p.velo, p.action, p.heure, r.action, r.heure
FROM prises p
JOIN rendus r
ON (p.velo = r.velo)
"""
sqlContext.sql(requete).show()
gives an error:
Name: org.apache.spark.sql.AnalysisException
Message: cannot resolve '`p.heure`' given input columns: [action, nom, nom, heure , heure , velo, velo, action]; line 2 pos 43;
EDIT 2
The same happens with:
val requete = s"""
SELECT DISTINCT p.nom, p.velo, p.action, p.heure, r.action, r.heure
FROM prises AS p
INNER JOIN rendus AS r
ON p.velo = r.velo
"""
sqlContext.sql(requete).show()
gives an error:
Name: org.apache.spark.sql.AnalysisException
Message: cannot resolve '`p.heure`' given input columns: [action, nom, nom, heure , heure , velo, velo, action]; line 2 pos 41;
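In case it matters, here is how I can dump the exact column names Spark has registered for each table (a small debugging sketch; it quotes each name so any stray characters would show up):

// Print each registered column name between quotes so that
// leading/trailing characters are visible.
sqlContext.table("prises").columns.foreach(c => println(s"'$c'"))
sqlContext.table("rendus").columns.foreach(c => println(s"'$c'"))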