2

I have two tables with the same schema:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

val champs = List(  StructField("nom"    , StringType, true),
                    StructField("heure " , StringType, true),
                    StructField("velo"   , StringType, true),
                    StructField("action" , StringType, true))
val schema = StructType(champs)

I am trying to join them with classic SQL in Spark SQL:

Select  distinct  p.nom, p.velo, p.action, p.heure, r.action, r.heure
from    prises as p, 
        rendus as r
WHERE   p.velo == r.velo

But I get an error:

Name: org.apache.spark.sql.AnalysisException
Message: cannot resolve '`p.heure`' given input columns: [heure , heure , velo, velo, action, nom, action, nom]; line 2 pos 41;

Is this kind of query possible in Spark?

I see a lot of pages where people use the DataFrame join method instead. Is that the only way?

EDIT 1

val requete = s"""
Select  distinct  p.nom, p.velo, p.action, p.heure, r.action, r.heure
from prises p 
join rendus r
  on (p.velo = r.velo)
"""

sqlContext.sql(requete).show()

gives an error:

Name: org.apache.spark.sql.AnalysisException
Message: cannot resolve '`p.heure`' given input columns: [action, nom, nom, heure , heure , velo, velo, action]; line 2 pos 43;

EDIT 2

The same happens for:

val requete = s"""
SELECT DISTINCT p.nom, p.velo, p.action, p.heure, r.action, r.heure 
FROM       prises AS p 
INNER JOIN rendus AS r 
ON p.velo = r.velo
"""
sqlContext.sql(requete).show()

gives an error:

Name: org.apache.spark.sql.AnalysisException
Message: cannot resolve '`p.heure`' given input columns: [action, nom, nom, heure , heure , velo, velo, action]; line 2 pos 41;
  • I can't test it at the moment, but it might be confused by the comma join syntax (never use that; write out your joins explicitly) or by the incorrect double equals in your WHERE clause. Commented Jan 27, 2017 at 14:14
  • @MK. You're right on both counts. Commented Jan 27, 2017 at 14:26

3 Answers

1

[OK this really shouldn't be an answer but]

You have a trailing space in a column name somehow. Look at the error message: heure is listed as "heure " with a space before the comma, while the other column names have none.

Also, please use explicit JOIN syntax; comma joins are always a terrible, unreadable, confusing idea. And SQL uses a single equals sign, not a double one. And <> instead of != while we are at it (even though != is legal in a lot of places, unfortunately).
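
If the tables are already registered, one way to get rid of the stray space without rebuilding everything is to rename the columns. A minimal sketch, assuming Spark 2.x (for createOrReplaceTempView), the sqlContext from the question, and that prises and rendus are registered tables; withTrimmedColumns and the *_clean view names are hypothetical, made up for this example:

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: same data, but every column name has its
// leading/trailing whitespace removed ("heure " becomes "heure").
def withTrimmedColumns(df: DataFrame): DataFrame =
  df.toDF(df.columns.map(_.trim): _*)

// Assumption: `sqlContext` exists and `prises` / `rendus` are registered,
// as in the question. The cleaned copies go under new (hypothetical) names.
withTrimmedColumns(sqlContext.table("prises")).createOrReplaceTempView("prises_clean")
withTrimmedColumns(sqlContext.table("rendus")).createOrReplaceTempView("rendus_clean")

// With clean names, p.heure and r.heure can be resolved.
sqlContext.sql("""
  SELECT DISTINCT p.nom, p.velo, p.action, p.heure, r.action, r.heure
  FROM prises_clean AS p
  INNER JOIN rendus_clean AS r
    ON p.velo = r.velo
""").show()
```

If you control the schema, removing the space in StructField("heure ", ...) is the simpler fix.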

1 Comment

Trailing space in the column name - my fault :-( You have sharp eyes!
0

As @MK. says, use explicit JOIN syntax (and a single = in the join condition).

Try:

Select  distinct  p.nom, p.velo, p.action, p.heure, r.action, r.heure
from prises p 
join rendus r
  on (p.velo = r.velo)

Check the Hive documentation for more info

0

The query should be:

SELECT DISTINCT p.nom, p.velo, p.action, p.heure, r.action, r.heure
FROM       prises AS p
INNER JOIN rendus AS r
ON p.velo = r.velo

Note that the join condition uses =, the standard SQL equality operator, rather than ==.
