1

I'm not sure if I'm in the correct group for this question. Any I have created the following sql code in Databricks, however I'm getting the error message;

Error in SQL statement: AnalysisException: cannot resolve 'a.COUNTRY_ID' given input columns: [a."PK_LOYALTYACCOUNT";"COUNTRY_ID";"CDC_TYPE", b."PK_LOYALTYACCOUNT";"COUNTRY_ID";"CDC_TYPE"]; line 7 pos 7;

I know the code works as I have successfully run the code on my SQL Server The code is as follows:

tabled = spark.read.csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",inferSchema=True,header=True)
tablee = spark.read.csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tablee.csv",inferSchema=True,header=True)
tabled.createOrReplaceTempView('tabled') 
tablee.createOrReplaceTempView('tablee')
%sql
; with cmn as 
  ( SELECT a.CDC_TYPE,
           a. PK_LOYALTYACCOUNT, --Add these also in CTE result set 
           a.COUNTRY_ID --Add these also in CTE result set 
    FROM  tabled  a 
    INNER JOIN tablee b 
    ON a.COUNTRY_ID = b.COUNTRY_ID 
    AND a.PK_LOYALTYACCOUNT = b.PK_LOYALTYACCOUNT 
    AND a.CDC_TYPE = 'U'
    )
 SELECT 1 AS is_deleted, 
        a.* 
 FROM  tabled  a 
 INNER JOIN cmn 
 ON a.CDC_TYPE = cmn.CDC_TYPE 
 and  a.COUNTRY_ID = cmn.COUNTRY_ID 
 AND a.PK_LOYALTYACCOUNT = cmn.PK_LOYALTYACCOUNT
 UNION ALL 
 SELECT 0 AS is_deleted, 
        b.* 
 FROM tablee  b 
 INNER JOIN cmn 
 ON b.CDC_TYPE = cmn.CDC_TYPE 
 and b.COUNTRY_ID = cmn.COUNTRY_ID 
 AND b.PK_LOYALTYACCOUNT = cmn.PK_LOYALTYACCOUNT
UNION ALL 
SELECT NULL, 
       a.* 
FROM   tabled a 
WHERE  a.CDC_TYPE = 'N' 
UNION ALL 
SELECT NULL, 
       b.* 
FROM   tablee b 
WHERE  b.CDC_TYPE = 'N'

when I run the simple query...

example1 =

spark.sql("""select * from tablee""") 

or example2 =

spark.sql("""select * from tabled""") 

I get the following output, so I know the tables are there

output

Any suggestions will be well received.

2
  • 1
    What error (if any) do you see for this query? SELECT a.COUNTRY_ID FROM tabled a; Is there actually a COUNTRY_ID field in that table? Commented Dec 23, 2018 at 16:36
  • 1
    @JonJaussi, that was good observation. When I run the command SELECT a.COUNTRY_ID FROM tabled a; I don't see the COUNTRY_ID field as you suggested. However, this is standard sql (even though its running on Databricks). And it works fine when I run the same command on my SQL Server. Commented Dec 23, 2018 at 16:47

2 Answers 2

1

Use semicolon delimiter while reading from csv

tabled = spark.read.option("delimiter", ";").csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",inferSchema=True,header=True)

or

tabled = spark.read.load("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",
                 format="csv", sep=";", inferSchema="true", header="true")

ref: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

Sign up to request clarification or add additional context in comments.

Comments

0

The columns were not being identified properly since the delimiter used was a semicolon(;) and the job was looking for commas. Problem solved

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.