
I'm trying to load the result of a query against one table into another table. Spark connects fine and executes a query to get the metadata, but no data is returned.

from pyspark.sql import SQLContext, Row, SparkSession

spark = SparkSession.builder.config("spark.driver.extraClassPath", "C:\\spark\\SQL\\sqljdbc_7.0\\enu\\mssql-jdbc-7.0.0.jre10.jar").getOrCreate()

SQL = "Select [InvoiceID],[CustomerID],[BillToCustomerID],[OrderID],[DeliveryMethodID],[ContactPersonID],[AccountsPersonID],[SalespersonPersonID],[PackedByPersonID],[InvoiceDate],[CustomerPurchaseOrderNumber],[IsCreditNote],[CreditNoteReason],[Comments],[DeliveryInstructions],[InternalComments],[TotalDryItems],[TotalChillerItems],[DeliveryRun],[RunPosition],[ReturnedDeliveryData],[ConfirmedDeliveryTime],[ConfirmedReceivedBy],[LastEditedBy],[LastEditedWhen] FROM [Sales].[Invoices]"

pgDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://Localhost") \
    .option("query", SQL) \
    .option("user", "dp_admin") \
    .option("Database", "WideWorldImporters") \
    .option("password", "password") \
    .option("fetchsize", 1000) \
    .load(SQL)

pgDF.write \
.format("jdbc") \
.option("url", "jdbc:sqlserver://Localhost") \
.option("dbtable", "wwi.Sales_InvoiceLines") \
.option("user", "dp_admin") \
.option("Database", "DW_Staging") \
.option("password", "password") \
.option("mode", "overwrite")

Looking at SQL Server Profiler:

exec sp_executesql N'SELECT * FROM (Select [InvoiceID],[CustomerID],[BillToCustomerID],[OrderID],[DeliveryMethodID],[ContactPersonID],[AccountsPersonID],[SalespersonPersonID],[PackedByPersonID],[InvoiceDate],[CustomerPurchaseOrderNumber],[IsCreditNote],[CreditNoteReason],[Comments],[DeliveryInstructions],[InternalComments],[TotalDryItems],[TotalChillerItems],[DeliveryRun],[RunPosition],[ReturnedDeliveryData],[ConfirmedDeliveryTime],[ConfirmedReceivedBy],[LastEditedBy],[LastEditedWhen] FROM [Sales].[Invoices]) __SPARK_GEN_JDBC_SUBQUERY_NAME_0 WHERE 1=0'

The WHERE 1=0 clause gets added and no data is returned. Why does that happen, and how can I remove it?
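
For reference, a minimal sketch of the same read with an explicit action appended (SQL, spark and the connection details are the ones defined above):

pgDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://Localhost") \
    .option("query", SQL) \
    .option("user", "dp_admin") \
    .option("Database", "WideWorldImporters") \
    .option("password", "password") \
    .load()      # no argument here: load() only defines the DataFrame and resolves its schema

pgDF.show(5)     # an action (show/count/collect, or a write ending in .save()) sends the full query

The schema probe with WHERE 1=0 should still show up in Profiler when load() runs; the unfiltered query should only appear once an action is triggered.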

  • Hi, can you try it without the fetchsize option, please? Commented Jan 14, 2019 at 21:23
  • Removing the fetchsize option doesn't change the result; the same WHERE clause appears. Apparently that clause is how Spark reads the metadata from the table and maps the data types onto the DataFrame, but I still don't understand why the DataFrame isn't populated. Commented Jan 15, 2019 at 5:59
  • Have you tried the SELECT from SSMS? I don't know Spark, but by removing pieces of the code step by step you can see what is going wrong. Try executing SELECT 1, 'test' as the query too. Is your Invoices table on the dbo schema or another one? Commented Jan 15, 2019 at 6:13
  • I think you have to load the table first and run the query on it afterwards. Commented Jan 15, 2019 at 6:21
  • Try using .option("dbtable", query) in place of .option("query", query); see the sketch just below. Look at this: stackoverflow.com/questions/52487007/… Commented Jan 15, 2019 at 6:25
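
A rough sketch of that dbtable variant, wrapping the same query string as a derived table (the alias inv is arbitrary; connection details as above):

pgDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://Localhost") \
    .option("dbtable", "({}) AS inv".format(SQL)) \
    .option("user", "dp_admin") \
    .option("Database", "WideWorldImporters") \
    .option("password", "password") \
    .load()

Spark wraps whatever dbtable contains in its own SELECT, so the derived table needs an alias for SQL Server to accept it.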
