
I'm trying to select data from an MSSQL database via SQLContext.sql in a Spark application. The connection works, but I can't select data from a table because the query always fails on the table name.

Here is my code:

val prop = new Properties()
val url2 = "jdbc:jtds:sqlserver://servername;instance=MSSQLSERVER;user=sa;password=Pass;"
prop.setProperty("user", "username")
prop.setProperty("driver", "net.sourceforge.jtds.jdbc.Driver")
prop.setProperty("password", "mypassword")
val test = sqlContext.read.jdbc(url2, "[dbName].[dbo].[Table name]", prop)

sqlContext.sql("""
SELECT *
FROM 'dbName.dbo.Table name'
""")

I tried the table name without quotes ('), and as [dbName].[dbo].[Table name], but I still get the same error:

Exception in thread "main" java.lang.RuntimeException: [3.14] failure: ``union'' expected but `.' found

Dependencies:

// https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" //%"provided"

// https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector_2.10
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.6.0"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1" //%"provided"

1 Answer

I think the problem in your code is that the query you pass to sqlContext.sql has no access to the original tables in the source database. It can only see tables registered within the SQL context, for example with df.write.saveAsTable() or with df.registerTempTable() (df.createTempView in Spark 2+).

So, in your specific case, I can suggest a couple of options:

1) If you want the query to be executed on the source database, with the exact SQL syntax of that database, pass the query itself as the "dbtable" argument:

val query = "SELECT * FROM dbName.dbo.TableName"
val df = sqlContext.read.jdbc(url2, s"($query) AS subquery", prop)

df.show

Note that the query needs to be in parentheses, because it will be passed to a "FROM" clause, as specified in the docs:

dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.
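To make the substitution concrete, here is a minimal sketch of how the "dbtable" string is assembled (the query text and alias name are illustrative). SQL Server treats the parenthesized query as a derived table, which is why the alias after the closing parenthesis is mandatory:

```scala
// Build the "dbtable" value for a JDBC pushdown query.
// SQL Server treats the parenthesized query as a derived table,
// so the alias after the closing parenthesis is required.
val query = "SELECT * FROM dbName.dbo.TableName"
val dbtable = s"($query) AS subquery"

// Spark substitutes this value directly into the FROM clause of the
// query it sends over JDBC, producing:
//   SELECT * FROM (SELECT * FROM dbName.dbo.TableName) AS subquery
println(dbtable)
```

Omitting the "AS subquery" alias is what typically triggers SQL Server's "Incorrect syntax" errors, because the derived table then has no name.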

2) If you don't need to run the query on the source database, you can pass just the table name and then register a temp table in the sqlContext:

val table = sqlContext.read.jdbc(url2, "dbName.dbo.TableName", prop)
table.registerTempTable("temp_table")

val df = sqlContext.sql("SELECT * FROM temp_table")
// or sqlContext.table("temp_table")
df.show()

4 Comments

I tried option 1... when I use just the table name it works, but if I try to execute a query on the source database it fails: val pushdown_query = "SELECT * FROM [test].[dbo].[Table Name]"; val df = spark.read.jdbc(url=jdbcUrl, s"($pushdown_query)", properties=connectionProperties). I used the documentation here: docs.databricks.com/spark/latest/data-sources/… The exception is java.sql.SQLException: Incorrect syntax near the keyword 'WHERE'
@TomášTonhajzer Can you show the query which contains the "where" keyword?
That is the point: there is no query with a WHERE keyword. For testing purposes I used just a query selecting the same table as via the table name: `val pushdown_query = "Select * from (SELECT * FROM [test].[dbo].[Table name]"'
@TomášTonhajzer It may be missing an alias: val df = sqlContext.read.jdbc(url2, s"($query) AS subquery", prop)
