
I am trying to read a table from Databricks using the Databricks JDBC driver and Spark:

Dataset<Row> df = spark.read().format("jdbc")
        .option("url", databricks_url)
        .option("driver", "com.databricks.client.spark.jdbc.SparkJDBC42Driver")
        .option("dbtable", "default.age")
        .load();
df.show();

The full stack trace:

Exception in thread "main" java.lang.ClassCastException: com.databricks.client.spark.jdbc.SparkJDBC42Driver cannot be cast to java.sql.Driver
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:54)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1(JDBCOptions.scala:101)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1$adapted(JDBCOptions.scala:101)
    at scala.Option.foreach(Option.scala:437)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:101)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:39)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:33)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
    at scala.Option.getOrElse(Option.scala:201)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
    at SparkTest.main(SparkTest.java:31)

1 Answer

I solved this problem as follows: install the ODBC driver from https://www.databricks.com/fr/spark/odbc-drivers-download while keeping the JDBC jar in the libs, remove the driver option from the Spark read, and change the URL:

  • replace spark with databricks in the subprotocol
  • add these options:

Driver=Simba; UseNativeQuery=0;

String databricks_url = "jdbc:databricks://adb-xxxxx.azuredatabricks.net:443/default;"
        + "transportMode=http;"
        + "ssl=1;"
        + "httpPath=sql/protocolv1/o/....;"
        + "AuthMech=3;"
        + "UID=token;"
        + "PWD=<my databricks_token>;"
        + "Driver=Simba;"
        + "UseNativeQuery=0;";
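The URL rewrite above can be sketched as a small helper that assembles the options into the jdbc:databricks form described here. This is only an illustration: build_databricks_url is a hypothetical function, and the host, httpPath, and token values are placeholders, not real endpoints.

```python
def build_databricks_url(host, http_path, token):
    """Assemble a Databricks JDBC URL with the options from the answer:
    jdbc:databricks subprotocol plus Driver=Simba and UseNativeQuery=0."""
    options = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": http_path,
        "AuthMech": "3",
        "UID": "token",
        "PWD": token,
        "Driver": "Simba",
        "UseNativeQuery": "0",
    }
    # Join the key=value pairs with semicolons, matching the URL format above.
    opts = ";".join(f"{k}={v}" for k, v in options.items())
    return f"jdbc:databricks://{host}:443/default;{opts};"

url = build_databricks_url("adb-xxxxx.azuredatabricks.net",
                           "sql/protocolv1/o/....",
                           "<my databricks_token>")
print(url)
```

Note that the driver class is no longer passed to Spark at all; it is resolved through the Driver=Simba option inside the URL instead.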

In PySpark, the query must be wrapped in single quotes: '

df = spark.read.format("jdbc").option("url", databricks_url) \
.option("query", 'SELECT * FROM table') \
.load()
df.show(vertical=True)

2 Comments

Interesting, I used double quotes " for the query option...
UseNativeQuery=0; did the trick, thanks - I was getting "Error converting value to Timestamp"
