
I receive the error java.lang.IllegalArgumentException: Can't get JDBC type for null when trying to run the following example:

    ...
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("Demo")
      .getOrCreate()

    import spark.implicits._

    // load first table
    val df_one = spark.read
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myTableOne)
      .option("user", myUser)
      .option("password", myPassw)
      .load()

    df_one.createGlobalTempView("table_one")

    // load second table
    val df_two = spark.read
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myTableTwo)
      .option("user", myUser)
      .option("password", myPassw)
      .load()

    df_two.createGlobalTempView("table_two")

    // perform join of the two tables
    val df_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )

    // The error occurs here:
    df_result.write
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myResultTable)
      .option("user", myUser)
      .option("password", myPassw)
      .mode(SaveMode.Append)
      .save()
    ...

The full stack trace:

Exception in thread "main" java.lang.IllegalArgumentException: Can't get JDBC type for null
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
                at scala.Option.getOrElse(Option.scala:121)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:147)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:663)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:662)
                at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
                at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
                at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
                at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
                at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
                at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:662)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77)
                at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
                at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
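
The problem does not seem to be specific to the join: as far as I can tell, any DataFrame containing a NullType column (which is what a bare null literal produces) fails the same way when written over JDBC. A minimal sketch that reproduces it (connection options are placeholders, and it relies on the import spark.implicits._ from above):

    import org.apache.spark.sql.functions.lit

    // lit(null) yields a column of NullType, for which Spark's JdbcUtils
    // has no JDBC type mapping, so the write below fails with
    // "Can't get JDBC type for null".
    val df_repro = Seq(1, 2, 3).toDF("id")
      .withColumn("field_three", lit(null))

    df_repro.write
      .format("jdbc")
      .option("url", myDbUrl)            // placeholder
      .option("dbtable", "repro_table")  // hypothetical table name
      .option("user", myUser)
      .option("password", myPassw)
      .mode(SaveMode.Append)
      .save()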

A workaround, which dramatically slows down the workflow, is to convert the result to a typed Dataset so every column gets a concrete type:

    ...
    import org.apache.spark.sql.types.IntegerType

    // create a case class for the Dataset
    case class ResultCaseClass(field_one: Option[Int], field_two: Option[Int], field_three: Option[Int])

    // perform join of the two tables
    val ds_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )
      .withColumn("field_one", $"field_one".cast(IntegerType))
      .withColumn("field_two", $"field_two".cast(IntegerType))
      .withColumn("field_three", $"field_three".cast(IntegerType))
      .as[ResultCaseClass]

    // Success:
    ds_result.write......
    ...
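
The casts appear to be what makes the difference: before them, field_three has NullType, which the JDBC writer cannot map; after them it is a nullable integer. This can be checked with printSchema (the output shown in comments is illustrative):

    df_result.printSchema()
    // |-- field_three: null (nullable = true)     <- NullType, write fails

    ds_result.printSchema()
    // |-- field_three: integer (nullable = true)  <- concrete type, write succeeds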

1 Answer


I encountered the same problem and tracked the error message down in the Spark source code. If you insert a null value into a database column without specifying its data type, you get "Can't get JDBC type for null". The way to fix this is to cast the null to the data type that matches the database's field type.

Example:

    lit(null).cast(StringType)   // or equivalently: lit(null).cast("string")
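
Applied to the question's query, the fix would look like the sketch below; IntegerType is assumed only because the workaround casts field_three to Int, so substitute whatever type the target table's column actually has:

    import org.apache.spark.sql.functions.lit
    import org.apache.spark.sql.types.IntegerType

    // Replace the NullType column with a typed null so the JDBC
    // writer can map it to a SQL type.
    val df_fixed = df_result.withColumn("field_three", lit(null).cast(IntegerType))

    df_fixed.write
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myResultTable)
      .option("user", myUser)
      .option("password", myPassw)
      .mode(SaveMode.Append)
      .save()

Equivalently, the cast can be done directly in the SQL string, e.g. cast(null as int) as field_three.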

1 Comment

    import org.apache.spark.sql.types.StringType
