I am trying to create a DataFrame with one row whose values are all null.

val df = Seq(null,null).toDF("a","b")

I ran into issues with this, and had no success using asInstanceOf on null either.

val df = Seq(null.asInstanceOf[Integer],null.asInstanceOf[Integer]).toDF("a","b")

This works, but I would rather not specify the field type; in most cases it should be a string.

4 Answers

3

I'm assuming you want a two-column DF; in that case each entry should be a tuple or a case class. If so, you can also state the type of the Seq explicitly so that you don't have to use asInstanceOf:

val df = Seq[(Integer, Integer)]((null, null)).toDF("a","b")

3 Comments

Can we get NullType, as we do with withColumn("a", lit(null))? (import org.apache.spark.sql.types.NullType) I mean, right now the schema is integer and integer; I am looking for null and null. For example, df.withColumn("test", lit(null)).printSchema gives:
root
 |-- a: integer (nullable = true)
 |-- b: integer (nullable = true)
 |-- test: null (nullable = true)
How is NullType useful? Anyway, I think the best way to achieve that is using an explicit schema, as suggested by @MansoorBabaShaik
NullType makes more sense than String or Integer types. I think that is the reason lit(null) produces NullType.
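
For the NullType schema discussed in these comments, one possible sketch is to start from a one-row Dataset and select only lit(null) columns. The object name NullTypeRow and the helper build are made up for illustration, and the local SparkSession settings are assumptions; the label printSchema shows for NullType may vary by Spark version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.NullType

object NullTypeRow {
  // Hypothetical helper: one row, two columns, both typed as NullType.
  def build(spark: SparkSession): DataFrame =
    spark.range(1)                                 // a single-row Dataset to select from
      .select(lit(null).as("a"), lit(null).as("b")) // lit(null) carries NullType

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("NullTypeRow")
      .master("local[1]")
      .getOrCreate()

    val df = build(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

Note that some data sources may refuse to write NullType columns, so this is mostly useful as an intermediate result.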
2

My preferred way is to use Option.empty[A]:

val df = Seq((Option.empty[Int],Option.empty[Int])).toDF("a","b")
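
Since the question says the fields should mostly be strings, the same idea can be sketched with Option.empty[String]. The object name NullStringRow and the helper build are hypothetical, and the local SparkSession settings are assumptions for the sake of a self-contained example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StringType

object NullStringRow {
  // Hypothetical helper: one row whose two string columns are both null.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings toDF and the tuple/Option encoders into scope
    Seq((Option.empty[String], Option.empty[String])).toDF("a", "b")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("NullStringRow")
      .master("local[1]")
      .getOrCreate()

    val df = build(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

The element type is still stated once, but only inside Option.empty, so no asInstanceOf cast is needed.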

0

Looks like a misprint in "asInstanceOf"; this worked fine for me:

       List(null.asInstanceOf[Integer],null.asInstanceOf[Integer]).toDF("a").show(false)

0
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object SparkApp extends App {

  val sparkSession: SparkSession = SparkSession.builder()
    .appName("Spark_Test_App")
    .master("local[2]")
    .getOrCreate()

  val schema: StructType = StructType(
    Array(
      StructField("a", IntegerType, nullable = true),
      StructField("b", IntegerType, nullable = true)
    )
  )

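  // Build the single all-null row explicitly. Seq((null, null)).toDF("a", "b")
  // may fail because Spark has to derive an encoder for the Null type.
  val nullRDD: RDD[Row] = sparkSession.sparkContext.parallelize(Seq(Row(null, null)))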

  val df: DataFrame = sparkSession.createDataFrame(nullRDD, schema)

  df.printSchema()

  df.show()

  sparkSession.stop()
}
