I'm trying to read JSON into a Dataset (Spark 2.1.1). Unfortunately it fails with:

Caused by: java.lang.NullPointerException: Null value appeared in non-nullable field:
- field (class: "scala.Long", name: "age")

Any ideas what I'm doing wrong?

import org.apache.spark.sql.SparkSession

case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: Long)

val sampleJson = """{"id":"kotek", "pets":[{"name":"miauczek", "age":18}, {"name":"miauczek2", "age":9}]}"""

val session = SparkSession.builder().master("local").getOrCreate()
import session.implicits._

val rdd = session.sparkContext.parallelize(Seq(sampleJson))
val ds = session.read.json(rdd).as[Owner].collect()
  • I believe this is a bug in Spark. If I understand correctly what is happening here, Spark is NOT mapping the inner type ("pets") by name; it appears to map those attributes in sorted order. So pets.age gets mapped to Pet.name, and mapping pets.name -> Pet.age then fails with the exception. Can anyone confirm that my understanding is correct and that this is a Spark bug? Commented Aug 10, 2017 at 9:10
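
One way to sanity-check this theory is to print the schema Spark infers for the sample JSON; as far as I know, JSON schema inference sorts struct fields alphabetically, which is consistent with the positional-mapping hypothesis. A small sketch, reusing session and rdd from the question:

// Inspect the inferred schema: the inner struct's fields should come back
// alphabetically as (age, name), not in the order they appear in the JSON.
session.read.json(rdd).printSchema()
// Expected output (approximately):
// root
//  |-- id: string (nullable = true)
//  |-- pets: array (nullable = true)
//  |    |-- element: struct (containsNull = true)
//  |    |    |-- age: long (nullable = true)
//  |    |    |-- name: string (nullable = true)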

1 Answer


Usually, if a field can be missing, use either an Option:

case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: Option[Long])

or a nullable type:

case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: java.lang.Long)
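
As a hedged sketch of how the Option variant behaves downstream, on a Spark version where the nested by-name resolution works (per the test below, 2.2 does). It reuses session, rdd, and the session.implicits._ import from the question; the -1L default is purely illustrative:

// With age: Option[Long], a missing or null "age" decodes to None
// instead of blowing up with a NullPointerException.
val owners = session.read.json(rdd).as[Owner].collect()
owners.flatMap(_.pets).foreach { pet =>
  println(s"${pet.name}: ${pet.age.getOrElse(-1L)}") // illustrative default
}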

But this one indeed looks like a bug. I tested it in Spark 2.2 and it has been resolved there. A quick workaround on 2.1.x is to keep the case class fields sorted by name:

case class Owner(id: String, pets: Seq[Pet])
case class Pet(age: java.lang.Long, name: String)
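
A minimal, self-contained version of that workaround, assuming Spark 2.1.x on the classpath (the object name NestedJsonWorkaround is arbitrary):

import org.apache.spark.sql.SparkSession

// Pet's fields are declared in alphabetical order (age, name) so they
// line up with the schema Spark infers for the JSON.
case class Owner(id: String, pets: Seq[Pet])
case class Pet(age: java.lang.Long, name: String)

object NestedJsonWorkaround {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local").getOrCreate()
    import session.implicits._

    val sampleJson =
      """{"id":"kotek", "pets":[{"name":"miauczek", "age":18}, {"name":"miauczek2", "age":9}]}"""
    val rdd = session.sparkContext.parallelize(Seq(sampleJson))

    // With the sorted field order this collects without the NPE.
    val owners = session.read.json(rdd).as[Owner].collect()
    owners.foreach { o =>
      println(s"${o.id}: ${o.pets.map(p => s"${p.name} (${p.age})").mkString(", ")}")
    }

    session.stop()
  }
}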