1

The question is a follow-up of How to store custom objects in Dataset?

Spark version: 3.0.1

Non-nested custom type is achievable:

import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}

class AnObj(val a: Int, val b: String)

implicit val myEncoder: Encoder[AnObj] = Encoders.kryo[AnObj] 

val d = spark.createDataset(Seq(new AnObj(1, "a")))

d.printSchema
root
 |-- value: binary (nullable = true)

However, if the custom type is nested inside a product type (i.e. case class), it gives an error:

java.lang.UnsupportedOperationException: No Encoder found for InnerObj

import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}

class InnerObj(val a: Int, val b: String)
case class MyObj(val i: Int, val j: InnerObj)

implicit val myEncoder: Encoder[InnerObj] = Encoders.kryo[InnerObj] 

// error
val d = spark.createDataset(Seq(new MyObj(1, new InnerObj(0, "a"))))
// it gives Runtime error: java.lang.UnsupportedOperationException: No Encoder found for InnerObj

How can we create Dataset with nested custom type?

2 Answers 2

3

Adding the encoders for both MyObj and InnerObj should make it work.

  class InnerObj(val a:Int, val b: String)
  case class MyObj(val i: Int, j: InnerObj)

  implicit val myEncoder: Encoder[InnerObj] = Encoders.kryo[InnerObj]
  implicit val objEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]

The above snippet compile and run fine

Sign up to request clarification or add additional context in comments.

1 Comment

That's embarrassing! I thought with import spark.implicits._ and case class, I don't need to create the encoder for it. But it seems as long as there's nested custom type, I still need to create one for case class. Thanks for the answer
1

Another solution apart from sujesh's:

import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}

class InnerObj(val a: Int, val b: String)
case class MyObj[T](val i: Int, val j: T)

implicit val myEncoder: Encoder[MyObj[InnerObj]] = Encoders.kryo[MyObj[InnerObj]] 

// works
val d = spark.createDataset(Seq(new MyObj(1, new InnerObj(0, "a"))))

This also shows a difference between the case where the inner type can be deduced from the type parameter, and the case where it cannot be deduced.

The former case should be done:

implicit val myEncoder: Encoder[MyObj[InnerObj]] = Encoders.kryo[MyObj[InnerObj]]

The later case should be done:

implicit val myEncoder1: Encoder[InnerObj] = Encoders.kryo[InnerObj]
implicit val myEncoder2: Encoder[MyObj] = Encoders.kryo[MyObj]

1 Comment

more precise. would be good pattern when you have type classes and more abstractions.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.