I'm using Spark Datasets to read in CSV files. I wanted to make a polymorphic function to do this for a number of files. Here's the function:

def loadFile[M](file: String): Dataset[M] = {
  import spark.implicits._
  val schema = Encoders.product[M].schema
  spark.read
    .option("header", "false")
    .schema(schema)
    .csv(file)
    .as[M]
}

The errors that I get are:

[error] <myfile>.scala:45: type arguments [M] do not conform to method product's type parameter bounds [T <: Product]
[error]     val schema = Encoders.product[M].schema
[error]                                  ^
[error] <myfile>.scala:50: Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
[error]       .as[M]
[error]          ^
[error] two errors found

I don't know what to do about the first error. I tried adding the same upper bound as in the product definition (M <: Product), but then I get the error "No TypeTag available for M".
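
For reference, a sketch of that attempt with the TypeTag context bound the compiler is asking for (Encoders.product needs both the Product bound and a TypeTag; this assumes a SparkSession named spark in scope, as in the original snippet):

import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.{Dataset, Encoders}

def loadFile[M <: Product : TypeTag](file: String): Dataset[M] = {
  import spark.implicits._ // derives an Encoder for Product types that have a TypeTag
  val schema = Encoders.product[M].schema
  spark.read
    .option("header", "false")
    .schema(schema)
    .csv(file)
    .as[M]
}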

If I pass in the schema already produced from the encoder, I then get the error:

[error] Unable to find encoder for type stored in a Dataset 

1 Answer

You need to require that anyone calling loadFile[M] provide evidence that such an encoder exists for M. You can do this with a context bound on M, which requires an implicit Encoder[M]:

import org.apache.spark.sql.{Dataset, Encoder}

def loadFile[M : Encoder](file: String): Dataset[M] = {
  import spark.implicits._
  // The context bound supplies the Encoder[M]; summon it for its schema.
  val schema = implicitly[Encoder[M]].schema
  spark.read
    .option("header", "false")
    .schema(schema)
    .csv(file)
    .as[M]
}
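
The context bound M : Encoder desugars to an implicit parameter (implicit ev: Encoder[M]), and importing spark.implicits._ at the call site derives such an encoder for any case class. A minimal end-to-end sketch of the call site; the Person case class and people.csv path are hypothetical:

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

object LoadFileExample {
  // Hypothetical case class matching the CSV columns; adjust to your data.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("loadFile example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // derives Encoder[Person] for the call below

    def loadFile[M : Encoder](file: String): Dataset[M] = {
      val schema = implicitly[Encoder[M]].schema
      spark.read
        .option("header", "false")
        .schema(schema)
        .csv(file)
        .as[M]
    }

    val people: Dataset[Person] = loadFile[Person]("people.csv")
    people.show()
    spark.stop()
  }
}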

3 Comments

Thanks! That definitely compiled, but I had some access problems and an out-of-memory problem running my program, even when I don't call the function. I assume I can make my case class extend Encoder and it should work, if I didn't have these other runtime problems?
@kim This is a compile-time requirement; it shouldn't affect the runtime at all. Perhaps something else is causing your code to OOM.
I decided to get around the whole Encoder problem by not using Spark, but I did find this issue, which talks about encoders for custom objects. I'll come back to figuring it out when I have some time. I'll mark this as my answer though since it got me on the right track.
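
The issue mentioned in that last comment concerns encoders for types that are not case classes. For reference, one standard fallback in that situation is a Kryo-based encoder, sketched here with a hypothetical Legacy class:

import org.apache.spark.sql.{Encoder, Encoders}

object CustomEncoders {
  // Hypothetical class that is not a case class, so no Encoder can be derived for it.
  class Legacy(val id: Int, val payload: String) extends Serializable

  // Encoders.kryo serializes the whole object into a single binary column,
  // trading per-field columns and Catalyst optimizations for generality.
  implicit val legacyEncoder: Encoder[Legacy] = Encoders.kryo[Legacy]
}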
