3

I'm trying to perform a simple map on a Dataset[Row] (DataFrame) in Spark 2.0.0. Something as simple as this

val df: DataSet[Row] = ...
df.map { r: Row => r }

But the compiler is complaining that I'm not providing the implicit Encoder[Row] argument to the map function:

not enough arguments for method map: (implicit evidence$7: Encoder[Row]).

Everything works fine if I convert to an RDD first ds.rdd.map { r: Row => r } but shouldn't there be an easy way to get an Encoder[Row] like there is for tuple types Encoders.product[(Int, Double)]?

[Note that my Row is dynamically sized in such a way that it can't easily be converted into a strongly-typed Dataset.]

3 Answers 3

3

SSry to be a "bit" late. Hopefully this helps to someone who is hitting the problem right now. Easiest way to define encoder is deriving the structure from existing DataFrame:

val df = Seq((1, "a"), (2, "b"), (3, "c").toDF("id", "name")
val myEncoder = RowEndocer(df.schema)

Such approach could be useful when you need altering existing fields from your original DataFrame.

If you're dealing with completely new structure, explicit definition relying on StructType and StructField (as suggested in @Reactormonk 's little cryptic response).

Example defining the same encoder:

val myEncoder2 = RowEncoder(StructType(
  Seq(StructField("id", IntegerType), 
      StructField("name", StringType)
  )))

Please remember org.apache.spark.sql._, org.apache.spark.sql.types._ and org.apache.spark.sql.catalyst.encoders.RowEncoder libraries have to be imported.

Sign up to request clarification or add additional context in comments.

1 Comment

Instead of RowEncoder(df.schema) you can simply use df.encoder.
2

In your specific case where the mapped function does not change the schema, you can pass in the encoder of the DataFrame itself:

df.map(r => r)(df.encoder)

Comments

1

An Encoder needs to know how to pack the elements inside the Row. So you could write your own Encoder[Row] by using row.structType which determines the elements of your Row at runtime and uses the corresponding decoders.

Or if you know more about the data that goes into Row, you could use https://github.com/adelbertc/frameless/

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.