import spark.implicits._
import org.apache.spark.sql.functions._

var names = Seq("ABC", "XYZ").toDF("names")
var data = names.flatMap(name => name.getString(0).toCharArray)
                .map(rec => (rec, 1))
                .rdd
                .reduce((x, y) => ('S', x._2 + y._2))

ERROR: Error:(20, 27) Unable to find encoder for type Char. An implicit Encoder[Char] is needed to store Char instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
    var data = names.flatMap(name=>name.getString(0).toCharArray).map(rec=>(rec,1)).rdd.reduce((x,y)=>('S',x._2 + y._2))

  • What is your expected output? What are you trying to do with the code? Commented Jan 9, 2021 at 19:26
  • I am trying to get the SUM of all characters. I can do it with the following code: names.flatMap(name => name.getString(0).split("")).map(rec => (rec, 1)).rdd.reduce((x, y) => ("SUM", x._2 + y._2)), but when I use toCharArray instead of split(""), it fails, so I'm trying to understand this need for an Encoder. Commented Jan 10, 2021 at 7:41

1 Answer


You can convert the DataFrame to an RDD first, before doing the flatMap and map operations:

var data = names.rdd                                              // Row-based RDD: no Encoder required from here on
                .flatMap(name => name.getString(0).toCharArray)   // RDD[Char]
                .map(rec => (rec, 1))                             // RDD[(Char, Int)]
                .reduce((x, y) => ('S', x._2 + y._2))             // ('S', 6)

which returns ('S', 6), because you're just counting the number of characters in the first column of the DataFrame ("ABC" and "XYZ" together contain 6 characters). Not sure if this is your desired output.
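
If you'd rather stay in the Dataset API instead of dropping to an RDD, a minimal sketch (using the same names DataFrame; data2 is just a name chosen here) is to convert each Char to a String, which spark.implicits._ can encode:

import spark.implicits._

val data2 = names
  .flatMap(name => name.getString(0).toCharArray.map(_.toString)) // Dataset[String]: String has an implicit Encoder
  .map(rec => (rec, 1))                                           // Dataset[(String, Int)]
  .rdd
  .reduce((x, y) => ("SUM", x._2 + y._2))                         // ("SUM", 6)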


2 Comments

Thanks for the quick response, but why does it fail for a DataFrame, and why is the conversion to an RDD needed? I can get what I need with names.flatMap(name => name.getString(0).split("")).map(rec => (rec, 1)).rdd.reduce((x, y) => ("SUM", x._2 + y._2)), but when I use toCharArray instead of split("") it fails, so I'm trying to understand this need for an Encoder.
In a Dataset the values have to be serialized/encoded. If you use split, it gives you Strings, which are a default Spark data type and can be encoded; but Char is not a default Spark data type, so the result of toCharArray cannot be encoded.
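
To see this directly, a quick sketch you can try in spark-shell (nothing here is specific to the question's data):

import spark.implicits._
import org.apache.spark.sql.Encoder

// Compiles: spark.implicits._ provides an implicit Encoder[String]
val stringEnc: Encoder[String] = implicitly[Encoder[String]]

// Does not compile: there is no implicit Encoder[Char],
// which is exactly the error the question hits
// val charEnc: Encoder[Char] = implicitly[Encoder[Char]]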
