import spark.implicits._
import org.apache.spark.sql.functions._

var names = Seq("ABC", "XYZ").toDF("names")
var data = names.flatMap(name => name.getString(0).toCharArray)
                .map(rec => (rec, 1))
                .rdd
                .reduce((x, y) => ('S', x._2 + y._2))

ERROR: Error:(20, 27) Unable to find encoder for type Char. An implicit Encoder[Char] is needed to store Char instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
    var data = names.flatMap(name=>name.getString(0).toCharArray).map(rec=>(rec,1)).rdd.reduce((x,y)=>('S',x._2 + y._2))

  • What is your expected output? What are you trying to do with the code? Commented Jan 9, 2021 at 19:26
  • I am trying to get the SUM of all characters. I can do it with the following code: names.flatMap(name => name.getString(0).split("")).map(rec => (rec, 1)).rdd.reduce((x, y) => ("SUM", x._2 + y._2)), but when I use toCharArray instead of split(""), it fails, so I'm trying to understand this need for an Encoder. Commented Jan 10, 2021 at 7:41

1 Answer


You can convert the DataFrame to an RDD first, before doing the flatMap and map operations:

var data = names.rdd                                              // Row-based RDD: no Encoder required from here on
                .flatMap(name => name.getString(0).toCharArray)   // RDD[Char]
                .map(rec => (rec, 1))                             // RDD[(Char, Int)]
                .reduce((x, y) => ('S', x._2 + y._2))             // ('S', 6)

which returns ('S', 6), because you're just counting the number of characters in the first column of the DataFrame ("ABC" and "XYZ" together contain 6 characters). Not sure if this is your desired output.
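
If you'd rather stay in the Dataset API instead of dropping to an RDD, a minimal sketch (using the same names DataFrame; data2 is just a name chosen here) is to convert each Char to a String, which spark.implicits._ can encode:

import spark.implicits._

val data2 = names
  .flatMap(name => name.getString(0).toCharArray.map(_.toString)) // Dataset[String]: String has an implicit Encoder
  .map(rec => (rec, 1))                                           // Dataset[(String, Int)]
  .rdd
  .reduce((x, y) => ("SUM", x._2 + y._2))                         // ("SUM", 6)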


2 Comments

Thanks for the quick response, but why does it fail for a DataFrame, and why is the conversion to an RDD needed? I can get what I need with names.flatMap(name => name.getString(0).split("")).map(rec => (rec, 1)).rdd.reduce((x, y) => ("SUM", x._2 + y._2)), but when I use toCharArray instead of split("") it fails, so I'm trying to understand this need for an Encoder.
In a Dataset the values have to be serialized/encoded. If you use split, it gives you Strings, which are a default Spark data type and can be encoded; but Char is not a default Spark data type, so the result of toCharArray cannot be encoded.
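
To see this directly, a quick sketch you can try in spark-shell (nothing here is specific to the question's data):

import spark.implicits._
import org.apache.spark.sql.Encoder

// Compiles: spark.implicits._ provides an implicit Encoder[String]
val stringEnc: Encoder[String] = implicitly[Encoder[String]]

// Does not compile: there is no implicit Encoder[Char],
// which is exactly the error the question hits
// val charEnc: Encoder[Char] = implicitly[Encoder[Char]]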
