I'm writing a Java application. I have a Spark Dataset&lt;MyObject&gt; that results in a single binary-type column:
Dataset<MyObject> dataset = sparkSession.createDataset(someRDD, Encoders.javaSerialization(MyObject.class));
dataset.printSchema();
//root
//|-- value: binary (nullable = true)
MyObject has several (nested) fields, and I want to "explode" them into multiple columns in my Dataset. Some of the new columns also need to be computed from multiple attributes of MyObject. As a solution, I could use .withColumn() and apply a UDF. Unfortunately, I don't know how to accept a binary type in the UDF and then convert it to MyObject. Any suggestions on how to do that?
Since Encoders.javaSerialization() stores each object as its Java-serialized bytes, use byte[] as the input type of the UDF and deserialize it back into MyObject with standard Java serialization (ObjectInputStream). And see this post for how to return a complex type from the UDF.
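Here is a minimal sketch of the deserialization step the UDF body would perform. MyObject here is a hypothetical stand-in (the real class comes from your application); the fromBytes method is what you would call inside a Spark UDF1&lt;byte[], ...&gt; registered via sparkSession.udf().register(...):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeserializeDemo {

    // Hypothetical stand-in for the question's MyObject.
    static class MyObject implements Serializable {
        String name;
        int count;
        MyObject(String name, int count) { this.name = name; this.count = count; }
    }

    // This is what the UDF body would do with the binary column value:
    // read the Java-serialized bytes back into a MyObject instance.
    static MyObject fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (MyObject) in.readObject();
        }
    }

    // Helper mirroring what Encoders.javaSerialization does when writing the column.
    static byte[] toBytes(MyObject obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Round-trip: serialize, then recover the object as the UDF would.
        MyObject restored = fromBytes(toBytes(new MyObject("a", 3)));
        if (!restored.name.equals("a") || restored.count != 3) {
            throw new AssertionError("round-trip failed");
        }
        System.out.println("ok");
    }
}
```

Inside Spark you would wrap fromBytes in a UDF, e.g. a UDF1&lt;byte[], Row&gt; whose return DataType is a StructType describing the exploded columns, then apply it with .withColumn() and select the struct's fields; the exact schema depends on MyObject's attributes.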