I am creating a Spark application using the Scala binding, but some of my model classes are written in Java. When I create a Dataset from a Scala case class, it works fine and all the columns are visible when I call show(). But when I create a Dataset from a Java class, all the columns are packed into a single column named value.
Scala Case Class Example:
case class Person(name: String, age: Int)
Execution:
sqlContext.createDataset(Seq(Person("abcd", 10))).show()
Output:
name | age
abcd | 10
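(For context, my understanding is that the case class columns are derived because an implicit encoder is in scope, which I assume comes from the usual implicits import below; I have not dug into the details.)

import sqlContext.implicits._   // assumed source of the implicit Encoder for the case class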
Java Class Example:
class Person {
    public String name;
    public int age;

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
Execution:
sqlContext.createDataset(Seq(new Person("abcd", 10))).show()
Output:
value
[01 00 63 6F 6D 2...]
Are we not supposed to use Java classes as models in a Spark Scala application? How do we resolve this issue?
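My current guess is that Spark falls back to a generic binary encoder for the Java class instead of deriving a schema, and that I need to pass a bean encoder explicitly. Something along the lines of the sketch below is what I have in mind, but I have not verified it; it also assumes Person is reworked as a conventional JavaBean (no-arg constructor plus getName/setName and getAge/setAge), since I am not sure public fields alone are enough:

import org.apache.spark.sql.Encoders

// Guess: supply an explicit bean encoder instead of the default binary one.
// Assumes Person follows JavaBean conventions (no-arg constructor, getters/setters).
val people = sqlContext.createDataset(Seq(new Person("abcd", 10)))(Encoders.bean(classOf[Person]))
people.show()

Is this the intended way to use Java model classes from the Scala API, or is there a better approach?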