I am trying to convert the data types of some columns based on a case class.
val simpleDf = Seq(("James",34,"2006-01-01","true","M",3000.60),
("Michael",33,"1980-01-10","true","F",3300.80),
("Robert",37,"1995-01-05","false","M",5000.50)
).toDF("firstName","age","jobStartDate","isGraduated","gender","salary")
// Output
simpleDf.printSchema()
root
|-- firstName: string (nullable = true)
|-- age: integer (nullable = false)
|-- jobStartDate: string (nullable = true)
|-- isGraduated: string (nullable = true)
|-- gender: string (nullable = true)
|-- salary: double (nullable = false)
Here I want to change the datatype of jobStartDate to Timestamp and isGraduated to Boolean. Is that conversion possible using the case class?
I am aware this can be done by casting each column but in my case, I need to map the incoming DF based on a case class defined.
case class empModel(
  firstName: String,
  age: Integer,
  jobStartDate: java.sql.Timestamp,
  isGraduated: Boolean,
  gender: String,
  salary: Double
)
val newDf = simpleDf.as[empModel].toDF
newDf.show(false)
I am getting errors because of the string-to-timestamp conversion. Is there any workaround?
newDf.withColumn("jobStartDate", to_timestamp($"jobStartDate", "yyyy-MM-dd")) works fine for me (on Spark 2.4).
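If you want the conversion to stay driven by the case class rather than by hand-written casts per column, one general approach (a sketch, assuming Spark 2.x+ and that every column name in the DataFrame matches a field of empModel) is to derive the target schema from the case class encoder and cast each column to its declared type before calling .as:

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions.col

// Derive the StructType that the empModel encoder expects
val targetSchema = Encoders.product[empModel].schema

// Cast every incoming column to the type declared in the case class.
// Spark's cast handles "2006-01-01" -> timestamp and "true" -> boolean.
val castedDf = simpleDf.select(
  targetSchema.fields.map(f => col(f.name).cast(f.dataType)): _*
)

// Now the schemas line up, so the Dataset conversion no longer fails
val ds = castedDf.as[empModel]
ds.printSchema()
```

This avoids listing each column, so adding a field to empModel automatically extends the cast. Note that cast silently produces null for values it cannot parse, so you may want a validation pass if the input is untrusted.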