How to handle dates in Spark using Scala?

Question

I have a flat file that looks like this:

id,name,desg,tdate
1,Alex,Business Manager,2016-01-01

I am using the SparkContext to read this file as follows:

val myFile = sc.textFile("file.txt")

I want to generate a Spark DataFrame from this file and I am using the following code to do so.

case class Record(id: Int, name: String, desg: String, tdate: String)

val myFile1 = myFile.map(x => x.split(",")).map {
  case Array(id, name, desg, tdate) => Record(id.toInt, name, desg, tdate)
}

myFile1.toDF()

This gives me a DataFrame with id as Int and the rest of the columns as String.

I want the last column, tdate, to be cast to a date type.

How can I do that?

mgaido · Accepted Answer · 2016-04-22 07:44:59Z


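You just need to convert the String to a java.sql.Date object, which Spark SQL maps to its DateType. Date.valueOf parses strings in yyyy-[m]m-[d]d format, which matches values like 2016-01-01. Your code then simply becomes: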

import java.sql.Date

// tdate is now a java.sql.Date, so toDF() produces a date column
case class Record(id: Int, name: String, desg: String, tdate: Date)

val myFile1 = myFile.map(x => x.split(",")).map {
  case Array(id, name, desg, tdate) => Record(id.toInt, name, desg, Date.valueOf(tdate))
}

myFile1.toDF()
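The code above assumes every line is a well-formed data row. With the header line shown in the question, `"id".toInt` would throw a NumberFormatException inside the Spark task, and a badly formatted date would make Date.valueOf throw an IllegalArgumentException. Below is a minimal sketch that skips such lines instead, assuming it is acceptable to drop them silently; `RecordParsing` and `parseLine` are hypothetical names used only for illustration.

```scala
import java.sql.Date
import scala.util.Try

// Same shape as the Record in the answer, with tdate as java.sql.Date.
case class Record(id: Int, name: String, desg: String, tdate: Date)

object RecordParsing {
  // Hypothetical helper: returns None for the header row or any malformed line
  // instead of letting toInt / Date.valueOf throw inside a Spark task.
  def parseLine(line: String): Option[Record] =
    line.split(",", -1) match {
      case Array(id, name, desg, tdate) =>
        // Date.valueOf accepts only yyyy-[m]m-[d]d; anything else throws,
        // and Try turns that (or a non-numeric id) into None.
        Try(Record(id.trim.toInt, name, desg, Date.valueOf(tdate.trim))).toOption
      case _ => None
    }
}
```

In Spark this could be plugged in as `myFile.flatMap(line => RecordParsing.parseLine(line).toList).toDF()`, so the header and any bad rows simply disappear from the resulting DataFrame.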


1 Comment

Rahul Over a year ago

Thanks Mark for another prompt reply! It worked for me and this time I got the chance to accept your answer as well :)
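On Spark 2.2 or later, another option is to skip the case class and let Spark parse the column itself. The sketch below assumes a SparkSession is available and uses two in-memory lines as a stand-in for file.txt; `DateColumnSketch` and `build` are illustrative names, not part of the original answer. For ISO-formatted strings such as 2016-01-01, `col("tdate").cast("date")` should also work without giving a pattern.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date}

object DateColumnSketch {
  // Sketch assuming Spark 2.2+: DataFrameReader.csv(Dataset[String]) and
  // to_date(Column, String) both first appeared in 2.2.0.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Stand-in for file.txt; with a real file this would be
    // spark.read.option("header", "true").csv("file.txt")
    val lines = Seq(
      "id,name,desg,tdate",
      "1,Alex,Business Manager,2016-01-01"
    ).toDS()

    // header=true without inferSchema reads every column as a string
    val raw = spark.read.option("header", "true").csv(lines)

    raw
      .withColumn("id", col("id").cast("int"))
      // Parse the string into DateType; under default (non-ANSI) settings,
      // values that do not match the pattern become null
      .withColumn("tdate", to_date(col("tdate"), "yyyy-MM-dd"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("date-column-sketch")
      .master("local[*]")
      .getOrCreate()
    val df = build(spark)
    df.printSchema() // tdate should be listed as date
    df.show(false)
    spark.stop()
  }
}
```

Keeping the conversion in Spark's column functions means the date parsing stays inside the DataFrame API rather than in hand-written RDD code.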
