
I want to convert a date column into an integer using Spark SQL. The following PySpark code works, but I want to achieve the same result with Spark SQL instead of PySpark.

Reproduce the example:

from pyspark.sql import SparkSession
from pyspark.sql.types import DateType
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# DUMMY DATA
simpleData = [("James",34,"2006-01-01","true","M",3000.60),
              ("Michael",33,"1980-01-10","true","F",3300.80),
              ("Robert",37,"1992-07-01","false","M",5000.50)
             ]

columns = ["firstname","age","jobStartDate","isGraduated","gender","salary"]
df = spark.createDataFrame(data=simpleData, schema=columns)

# Cast the string column to DATE, then convert it to seconds since the Unix epoch
df = df.withColumn("jobStartDate", df["jobStartDate"].cast(DateType()))
df = df.withColumn("jobStartDateAsInteger1", F.unix_timestamp(df["jobStartDate"]))
display(df)


What I want is to do the same transformation, but using Spark SQL. I am using the following code:

df.createOrReplaceTempView("date_to_integer")

%sql
select
seg.*,
CAST(jobStartDate AS INTEGER) as JobStartDateAsInteger2 -- returns NULL
from date_to_integer seg


How can I solve this?

1 Answer
Spark SQL does not support casting a DATE directly to INTEGER, which is why your query returns NULL. Instead, first CAST jobStartDate to DATE, then apply UNIX_TIMESTAMP to convert it to a Unix-epoch integer (seconds since 1970-01-01):

SELECT
    seg.*,
    UNIX_TIMESTAMP(CAST (jobStartDate AS DATE)) AS JobStartDateAsInteger2
FROM date_to_integer seg
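As a sanity check on what that integer means, here is a minimal, Spark-free Python sketch computing the epoch seconds for the first row's job start date. Note that this assumes midnight UTC; Spark's UNIX_TIMESTAMP interprets a DATE in the session time zone, so the value Spark returns can differ by the time-zone offset.

```python
from datetime import datetime, timezone

# Midnight UTC on James's job start date from the example data
job_start = datetime(2006, 1, 1, tzinfo=timezone.utc)

# Seconds elapsed since the Unix epoch (1970-01-01 00:00:00 UTC)
epoch_seconds = int(job_start.timestamp())
print(epoch_seconds)  # → 1136073600
```

If your Spark session is configured with `spark.sql.session.timeZone=UTC`, UNIX_TIMESTAMP on that DATE should produce this same value.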

