I have a date column that PySpark infers as a string datatype:

Mon Oct 17 15:57:48 EST 2022

How do I cast this string column to a datetime?

1 Answer

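You can use the matching datetime pattern, 'E MMM dd HH:mm:ss z yyyy'. The z field reads the EST zone from the string, and the resulting timestamp is displayed in the session time zone. With the session in UTC, EST (UTC-5) is converted to UTC, so you'll see 5 hours added to the source ts.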

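from pyspark.sql import functions as func

# assumes an active SparkSession named `spark`; the legacy parser is used
# because the default Spark 3 parser does not accept 'E' when parsing
spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')

spark.sparkContext.parallelize([('Mon Oct 17 15:57:48 EST 2022', )]).toDF(['dt_str']). \
    withColumn('dt', func.to_timestamp('dt_str', 'E MMM dd HH:mm:ss z yyyy')). \
    show(truncate=False)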

# +----------------------------+-------------------+
# |dt_str                      |dt                 |
# +----------------------------+-------------------+
# |Mon Oct 17 15:57:48 EST 2022|2022-10-17 20:57:48|
# +----------------------------+-------------------+

5 Comments

Thanks for your input! I'm trying to convert to datetime instead of timestamp. Is that possible? When I try to convert, it is reported as not supported.
@Anos - datetime means timestamp. Do you mean just the date part? If so, use the to_date function instead of to_timestamp.
Sorry for the confusion. I thought there was a separate datetime datatype to convert to. So datetime effectively means the timestamp datatype. Thanks!
When I try to convert Wed Oct 19 00:15:13 EST 2022, the resulting timestamp is 2022-10-19 10:45:13. I don't really get what is happening. It is not adding 5 hours.
@Anos - it must be because your default tz is set to something other than UTC, and Spark is adding the offset based on that tz. Reset your session's tz (the spark.sql.session.timeZone setting) to UTC to get the +5 hour result.
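
The arithmetic behind that last comment can be checked without Spark. The sketch below uses only Python's standard library and assumes the commenter's session zone is UTC+05:30; the thread never states the actual zone, but that offset is consistent with the reported 10:45:13. The names EST, SESSION_TZ and render are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch independent of any installed tz database.
EST = timezone(timedelta(hours=-5), "EST")
# Assumption: a session zone of UTC+05:30, inferred from the reported result.
SESSION_TZ = timezone(timedelta(hours=5, minutes=30))


def render(ts: datetime, tz: timezone) -> str:
    """Format an aware datetime as wall-clock time in the given zone."""
    return ts.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")


# The value from the comment: Wed Oct 19 00:15:13 EST 2022
source = datetime(2022, 10, 19, 0, 15, 13, tzinfo=EST)

in_utc = render(source, timezone.utc)   # what a UTC session would display
in_session = render(source, SESSION_TZ) # what a UTC+05:30 session would display

print(in_utc)      # 2022-10-19 05:15:13
print(in_session)  # 2022-10-19 10:45:13
```

So the parse itself is the same in both cases; only the zone used to display the instant differs. With the session zone set to UTC, the same input would show the expected 5-hour shift.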
