I'm trying to change the format of a processed_time field in my DataFrame.

Originally it looks like this: 2017-05-12 11:33:50 -0700 and I want to format it to "yyyy-MM-dd HH:mm:ss" (2017-05-12 11:33:50)

However, after formatting with the approach shown below, the value gets a trailing zero after the seconds: 2017-05-12 11:33:50.0. I guess it is related to the timezone. How can I get the format without the trailing zero?

    .withColumn("processed_time",
            to_utc_timestamp(unix_timestamp(col("processed_time")).cast(TimestampType),
                    "UTC"))

1 Answer

After transformation, the column processed_time on your DataFrame is of type TimestampType. Therefore, the column values are of type java.sql.Timestamp.

The trailing zero that you see is the fractional-seconds part: java.sql.Timestamp has nanosecond precision, and its toString method always prints at least one fractional digit. It shows up only because your_df.show() calls toString on the java.sql.Timestamp values; the stored value itself is unchanged.
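The toString behaviour can be checked without Spark. The following is a minimal sketch in plain Scala; the object name TimestampToStringDemo and the sample values are made up for illustration and are not part of the question's code:

```scala
import java.sql.Timestamp

object TimestampToStringDemo {
  // Timestamp.valueOf parses "yyyy-[m]m-[d]d hh:mm:ss[.f...]" in the JVM default time zone.
  val wholeSecond: Timestamp = Timestamp.valueOf("2017-05-12 11:33:50")

  // A value with a real fractional part, to show that fractions are printed as well.
  val withMillis: Timestamp = Timestamp.valueOf("2017-05-12 11:33:50.123")

  def main(args: Array[String]): Unit = {
    println(wholeSecond)          // 2017-05-12 11:33:50.0
    println(withMillis)           // 2017-05-12 11:33:50.123
    println(wholeSecond.getNanos) // 0
  }
}
```

The ".0" is therefore purely a display artifact of toString, not extra data stored in the column.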

If you just want your result formatted (as a String), you can add .cast(StringType) when modifying your processed_time column:

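import org.apache.spark.sql.functions.{col, to_utc_timestamp, unix_timestamp}
import org.apache.spark.sql.types.{StringType, TimestampType}

df.withColumn(
    "processed_time",
    to_utc_timestamp(
        unix_timestamp(col("processed_time")).cast(TimestampType),
        "UTC"
    ).cast(StringType) // cast the final timestamp, not the unix_timestamp result
)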

You can also use date_format, as suggested in the comments:

df.withColumn(
    "processed_time",
    date_format(
        to_utc_timestamp(
            unix_timestamp(col("processed_time")).cast(TimestampType),
            "UTC"
        ),
        "yyyy-MM-dd HH:mm:ss"
    )
)

If you really need a TimestampType, you can ignore the trailing zero during your transformations and use a SimpleDateFormat afterwards, only for display:

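import java.sql.Timestamp
import java.text.SimpleDateFormat

// Take the first row's value; it is still a java.sql.Timestamp at this point
val firstTimestampFromDf: Timestamp = df
    .select("processed_time")
    .head
    .getTimestamp(0)

// Format for display only; SimpleDateFormat is not thread-safe, so don't share it across threads
val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
val firstTimestampFromDfFormatted = simpleDateFormat.format(firstTimestampFromDf)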
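
As an alternative to SimpleDateFormat, the java.time API can do the same display formatting. This is only a sketch of a swapped-in technique, not what the code above uses; the object name TimestampDisplay and the helper formatForDisplay are hypothetical names chosen for illustration:

```scala
import java.sql.Timestamp
import java.time.format.DateTimeFormatter

object TimestampDisplay {
  // DateTimeFormatter is immutable and thread-safe, so one shared instance is fine.
  private val displayFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Hypothetical helper: renders a Timestamp without the fractional-seconds part.
  def formatForDisplay(ts: Timestamp): String =
    ts.toLocalDateTime.format(displayFormat)

  def main(args: Array[String]): Unit = {
    val sample = Timestamp.valueOf("2017-05-12 11:33:50")
    println(formatForDisplay(sample)) // 2017-05-12 11:33:50
  }
}
```

With a value taken from the DataFrame, this would be called as TimestampDisplay.formatForDisplay(firstTimestampFromDf).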

3 Comments

I've tried your approach, but for some reason when I do .withColumn("processed_time", to_utc_timestamp(unix_timestamp(col("processed_time")).cast(StringType), "UTC")) the column contains NULL values.
Actually I don't really need a timestamp here. When I needed a formatted current timestamp as a StringType it was easy to achieve with date_format: .withColumn("processed_time", date_format(lit(current_timestamp()), "yyyy-MM-dd HH:mm:ss"))
@samba I've just updated my answer with an example of .cast(StringType). In your first comment, you simply did not put .cast(StringType) at the right place. If the answer solves your problem, don't forget to accept the answer to help other people facing similar issues.
