Given a table in PostgreSQL such as:

    CREATE TABLE events (
        name    VARCHAR(50),
        time    TIMESTAMP -- no time zone; values are supposed to be UTC
    );

I'm inserting events with Spark:

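    import java.sql.Timestamp
    import org.apache.spark.sql.SaveMode
    import spark.implicits._ // needed for toDF; `spark` is the active SparkSession

    val timestamp = new Timestamp(1000000000000L)

    val df = Seq(("test", timestamp)).toDF("name", "time")

    // Ensure Spark generated the right timestamp
    val timestampInDf = df.collect().head.getAs[Timestamp]("time")
    println(timestampInDf)         // 2001-09-09 03:46:40.0, i.e. displayed in my time zone (Europe/Paris, GMT+2:00)
    println(timestampInDf.getTime) // 1000000000000

    // url, tableName and properties hold the JDBC connection settings
    df.write.mode(SaveMode.Append).jdbc(url, tableName, properties)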

Then querying the timestamp in Postgres:

    SELECT name, time, EXTRACT(EPOCH FROM time) AS epoch FROM events;

which returns:

    name     |time                   |epoch     |
    ---------+-----------------------+----------+
    test     |2001-09-09 03:46:40.000|1000007200|

There is a 2-hour offset (matching my time zone's UTC offset) from the timestamp I expected to save.

I'd expect the timestamp to be saved based on its epoch time. Instead, it looks like Spark (or Postgres) took the displayed local time, assumed it was UTC (it was not), and saved the corresponding epoch time (hence the 7200 extra seconds).
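The 7200-second difference can be reproduced with java.time. This is a minimal sketch (the object name OffsetArithmetic is hypothetical) that computes the wall-clock time of the stored instant in UTC and in Europe/Paris, then reinterprets the Paris wall-clock time as if it were UTC. It assumes, as in current PostgreSQL versions, that EXTRACT(EPOCH FROM ...) on a TIMESTAMP WITHOUT TIME ZONE value counts nominal seconds, i.e. treats the value as UTC.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

object OffsetArithmetic {
  // The instant stored in the question: 1000000000000 ms after the Unix epoch
  val instant: Instant = Instant.ofEpochMilli(1000000000000L)

  // Wall-clock time of that instant in UTC and in Europe/Paris (UTC+2 on that date)
  val utcWallClock: LocalDateTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC)
  val parisWallClock: LocalDateTime = LocalDateTime.ofInstant(instant, ZoneId.of("Europe/Paris"))

  // Reinterpret the Paris wall-clock time as if it were UTC (assumption: this is
  // how EXTRACT(EPOCH FROM ...) treats a TIMESTAMP WITHOUT TIME ZONE value)
  val shiftedEpochSeconds: Long = parisWallClock.toEpochSecond(ZoneOffset.UTC)

  def main(args: Array[String]): Unit = {
    println(utcWallClock)        // 2001-09-09T01:46:40
    println(parisWallClock)      // 2001-09-09T03:46:40
    println(shiftedEpochSeconds) // 1000007200
    println(shiftedEpochSeconds - instant.getEpochSecond) // 7200
  }
}
```

The result matches the epoch column returned by Postgres, which supports the hypothesis that the Paris wall-clock time, not the UTC one, was stored.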

What is the reason for this behavior?
What is the proper way to save a timestamp (without time zone information) with Spark?


1 Answer


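This is caused by the PostgreSQL JDBC driver: it converts java.sql.Timestamp values to a wall-clock date and time using the JVM's default time zone (here Europe/Paris), and a TIMESTAMP column without time zone stores that wall-clock value as-is, so 03:46:40 is saved instead of 01:46:40 UTC. Use -Duser.timezone=UTC to set your application's default time zone to UTC and avoid the offset.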
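To see why the JVM default time zone matters, here is a minimal sketch (the object DefaultZoneDemo and its helper renderIn are hypothetical). It uses java.util.TimeZone.setDefault as an in-process stand-in for starting the JVM with -Duser.timezone, and shows that the same Timestamp (same getTime) renders as a different wall-clock value depending on the default zone. It assumes, as the answer indicates, that the driver derives the stored value from this default-zone wall-clock time.

```scala
import java.sql.Timestamp
import java.util.TimeZone

object DefaultZoneDemo {
  // Renders ts with the JVM default time zone temporarily set to zoneId.
  // Timestamp.toString formats the wall-clock time in the default zone.
  def renderIn(zoneId: String, ts: Timestamp): String = {
    val previous = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
      ts.toString
    } finally TimeZone.setDefault(previous) // restore the original default zone
  }

  def main(args: Array[String]): Unit = {
    val ts = new Timestamp(1000000000000L)
    println(renderIn("Europe/Paris", ts)) // 2001-09-09 03:46:40.0
    println(renderIn("UTC", ts))          // 2001-09-09 01:46:40.0
    println(ts.getTime)                   // 1000000000000 regardless of the zone
  }
}
```

TimeZone.setDefault only affects the JVM it runs in. In a Spark job the rows are written by executor JVMs, so the flag typically has to be passed to both the driver and the executors, for example with --conf spark.driver.extraJavaOptions=-Duser.timezone=UTC and --conf spark.executor.extraJavaOptions=-Duser.timezone=UTC on spark-submit.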

More details on how the PostgreSQL JDBC driver handles date/time types can be found at https://jdbc.postgresql.org/documentation/head/java8-date-time.html
