
I am parsing a Unix timestamp with both pd.to_datetime() and dt.datetime.fromtimestamp(), but the two give different results. Which one is correct?

import datetime as dt
import pandas as pd

ts = 1674853200000
print(pd.to_datetime(ts, unit='ms'))
print(dt.datetime.fromtimestamp(ts / 1e3))

>> 2023-01-27 21:00:00
>> 2023-01-27 13:00:00
  • I don't reproduce your output, both give me 2023-01-27 21:00:00 Commented Jan 26, 2023 at 4:58
  • @mozway check the time zone of the machine you run this code on. Is it UTC or UTC+0 ? In contrast to pandas (numpy) datetime, vanilla Python naive datetime defaults to local time. Commented Jan 26, 2023 at 5:45
  • @FObersteiner no it's not UTC Commented Jan 26, 2023 at 5:48
  • @mozway I've added an illustration with some code. If you cannot reproduce that on your machine, let me know. Commented Jan 26, 2023 at 7:02

2 Answers

1

In contrast to pandas (numpy) datetime, vanilla Python datetime defaults to local time if you do not specify a time zone or UTC (i.e. if you use naive datetime). Here's an illustration. If I reproduce your example in my Python environment, I get:

from datetime import datetime, timezone
import pandas as pd

# ms since the Unix epoch, 1970-01-01 00:00 UTC
unix = 1674853200000 

dt_py = datetime.fromtimestamp(unix/1e3)
dt_pd = pd.to_datetime(unix, unit="ms")

print(dt_py)  # from fromtimestamp (naive, local time of my machine)
# 2023-01-27 22:00:00
print(dt_pd)  # from pd.to_datetime (naive, UTC)
# 2023-01-27 21:00:00

Comparing the two datetime objects shows that they differ by exactly my local UTC offset:

# my UTC offset at that point in time:
print(dt_py.astimezone().utcoffset())
# 1:00:00

# difference between dt_py and dt_pd:
print(dt_py-dt_pd)
# 0 days 01:00:00

To get consistent results between pandas and vanilla Python, i.e. to avoid the ambiguity, you can use time zone-aware datetime:

dt_py = datetime.fromtimestamp(unix/1e3, tz=timezone.utc)
dt_pd = pd.to_datetime(unix, unit="ms", utc=True)

print(dt_py)
# 2023-01-27 21:00:00+00:00
print(dt_pd)
# 2023-01-27 21:00:00+00:00

print(dt_py-dt_pd)
# 0 days 00:00:00
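
If you still want to display the aware result in a particular zone, you can convert it explicitly instead of relying on the machine's setting. The following is only a sketch: the fixed UTC-8 offset is my assumption about the asker's machine (inferred from the 13:00 in the question), and the names unix_ms and assumed_local are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

unix_ms = 1674853200000  # ms since the Unix epoch, as in the question

# Aware datetime in UTC: the same on every machine
dt_utc = datetime.fromtimestamp(unix_ms / 1e3, tz=timezone.utc)

# Assumption: a fixed UTC-8 offset stands in for the asker's local zone
assumed_local = timezone(timedelta(hours=-8))
dt_local = dt_utc.astimezone(assumed_local)

print(dt_utc)
# 2023-01-27 21:00:00+00:00
print(dt_local)
# 2023-01-27 13:00:00-08:00

# Both objects describe the same instant; only the wall-clock display differs
print(dt_utc == dt_local)
# True
```

A fixed offset ignores daylight saving time; for a real zone, zoneinfo.ZoneInfo with an IANA zone name would be the more robust choice.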

1 Comment

Ah ha, the timezone! Love your answer, so concise and straight to the point. Thank you very much!
0

Both are correct; they show the same instant in different ways. pd.to_datetime() returns a naive timestamp in UTC and is more flexible, e.g. it can handle missing input data. dt.datetime.fromtimestamp(), when called without a tz argument, converts the timestamp to the local time zone of the machine and returns a naive datetime. Which one to use depends on the requirements of your use case.
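
To illustrate the point about missing input data, here is a minimal sketch, assuming the timestamps arrive as a pandas Series with one missing value (the names raw and parsed are made up for the example):

```python
from datetime import datetime, timezone

import pandas as pd

# Two epoch timestamps in ms; the second one is missing (NaN)
raw = pd.Series([1674853200000, float("nan")])

# pandas maps the missing value to NaT instead of raising
parsed = pd.to_datetime(raw, unit="ms", utc=True)
print(parsed.iloc[0])           # 2023-01-27 21:00:00+00:00
print(pd.isna(parsed.iloc[1]))  # True

# The standard library has no missing-value concept here: None raises TypeError
try:
    datetime.fromtimestamp(None, tz=timezone.utc)
    raised = False
except TypeError:
    raised = True
print(raised)                   # True
```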
