Converting timezones with Pandas datetime object with an hours offset

Question

I have been pulling my hair out trying to convert, in Pandas, a time with an offset that comes from a MySQL database. I'm given the following data (the MySQL data type is listed as TIMESTAMP(6) WITH TIME ZONE, and the label suggests it's GMT):

df['times']
0    2021-03-09 21:54:58-08:00
1    2021-03-09 18:42:53-08:00
2    2021-03-09 14:34:19-08:00
3    2021-03-08 21:17:42-08:00
4    2021-03-08 12:48:52-08:00
5    2021-03-08 13:39:28-08:00
6    2021-03-06 22:15:03-08:00
7    2021-03-06 22:15:03-07:00
8    2021-03-05 20:54:02-07:00
9    2021-03-06 14:37:51-07:00

df['times'].iloc[1]
datetime.datetime(2021, 3, 9, 18, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=57600)))

I want to convert this to remove the hours offset, and then ensure it's displayed in my local timezone (US/Pacific):

2021-03-09 18:42:53-08:00 = 2021-03-09 10:42:53

What is the most pythonic / Pandas way to do this? The time offset on the end of the data I receive appears to somehow already account for timezone with the offset, but it's not clear to me how to massage the value so it's showing the time in my local timezone.

bbd108 · Accepted Answer · 2021-03-11 02:27:21Z

1

I think this should do the trick for you. See the to_datetime documentation for more information.

s = pd.Series(
    [
        '2021-03-09 21:54:58-08:00',
        '2021-03-09 18:42:53-08:00',
        '2021-03-09 14:34:19-08:00',
        '2021-03-08 21:17:42-08:00',
        '2021-03-08 12:48:52-08:00',
        '2021-03-08 13:39:28-08:00',
        '2021-03-06 22:15:03-08:00',
        '2021-03-06 22:15:03-07:00',
        '2021-03-05 20:54:02-07:00',
        '2021-03-06 14:37:51-07:00'
    ]
)

pd.to_datetime(s, utc=True).dt.tz_convert('US/Pacific').dt.tz_localize(None)
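To see what each step of that chain contributes, here is a minimal sketch that runs the stages separately on two of the sample values. The variable names are only illustrative. Note that the result is the Pacific wall-clock time, so a value recorded at -07:00 shifts by one hour rather than having its offset subtracted.

```python
import pandas as pd

s = pd.Series(["2021-03-09 18:42:53-08:00", "2021-03-06 22:15:03-07:00"])

# 1. Parse the strings and normalise the mixed offsets to UTC.
as_utc = pd.to_datetime(s, utc=True)

# 2. Express the same instants in US/Pacific (still timezone-aware).
as_pacific = as_utc.dt.tz_convert("US/Pacific")

# 3. Drop the timezone, keeping the Pacific wall-clock time.
as_naive = as_pacific.dt.tz_localize(None)

print(as_utc.iloc[0])      # 2021-03-10 02:42:53+00:00
print(as_pacific.iloc[1])  # 2021-03-06 21:15:03-08:00
print(as_naive.iloc[0])    # 2021-03-09 18:42:53
```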

William Nelson · Accepted Answer · 2021-03-11 04:16:04Z

0

First, the timestamps you're working with are in ISO 8601 format. Each one is already expressed in a local, non-UTC time zone, with its offset from UTC appended. It would be very risky to assume that they're actually in UTC, or that you should subtract the offset from the listed time.

Python has built-in functions that handle this format very well and will help you avoid making painful time mistakes.

I've attempted to replicate your sample data with the following code, but it's slightly different because I'm not familiar with Pandas.

import datetime
string_date_list = [
    '2021-03-09 21:54:58-08:00',
    '2021-03-09 18:42:53-08:00',
    '2021-03-09 14:34:19-08:00',
    '2021-03-08 21:17:42-08:00',
    '2021-03-08 12:48:52-08:00',
    '2021-03-08 13:39:28-08:00',
    '2021-03-06 22:15:03-08:00',
    '2021-03-06 22:15:03-07:00',
    '2021-03-05 20:54:02-07:00',
    '2021-03-06 14:37:51-07:00'
]
# Parse each ISO 8601 string into a timezone-aware datetime object
datetime_list = [datetime.datetime.fromisoformat(x) for x in string_date_list]

datetime_list[1]

datetime.datetime(2021, 3, 9, 18, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=57600)))

In general, you should avoid removing the offset from a timestamp unless you're converting to UTC, or if you're at the final step where you present a time and date to a user. If you get in the habit of working with "timezone naive" timestamps, it's very easy to accidentally make a conversion twice, and suddenly your data is all off by eight hours, or breaks when Daylight saving time shifts.
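The double-conversion hazard can be illustrated with a short sketch using only the standard library. The variable names are hypothetical, and the manual eight-hour shift stands in for any hand-rolled conversion on naive values.

```python
from datetime import datetime, timedelta, timezone

aware = datetime.fromisoformat("2021-03-09 18:42:53-08:00")

# Aware datetimes: converting repeatedly never changes the instant.
once = aware.astimezone(timezone.utc)
twice = once.astimezone(timezone.utc)
assert once == twice == aware

# Naive datetimes: nothing records that the shift already happened,
# so applying the same manual shift again silently moves the time.
naive = aware.replace(tzinfo=None)
shifted_once = naive + timedelta(hours=8)          # intended conversion to UTC
shifted_twice = shifted_once + timedelta(hours=8)  # accidental second conversion

print(once)           # 2021-03-10 02:42:53+00:00
print(shifted_once)   # 2021-03-10 02:42:53
print(shifted_twice)  # 2021-03-10 10:42:53
```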

Each object in your list is a datetime.datetime object. The relevant method in the datetime library is datetime.astimezone().

To convert a single item in your list, call datetime_list[1].astimezone(). We still end up with a datetime object, but now it's in your local time zone. I'm in MST, so I get:

datetime.datetime(2021, 3, 9, 19, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200), 'US Mountain Standard Time'))
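Because astimezone() with no argument uses the time zone of the machine running the code, its output differs from one computer to the next. For a reproducible check, a fixed offset can be passed explicitly. This is a minimal sketch; the name mst is only illustrative and hard-codes UTC-07:00.

```python
import datetime

dt = datetime.datetime.fromisoformat("2021-03-09 18:42:53-08:00")

# Fixed UTC-07:00 offset, standing in for Mountain Standard Time.
mst = datetime.timezone(datetime.timedelta(hours=-7), "MST")

converted = dt.astimezone(mst)
print(converted.isoformat())  # 2021-03-09T19:42:53-07:00
print(converted == dt)        # True: same instant, different wall-clock time
```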

Once we're certain that it's time to display data to a user, it's safe to perform format changes that remove the time zone with .strftime:

print(datetime_list[1].astimezone().strftime("%Y-%m-%d %H:%M:%S"))

2021-03-09 19:42:53

To print the entire array, you could do either:

local_datetime_list = list(map(lambda x: print(x.astimezone().strftime("%Y-%m-%d %H:%M:%S")), datetime_list))

or

for time in datetime_list:
    print(time.astimezone().strftime("%Y-%m-%d %H:%M:%S"))

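Both print the following. Note that these values correspond to a machine whose local time zone is UTC; on the MST machine mentioned above, each time would be seven hours earlier: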

2021-03-10 05:54:58
2021-03-10 02:42:53
2021-03-09 22:34:19
2021-03-09 05:17:42
2021-03-08 20:48:52
2021-03-08 21:39:28
2021-03-07 06:15:03
2021-03-07 05:15:03
2021-03-06 03:54:02
2021-03-06 21:37:51

If it's absolutely necessary to always display the time as it'd be seen in US/Pacific (e.g., a coworker in Japan should still see these times as if they were in California), then you can use pytz.

import pytz
pacific_tz = pytz.timezone('US/Pacific')
print(datetime_list[1].astimezone(pacific_tz).strftime("%Y-%m-%d %H:%M:%S"))

2021-03-09 18:42:53
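On Python 3.9 and later, the standard-library zoneinfo module can do the same conversion without pytz. This is a sketch rather than a drop-in replacement: it assumes the IANA time zone database is available (on Windows that usually means installing the tzdata package).

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

pacific = ZoneInfo("US/Pacific")

# A value that already carries the Pacific Standard Time offset.
dt_a = datetime.fromisoformat("2021-03-09 18:42:53-08:00")
# A value recorded with a -07:00 offset.
dt_b = datetime.fromisoformat("2021-03-06 22:15:03-07:00")

local_a = dt_a.astimezone(pacific).strftime("%Y-%m-%d %H:%M:%S")
local_b = dt_b.astimezone(pacific).strftime("%Y-%m-%d %H:%M:%S")

print(local_a)  # 2021-03-09 18:42:53
print(local_b)  # 2021-03-06 21:15:03
```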

I do note that your question suggests that the expected answer is 10:42:53. If you're 100% sure that's right, you can do this:

print(datetime_list[1].replace(tzinfo=None) + datetime_list[1].utcoffset())

or, to process the entire list:

remove_offset = list(map(lambda x: print(x.replace(tzinfo=None) + x.utcoffset()), datetime_list))

2021-03-09 13:54:58
2021-03-09 10:42:53
2021-03-09 06:34:19
2021-03-08 13:17:42
2021-03-08 04:48:52
2021-03-08 05:39:28
2021-03-06 14:15:03
2021-03-06 15:15:03
2021-03-05 13:54:02
2021-03-06 07:37:51
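Since the question's data lives in a pandas column of datetime objects, the same arithmetic can be applied with Series.apply. This is a minimal sketch: add_offset is a hypothetical helper name, and the DataFrame is rebuilt from two of the sample strings. It deliberately reproduces the offset-added values above, which are neither the true UTC times nor the true Pacific times.

```python
import datetime
import pandas as pd

def add_offset(ts):
    """Strip tzinfo and add the (negative) UTC offset to the wall-clock time."""
    return ts.replace(tzinfo=None) + ts.utcoffset()

# Object-dtype column of aware datetimes, mirroring the question's df['times'].
df = pd.DataFrame({
    "times": pd.Series(
        [
            datetime.datetime.fromisoformat("2021-03-09 18:42:53-08:00"),
            datetime.datetime.fromisoformat("2021-03-06 22:15:03-07:00"),
        ],
        dtype=object,
    )
})

df["adjusted"] = df["times"].apply(add_offset)

print(df["adjusted"].iloc[0])  # 2021-03-09 10:42:53
print(df["adjusted"].iloc[1])  # 2021-03-06 15:15:03
```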

4 Comments

Zach Morris Over a year ago

Thank you! I agree that it's not intuitive that the expected answer is in fact just the time with the hours offset applied (this is predetermined information I've been given by the database owner). As you said, the time is already 'local' and still has the hours offset listed, which I think in the end tells me the source data timezone is incorrect. The utcoffset function was the key that I was missing.

William Nelson Over a year ago

I didn't mean to second-guess you too aggressively. I just want to make sure if some other person uses this question as a reference in the future, they're very careful. Mistakes with time can result in very challenging bugs.

Zach Morris Over a year ago

It's a fair warning for sure (for anyone else that comes across this answer). Seems like the safest thing is to save everything UTC and just convert when necessary.

William Nelson Over a year ago

That's a very prudent approach. There are situations where it's not ideal, though. For example, if you're trying to find the ideal time to send a marketing email, and using response rates to measure efficacy, you really need to measure in local time. Similarly, if you were trying to create a model of typical network activity at a bank branch office, and the big daily data upload is an hour after close, UTC won't work.
