0

I have been pulling my hair out trying to convert a provided time from a MYSQL database in Pandas that has a time offset. I'm provided the following data (MySQL data type is listed as TIMESTAMP(6) WITH TIME ZONE and the label suggests its GMT):

df['times']
0    2021-03-09 21:54:58-08:00
1    2021-03-09 18:42:53-08:00
2    2021-03-09 14:34:19-08:00
3    2021-03-08 21:17:42-08:00
4    2021-03-08 12:48:52-08:00
5    2021-03-08 13:39:28-08:00
6    2021-03-06 22:15:03-08:00
7    2021-03-06 22:15:03-07:00
8    2021-03-05 20:54:02-07:00
9    2021-03-06 14:37:51-07:00

df['times'].iloc[1]
datetime.datetime(2021, 3, 9, 18, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=57600)))

I want to convert this to remove the hours offset, and then ensure it's displayed in my Local Timezone (US/Pacific)

2021-03-09 18:42:53-08:00 = 2021-03-09 10:42:53

What is the most pythonic / Pandas way to do this? The time offset on the end of the data I receive appears to somehow already account for timezone with the offset, but it's not clear to me how to massage the value so it's showing the time in my local timezone.

2 Answers 2

1

I think this should do the trick for you. See the to_datetime documentation for more information.

s = pd.Series(
    [
        '2021-03-09 21:54:58-08:00',
        '2021-03-09 18:42:53-08:00',
        '2021-03-09 14:34:19-08:00',
        '2021-03-08 21:17:42-08:00',
        '2021-03-08 12:48:52-08:00',
        '2021-03-08 13:39:28-08:00',
        '2021-03-06 22:15:03-08:00',
        '2021-03-06 22:15:03-07:00',
        '2021-03-05 20:54:02-07:00',
        '2021-03-06 14:37:51-07:00'
    ]
)

pd.to_datetime(s, utc=True).dt.tz_convert('US/Pacific').dt.tz_localize(None) 
Sign up to request clarification or add additional context in comments.

Comments

0

First, the date format you're working with is in ISO 8601 format. Each time is already in a local, non-UTC time zone. It would be very risky to assume that they're actually in UTC, or that you should subtract the offset from the listed time.

Python has built in functions that handle this format very well and will help you avoid making painful time mistakes.

I've attempted to replicate your sample data with the following code, but it's slightly different because I'm not familiar with Pandas.

import datetime
string_date_list =[
    '2021-03-09 21:54:58-08:00',
    '2021-03-09 18:42:53-08:00',
    '2021-03-09 14:34:19-08:00',
    '2021-03-08 21:17:42-08:00',
    '2021-03-08 12:48:52-08:00',
    '2021-03-08 13:39:28-08:00',
    '2021-03-06 22:15:03-08:00',
    '2021-03-06 22:15:03-07:00',
    '2021-03-05 20:54:02-07:00',
    '2021-03-06 14:37:51-07:00'
]
datetime_list = list(map(lambda x: datetime.datetime.fromisoformat(x), string_date_list))

datetime_list[1]
datetime.datetime(2021, 3, 9, 18, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=57600)))

In general, you should avoid removing the offset from a timestamp unless you're converting to UTC, or if you're at the final step where you present a time and date to a user. If you get in the habit of working with "timezone naive" timestamps, it's very easy to accidentally make a conversion twice, and suddenly your data is all off by eight hours, or breaks when Daylight saving time shifts.

Each object in your list is a datetime.datetime object. The relevant built-in function in the datetime library is datetime.astimezone()

To convert a single item in your list: datetime_list[1].astimezone() We still end up with a datetime object, but now it's in your local time zone. For me, I'm in MST, so I get:

datetime.datetime(2021, 3, 9, 19, 42, 53, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200), 'US Mountain Standard Time'))

Once we're certain that it's time to display data to a user, it's safe to perform format changes that remove the time zone with .strftime

print(datetime_list[1].astimezone().strftime("%Y-%m-%d %H:%M:%S"))

2021-03-09 19:42:53

To print the entire array, you could do either:

local_datetime_list = list(map(lambda x: print(x.astimezone().strftime("%Y-%m-%d %H:%M:%S")), datetime_list))

or

for time in datetime_list:
    print(time.astimezone().strftime("%Y-%m-%d %H:%M:%S"))

Both return:

2021-03-10 05:54:58
2021-03-10 02:42:53
2021-03-09 22:34:19
2021-03-09 05:17:42
2021-03-08 20:48:52
2021-03-08 21:39:28
2021-03-07 06:15:03
2021-03-07 05:15:03
2021-03-06 03:54:02
2021-03-06 21:37:51

If it's absolutely necessary to always display the time as it'd be seen in US/Pacific, (e.g., a coworker in Japan should still see these times as if they were in California), then you can use pytz.

import pytz
pacific_tz = pytz.timezone('US/Pacific')
print(datetime_list[1].astimezone(pacific_tz).strftime("%Y-%m-%d %H:%M:%S"))

2021-03-09 18:42:53

I do note that your question suggests that the expected answer is 10:42:53. If you're 100% sure that that's right. You can do this:

print(datetime_list[1].replace(tzinfo=None) + datetime_list[1].utcoffset())

or, to process the entire list:

remove_offset = list(map(lambda x: print(x.replace(tzinfo=None) + x.utcoffset()), datetime_list))
2021-03-09 13:54:58
2021-03-09 10:42:53
2021-03-09 06:34:19
2021-03-08 13:17:42
2021-03-08 04:48:52
2021-03-08 05:39:28
2021-03-06 14:15:03
2021-03-06 15:15:03
2021-03-05 13:54:02
2021-03-06 07:37:51

4 Comments

Thank you! I agree that it's not intuitive that the expected answer is in fact just the time with the hours offset (this is predetermined information I've been given from the database owner). As you said, the time is already 'local', but the hours offset listed, which I think in the end tells me the source data timezone is incorrect. The utcoffset function was the key that I was missing.
I didn't mean to second-guess you too aggressively. I just want to make sure if some other person uses this question as a reference in the future, they're very careful. Mistakes with time can result in very challenging bugs.
It's a fair warning for sure (for anyone else that comes across this answer). Seems like the safest thing is to save everything UTC and just convert when necessary.
That's a very prudent approach. There are situations where it's not ideal though. For example, if you're trying to find the ideal time to send a marketing email, and using response rates to measure efficacy, you really need to measure in local time. Similarly, if you were trying to create a model of typical network activity at a bank branch office, and the big daily data upload is an hour after close, utc won't work.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.