Creating Consistent Time Format with Pandas

Question

My overall goal is to pull the hour from each data point so that I can list each starting time. To do this, I know I need to clean my data so that it is all in a consistent format. I have been trying to use to_datetime and df[time].dt.hour to pull the data needed, but it does not work because the formatting is inconsistent.

This is the data I am working with:

Work Hours
08:15 AM-03:15PM
M,T,W,Th: 7:45AM-3:05PM F:7:45AM-2:07PM
7:45am-3:00pm

7:45AM.-2:15 PM

My current code: df['Work Hours']_dt = pd.to_datetime(df)

I also tried: df['Starting Time'] = df['Work Hours'].dt.hour

My primary concern is to clean the data first. Eventually I want to extract only the starting time for each workplace so that it looks something like this:

Starting Time
8
7
9
7

Swinging Treebranch · Accepted Answer · 2023-02-13 00:58:23Z


This is a shot in the dark, and maybe someone can come up with a better answer, but you can use a regex to substitute patterns. For example:

import re

regex = r"[a-zA-Z,]"  # matches every letter and comma

test_str = "M,T,W,Th: 7:45AM-3:05PM F:7:45AM-2:07PM"

subst = ""

result = re.sub(regex, subst, test_str)  # delete every match

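which turns that example string into ": 7:45-3:05 :7:45-2:07" (the letters and commas are gone, but the leading colon and space remain).

Then you can split on the : to extract the first hour. A word of caution: this returns the list ['', ' 7', '45-3', '05 ', '7', '45-2', '07'], so you need to skip the empty first element and strip the whitespace. That is fine if you're only looking for the first hour.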
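The substitute-then-split idea can be sketched roughly as below. The helper name first_hour is hypothetical, and the sketch assumes every entry writes its start time as hour, colon, minutes. Because the cleaned string can begin with a colon, it skips empty pieces rather than trusting the first one.

```python
import re


def first_hour(text):
    """Return the first hour in `text` as an int, or None (hypothetical helper)."""
    # Strip letters and commas, as in the substitution above.
    cleaned = re.sub(r"[a-zA-Z,]", "", text)
    # Splitting on ":" can produce empty or whitespace-only pieces; skip them.
    for piece in cleaned.split(":"):
        piece = piece.strip()
        if piece:
            return int(piece)
    return None


print(first_hour("M,T,W,Th: 7:45AM-3:05PM F:7:45AM-2:07PM"))
print(first_hour("08:15 AM-03:15PM"))
```

Note that the AM/PM marker is discarded, so the value is the hour as written on a 12-hour clock.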

Have a play with regex at https://regex101.com/ to find the pattern that matches exactly what you need.
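Since the data lives in a pandas column, the same kind of pattern could also be applied to the whole column at once with Series.str.extract. This is only a sketch: it assumes the starting time is the first hour:minutes pattern in each row, and it reuses the 'Work Hours' and 'Starting Time' column names and the sample rows from the question.

```python
import pandas as pd

df = pd.DataFrame({"Work Hours": [
    "08:15 AM-03:15PM",
    "M,T,W,Th: 7:45AM-3:05PM F:7:45AM-2:07PM",
    "7:45am-3:00pm",
    "7:45AM.-2:15 PM",
]})

# Capture the one or two digits that precede the first ":MM" in each row.
hours = df["Work Hours"].str.extract(r"(\d{1,2}):\d{2}", expand=False)

# Rows with no match become missing values; the nullable Int64 dtype keeps
# the column as integers instead of falling back to floats.
df["Starting Time"] = pd.to_numeric(hours).astype("Int64")

print(df)
```

As with the split approach, this ignores AM/PM, so an afternoon start would need extra handling.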
