My overall goal is to pull the hour from each data point so I can list each starting time. To do this, I know I need to clean my data so that it is all in a consistent format. I have been trying to use to_datetime and df[time].dt.hour to pull the data needed, but it does not work because the formatting is inconsistent.

This is the data I am working with:

Work Hours
08:15 AM-03:15PM
M,T,W,Th: 7:45AM-3:05PM F:7:45AM-2:07PM
7:45am-3:00pm
7:45AM.-2:15 PM

My current code: df['Work Hours']_dt = pd.to_datetime(df)

I also tried: df['Starting Time'] = df['Work Hours'].dt.hour

My primary concern is to clean the data first; eventually I want to extract only the starting time for each workplace, so that it looks something like this:

Starting Time
8
7
9
7

1 Answer

This is a shot in the dark, and maybe someone can come up with a better answer, but you can use a regex to substitute patterns. For example:

import re

regex = r"[a-zA-Z,]"  # matches every letter and comma

test_str = "M,T,W,Th: 7:45AM-3:05PM F:7:45AM-2:07PM"

subst = ""

result = re.sub(regex, subst, test_str)

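which will turn that example string into : 7:45-3:05 :7:45-2:07 (the colon after the day list survives, because only letters and commas are removed)

Then you can split on the : to extract the first hour. A word of caution, however: this returns the list ['', ' 7', '45-3', '05 ', '7', '45-2', '07'], so the first element is an empty string here and the pieces keep stray spaces. That is fine if you're only looking for the first hour, as long as you take the first piece that is a number.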
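To make the substitute-then-split idea concrete, here is a minimal sketch using only the standard library. The function name first_hour_via_split is made up for illustration, and it assumes every entry contains at least one H:MM time.

```python
import re


def first_hour_via_split(text):
    """Strip letters and commas, split on ':', return the first numeric piece."""
    cleaned = re.sub(r"[a-zA-Z,]", "", text)
    for piece in cleaned.split(":"):
        piece = piece.strip()
        # Skip empty pieces such as the one left by a leading "M,T,W,Th:" prefix.
        if piece.isdigit():
            return int(piece)
    return None  # no numeric piece found


samples = [
    "08:15 AM-03:15PM",
    "M,T,W,Th: 7:45AM-3:05PM F:7:45AM-2:07PM",
    "7:45am-3:00pm",
    "7:45AM.-2:15 PM",
]
hours = [first_hour_via_split(s) for s in samples]
```

Skipping non-numeric pieces is what keeps the leading empty string from being mistaken for the hour.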

Have a play with regex at https://regex101.com/ to find the pattern that best matches your data.
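As one possible pattern to start from (a sketch, not the only option), you could match the first H:MM time directly instead of deleting characters. The name starting_hour is hypothetical, and the pattern assumes hours are always followed by a colon and two minute digits, as in the sample data.

```python
import re

# One or two digits for the hour, a colon, then two digits for the minutes.
TIME_PATTERN = re.compile(r"(\d{1,2}):\d{2}")


def starting_hour(text):
    """Return the hour of the first H:MM time found in text, or None."""
    match = TIME_PATTERN.search(text)
    return int(match.group(1)) if match else None


work_hours = [
    "08:15 AM-03:15PM",
    "M,T,W,Th: 7:45AM-3:05PM F:7:45AM-2:07PM",
    "7:45am-3:00pm",
    "7:45AM.-2:15 PM",
]
starting_hours = [starting_hour(s) for s in work_hours]
```

If the column holds strings, something like df['Starting Time'] = df['Work Hours'].map(starting_hour) should apply it row by row. Note that this returns the hour as written, so a PM start would need extra handling to become a 24-hour value.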
