I have a dataframe that looks like this:

          DEP_TIME
0         1851
1         1146
2         2016
3         1350
4          916
          ...
607341     554
607342     633
607343     657
607344     705
607345     628

I need to get every value in this column DEP_TIME to have the format hh:mm. All cells are of type string and can remain that type.

Some cells are missing only the colon (rows 0 to 3); others are also missing the leading 0 (rows 4 and onward). Some cells are empty and should ideally hold the string value 0.

I need to do it in an efficient way since I have a few million records. How do I do it?

2 Answers

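Use to_datetime with Series.dt.strftime. With errors='coerce', values that cannot be parsed (including empty cells) become NaT, which fillna then replaces with '00:00':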

df['DEP_TIME'] = (pd.to_datetime(df['DEP_TIME'], format='%H%M', errors='coerce')
                   .dt.strftime('%H:%M')
                   .fillna('00:00'))

print(df)
       DEP_TIME
0         18:51
1         11:46
2         20:16
3         13:50
4         09:16
607341    05:54
607342    06:33
607343    06:57
607344    07:05
607345    06:28
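
One thing to watch with errors='coerce': after fillna, a value that failed to parse looks exactly like a real midnight departure. The following is a minimal sketch, using a small hypothetical sample rather than the asker's data, that keeps a mask of the rows that failed to parse so they can be inspected separately:

```python
import pandas as pd

# Hypothetical sample: two valid times, an empty string, a missing value,
# and an out-of-range value (minute 60).
sample = pd.Series(['1851', '916', '', None, '2460'], dtype='object')

# Unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(sample, format='%H%M', errors='coerce')

# Remember which rows failed before they are overwritten by fillna.
invalid = parsed.isna()

formatted = parsed.dt.strftime('%H:%M').fillna('00:00')
print(pd.DataFrame({'raw': sample, 'formatted': formatted, 'invalid': invalid}))
```

If empty cells should hold the string 0 as the question asks, passing '0' to fillna instead of '00:00' should be enough.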

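import re

import numpy as np
import pandas as pd

d = [['1851'],
     ['1146'],
     ['2016'],
     ['916'],
     ['814'],
     [''],
     [np.nan]]

df = pd.DataFrame(d, columns=['DEP_TIME'])

# Treat missing values as the string '0'
df['DEP_TIME'] = df['DEP_TIME'].fillna('0')
# Take the last two digits as minutes, zero-pad the hour, and join them with a colon;
# empty strings become '0'
df['DEP_TIME'] = df['DEP_TIME'].apply(lambda y: '0' if y=='' else re.sub(r'(\d{1,2})(\d{2})$', lambda x: x[1].zfill(2)+':'+x[2], y))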

df
    DEP_TIME
0   18:51
1   11:46
2   20:16
3   09:16
4   08:14
5   0
6   0
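
Since apply with a Python lambda runs once per row, a variant built only on pandas string methods may be faster on a few million records. This is a sketch, not a benchmarked result, and it assumes every non-empty value is a string of three or four digits; the sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the cases in the question.
df = pd.DataFrame({'DEP_TIME': ['1851', '1146', '916', '814', '', np.nan]})

# Normalise missing values to empty strings so both cases are handled together.
s = df['DEP_TIME'].fillna('')

# Left-pad to four digits ('916' -> '0916'), then insert the colon by slicing.
padded = s.str.zfill(4)
formatted = padded.str[:2] + ':' + padded.str[2:]

# Keep formatted values where the input was non-empty; use '0' elsewhere.
df['DEP_TIME'] = formatted.where(s != '', '0')
print(df)
```

Unlike the to_datetime approach, this does not validate the values, so an input such as '2599' would pass through as '25:99'.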
