1

I have a data frame that contains (among others) columns for the time of day (00:00-23:59:59) day (1-7), month (1-12), and year (2000-2019). How can I combine the values of each of these columns on a row by row basis into a new DateTime object and then store these new date-times in a new column? I've read other posts pertaining to such a task but they all seem to involve one date column to one DateTime column whereas I have 4 columns that need to be transformed into DateTime. Any help is appreciated!

e.g.

| 4:30:59 | 1 | 1 | 2000 | TO 200/1/1 4:30:59

this is the only code I have so far which probably doesn't do anything

#creating datetime object (MISC)
data = pd.read_csv('road_accidents_data_clean.csv',delimiter=',')
df = pd.DataFrame(data)
format = '%Y-%m-%d %H:%M:%S'
n = 0
df['datetime'] = data.loc[n,'Crash_Day'],data.loc[n,'Crash_Month'],data.loc[n,'Year']

My DataFrame is layed out as follows:

Index | Age | Year | Crash_Month | Crash_Day | Crash_Time | Road_User | Gender | 
  0      37   2000         1            1        4:30:59     DRIVER      MALE
  1      42   2000         1            1        7:45:10     DRIVER      MALE
  2      25   2000         1            1        10:15:30   PEDESTRIAN  FEMALE

Crash_Type | Injury_Severity | Crash_LGA | Crash_Area_Type | Datetime |
  UNKNOWN          1              YARRA      MELBOURNE          NaN
 OVERTAKING        1              YARRA      MELBOURNE          NaN
ADJACENT DIR       0              MONASH     MELBOURNE          NaN

NOTE: the dataframe is 13 columns wide i just couldn't fit them all on one line so Crash_Type starts to the right of Gender.

below is the code i've been suggested to use/my adaptation of it

df = pd.DataFrame(dict(
    Crash_Time=['4:30:59','4:20:00'],
    Crash_Day=[1,20],
    Crash_Month=[1,4],
    Year=[2000,2020],

))


data['Datetime'] = df['Datetime']=pd.to_datetime(
    np.sum([
        df['Year'].astype(str),
        '-',
        df['Crash_Month'].astype(str),
        '-',
        df['Crash_Day'].astype(str),
        ' ',
        df['Crash_Time'],
    ]),
    format = '%Y-%m-%d %H:%M:%S',
)

I've adapted this code in order to combine the values for the datetime column with the my original dataframe.

1
  • I could show you how to read in the csv using .read_csv(...., parse_dates={...}), but you don't have much example data to try some things out... Commented Apr 24, 2020 at 0:09

1 Answer 1

1

Combine the columns into a single series of stings using + (converting to str where needed with pandas.Series.astype method) then pass that new series into pd.to_datetime before assigning it to a new column in your df:

import pandas as pd

df = pd.DataFrame(dict(time=['4:30:59'],date=[1],month=[1],year=[2000]))

df['datetime'] = pd.to_datetime(
    df['year'].astype(str)+'-'+df['month'].astype(str)+'-'+df['date'].astype(str)+' '+df['time'],
    format = '%Y-%m-%d %H:%M:%S',
)
print(df)

example in python tutor

edit: You can also use a numpy.sum to make that one long line adding columns together easier on the eyes:

import pandas as pd
import numpy as np

df = pd.DataFrame(dict(
    time=['4:30:59','4:20:00'],
    date=[1,20],
    month=[1,4],
    year=[2000,2020],

))

df['datetime']=pd.to_datetime(
    np.sum([
        df['year'].astype(str),
        '-',
        df['month'].astype(str),
        '-',
        df['date'].astype(str),
        ' ',
        df['time'],
    ]),
    format = '%Y-%m-%d %H:%M:%S',
)

sum example in python tutor

edit 2: Using your actual column names, it should be something like this:

import pandas as pd
import numpy as np
'''
Index | Age | Year | Crash_Month | Crash_Day | Crash_Time | Road_User | Gender | 
  0      37   2000         1            1        4:30:59     DRIVER      MALE

Crash_Type | Injury_Severity | Crash_LGA | Crash_Area_Type | Datetime |
  UNKNOWN          1              YARRA      MELBOURNE          NaN
'''
df = pd.DataFrame(dict(
    Crash_Time=['4:30:59','4:20:00'],
    Crash_Day=[1,20],
    Crash_Month=[1,4],
    Year=[2000,2020],

))

df['Datetime']=pd.to_datetime(
    np.sum([
        df['Year'].astype(str),
        '-',
        df['Crash_Month'].astype(str),
        '-',
        df['Crash_Day'].astype(str),
        ' ',
        df['Crash_Time'],
    ]),
    format = '%Y-%m-%d %H:%M:%S',
)
print(df)

another python tutor link

One thing to note is that you might want to double check if your csv file is separated by just a comma or could it be a comma and a space? possible that you may need to load the data with df = pd.read_csv('road_accidents_data_clean.csv',sep=', ') if there is an extra space separating the data in addition to the comma. You don't want to have that extra space in your data.

Sign up to request clarification or add additional context in comments.

16 Comments

Awesome, Thanks! What do i need to do to iterate this code over each row of my dataframe? like do i need to assign each variable as 'n' and then let n = the row number or something to fill in the values for each row of the new column?
Nothing. It will apply this logic to all rows. See the example I edited in with a second row in the df.
The code returned NaN in all rows of the datetime column- what's the fix for this? Also will this code return the date/time combination for each row of the dataframe without having to enter them manually into the year, month,day,time columns? i have 115k lines so that's not practical. Thanks!
when you print the df after loading the csv file what does it look like? did you change the column names from what I had in my example to match the actual column names of your df? unrelated but you don't need to do df = pd.DataFrame(data), you can just do df = pd.read_csv('road_accidents_data_clean.csv',delimiter=',') since pd.read_csv will return a pandas.DataFrame object anyways. You may need to update your question with a new code example that shows how you are trying to implement my suggestion.
after i execute the code the dataframe loads in as normal except with the new datetime column on the far right-hand side. The datetime column just has NaN for every value from row 0 to row 115k
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.