62

I have a dataframe of 13 columns and 55,000 rows. I am trying to convert 5 of those columns to datetime; right now they return the type 'object', and I need to transform this data for machine learning. I know that if I do

data['birth_date'] = pd.to_datetime(data['birth_date'], errors='coerce')

it will return a datetime column, but I want to do it for 4 other columns as well. Is there one line that I can write to call all of them? I don't think I can index like

data[:,7:12]

thanks!

1 Comment
I'm not sure if there is a function to convert multiple columns at the same time, but I know that read_csv has a parse_dates argument that can take a list of all the columns you would like to convert when first importing your data.

8 Answers

84

You can use apply to iterate through each column with pd.to_datetime:

data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')

As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:

cols = data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')

4 Comments

Quick question: wouldn't it be better to use the map() method in this case?
map and apply can both accept functions, which is why this confuses people. I only use map for 'literal mapping' with a dictionary/Series, and apply for functions; apply has some extra functionality too.
With errors='coerce', invalid dates are assigned NaT (the short sketch after these comments shows this). If errors='ignore' is used instead of coerce, it will ignore the fact that the value is an invalid specification and just return a (possibly) invalid/incorrect date as-is.
I have the same problem, but I only need to change the column names from strings (e.g. 2020-Q4) to datetime, without affecting the rows. How can this be done?
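
A minimal sketch of the errors='coerce' behaviour described above (the values are made up for illustration):

import pandas as pd

s = pd.Series(['2017-01-06', 'not a date', '2020-03-15'])
print(pd.to_datetime(s, errors='coerce'))
# 0   2017-01-06
# 1          NaT    <- the invalid entry becomes NaT instead of raising
# 2   2020-03-15
# dtype: datetime64[ns]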
47
my_df[['column1','column2']] = my_df[['column1','column2']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f')

Note: of course the format can be changed as required.
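
For example, day-first dates could be parsed by swapping in the matching directives (the column names here are just placeholders):

my_df[['column1','column2']] = my_df[['column1','column2']].apply(pd.to_datetime, format='%d/%m/%Y')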

2 Comments

how can I keep the other columns of my dataframe when I do that?
@JessicaVoigt Only the columns selected on the left of the = are overwritten; the remaining columns in my_df remain untouched. You could do my_df[['new_1','new_2']] = my_df[['column1','column2']].apply(...) to assign the results to new columns instead.
19

If performance is a concern, I would advise using the following function to convert those columns to datetime:

def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date: pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])

to_datetime: 5799 ms
dateutil:    5162 ms
strptime:    1651 ms
manual:       242 ms
lookup:        32 ms

Source: https://github.com/sanand0/benchmarks/tree/master/date-parse
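
A usage sketch, assuming a DataFrame df with a string column named 'Date' (both names are illustrative):

import pandas as pd

df = pd.DataFrame({'Date': ['2017-01-06', '2017-01-06', '2020-03-15']})
df['Date'] = lookup(df['Date'])  # pass the Series itself, not a list of labels
print(df['Date'].dtype)          # datetime64[ns]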

3 Comments

The numbers don't look that impressive if the data frame is deep and the dates are not normally distributed. I ran the code on my data frame (273,771 rows); to_datetime took 1 min 13 sec vs 59 sec for the lookup.
@smishra Most likely this depends on your input data: if there are a lot of duplicates, SerialDev's approach is then a fast lookup instead of a conversion.
This looks great. How are we passing the series object as s? df['Date'] = lookup(['Date'])?
12

If you'd rather convert at load time, you could do something like this:

date_columns = ['c1','c2', 'c3', 'c4', 'c5']
data = pd.read_csv('file_to_read.csv', parse_dates=date_columns)

Comments

2

First you need to extract all the columns you're interested in from data; then you can use pandas applymap to apply to_datetime to each element of the extracted frame. I assume you know the indices of the columns you want to extract; in the code below the names of the third to the fifteenth columns are extracted. Alternatively, you can define a list of column names and use that in place. You may also need to pass the date/time format of the DateTime entries.

import pandas as pd

cols_2_extract = data.columns[2:15]

data[cols_2_extract] = data[cols_2_extract].applymap(lambda x: pd.to_datetime(x, format='%d %m %Y'))  # %m is month; %M would mean minutes

Comments

1

Slightly different from the accepted answer, loc also works:

dx.loc[:,['birth_date','death_date']] = dx.loc[:,['birth_date','death_date']].apply(pd.to_datetime, errors='coerce')

Comments

0
data.iloc[:, 7:12] = data.iloc[:, 7:12].astype('datetime64[ns]')
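
Note that, unlike pd.to_datetime with errors='coerce', astype typically raises an error when it hits an unparseable value, so this one-liner assumes the columns are already clean. A minimal illustration with made-up data:

import pandas as pd

df = pd.DataFrame({'d': ['2017-01-06', '2020-03-15']})
df['d'] = df['d'].astype('datetime64[ns]')           # fine: every value parses
# pd.Series(['not a date']).astype('datetime64[ns]') # would raise instead of producing NaT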

Comments

0

read_csv()

Adding to @smishra's answer: when importing a .csv you can infer dates using infer_datetime_format, as discussed here. This can only be used if the series has a consistent date format, but it will speed up the import of dates.
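
A sketch combining parse_dates with infer_datetime_format (the file and column names are placeholders; note that recent pandas versions deprecate this flag, since format inference became the default behaviour):

import pandas as pd

data = pd.read_csv('file_to_read.csv',
                   parse_dates=['birth_date'],
                   infer_datetime_format=True)  # speeds up parsing when the format is consistent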

read_excel()

There is also the read_excel() function that can be used to import and parse dates. You can pass the parse_dates parameter a list of column names or numbers.

parse_dates = [7,8,9,10,11]
data = pd.read_excel('file_to_read.xlsx', sheet_name='Sheet1', parse_dates=parse_dates)

Comments
