62

I have a dataframe of 13 columns and 55,000 rows. I am trying to convert 5 of those columns to datetime; right now they return the type 'object', and I need to transform this data for machine learning. I know that if I do

data['birth_date'] = pd.to_datetime(data['birth_date'], errors='coerce')

it will return a datetime column, but I want to do it for 4 other columns as well. Is there one line that I can write to call all of them? I don't think I can index like

data[:,7:12]

thanks!

1 Comment
I'm not sure if there is a function to convert multiple columns at the same time, but I know that read_csv has a parse_dates argument that can take a list of all the columns you would like to convert when first importing your data.

8 Answers

84

You can use apply to iterate through each column with pd.to_datetime:

data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')

As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:

cols = data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')

4 Comments

Quick question: wouldn't it be better to use the map() method in this case?
map and apply can both accept functions, which is why this confuses people. I only use map for 'literal mapping' with a dictionary/Series, and apply for functions; apply has some extra functionality too.
With errors='coerce', invalid dates are assigned NaT (the short sketch after these comments shows this). If errors='ignore' is used instead of coerce, it will ignore the fact that the value is an invalid specification and just return a (possibly) invalid/incorrect date as-is.
I have the same problem, but I only need to change the column names from strings (e.g. 2020-Q4) to datetime, without affecting the rows. How can this be done?
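
A minimal sketch of the errors='coerce' behaviour described above (the values are made up for illustration):

import pandas as pd

s = pd.Series(['2017-01-06', 'not a date', '2020-03-15'])
print(pd.to_datetime(s, errors='coerce'))
# 0   2017-01-06
# 1          NaT    <- the invalid entry becomes NaT instead of raising
# 2   2020-03-15
# dtype: datetime64[ns]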
47
my_df[['column1','column2']] = my_df[['column1','column2']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f')

Note: of course the format can be changed as required.
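
For example, day-first dates could be parsed by swapping in the matching directives (the column names here are just placeholders):

my_df[['column1','column2']] = my_df[['column1','column2']].apply(pd.to_datetime, format='%d/%m/%Y')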

2 Comments

how can I keep the other columns of my dataframe when I do that?
@JessicaVoigt Only the columns selected on the left of the = are overwritten; the remaining columns in my_df remain untouched. You could do my_df[['new_1','new_2']] = my_df[['column1','column2']].apply(...) to assign the results to new columns instead.
19

If performance is a concern, I would advise using the following function to convert those columns to datetime:

def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date: pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])

to_datetime: 5799 ms
dateutil:    5162 ms
strptime:    1651 ms
manual:       242 ms
lookup:        32 ms

Source: https://github.com/sanand0/benchmarks/tree/master/date-parse
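
A usage sketch, assuming a DataFrame df with a string column named 'Date' (both names are illustrative):

import pandas as pd

df = pd.DataFrame({'Date': ['2017-01-06', '2017-01-06', '2020-03-15']})
df['Date'] = lookup(df['Date'])  # pass the Series itself, not a list of labels
print(df['Date'].dtype)          # datetime64[ns]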

3 Comments

The numbers don't look that impressive if the data frame is deep and the dates are not normally distributed. I ran the code on my data frame (273,771 rows); to_datetime took 1 min 13 sec vs 59 sec for the lookup.
@smishra Most likely this depends on your input data: if there are a lot of duplicates, SerialDev's approach is then a fast lookup instead of a conversion.
This looks great. How are we passing the series object as s? df['Date'] = lookup(['Date'])?
12

If you'd rather convert at load time, you could do something like this:

date_columns = ['c1','c2', 'c3', 'c4', 'c5']
data = pd.read_csv('file_to_read.csv', parse_dates=date_columns)

Comments

2

First you need to extract all the columns you're interested in from data; then you can use pandas applymap to apply to_datetime to each element of the extracted frame. I assume you know the indices of the columns you want to extract; in the code below the names of the third to the fifteenth columns are extracted. Alternatively, you can define a list of column names and use that in place. You may also need to pass the date/time format of the DateTime entries.

import pandas as pd

cols_2_extract = data.columns[2:15]

data[cols_2_extract] = data[cols_2_extract].applymap(lambda x: pd.to_datetime(x, format='%d %m %Y'))  # %m is month; %M would mean minutes

Comments

1

Slightly different from the accepted answer, loc also works:

dx.loc[:,['birth_date','death_date']] = dx.loc[:,['birth_date','death_date']].apply(pd.to_datetime, errors='coerce')

Comments

0
data.iloc[:, 7:12] = data.iloc[:, 7:12].astype('datetime64[ns]')
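
Note that, unlike pd.to_datetime with errors='coerce', astype typically raises an error when it hits an unparseable value, so this one-liner assumes the columns are already clean. A minimal illustration with made-up data:

import pandas as pd

df = pd.DataFrame({'d': ['2017-01-06', '2020-03-15']})
df['d'] = df['d'].astype('datetime64[ns]')           # fine: every value parses
# pd.Series(['not a date']).astype('datetime64[ns]') # would raise instead of producing NaT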

Comments

0

read_csv()

Adding to @smishra's answer: when importing a .csv you can infer dates using infer_datetime_format, as discussed here. This can only be used if the series has a consistent date format, but it will speed up the import of dates.
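
A sketch combining parse_dates with infer_datetime_format (the file and column names are placeholders; note that recent pandas versions deprecate this flag, since format inference became the default behaviour):

import pandas as pd

data = pd.read_csv('file_to_read.csv',
                   parse_dates=['birth_date'],
                   infer_datetime_format=True)  # speeds up parsing when the format is consistent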

read_excel()

There is also the read_excel() function that can be used to import and parse dates. You can pass the parse_dates parameter a list of column names or numbers.

parse_dates = [7,8,9,10,11]
data = pd.read_excel('file_to_read.xlsx', sheet_name='Sheet1', parse_dates=parse_dates)

Comments
