0

I've got an imported csv file which has multiple columns with dates in the format "5 Jan 2001 10:20". (Note not zero-padded day)

if I do df.dtype then it shows the columns as being a objects rather than a string or a datetime. I need to be able to subtract 2 column values to work out the difference so I'm trying to get them into a state where I can do that.

At the moment if I try the test subtraction at the end I get the error unsupported operand type(s) for -: 'str' and 'str'.

I've tried multiple methods but have run into a problem every way I've tried. Any help would be appreciated. If I need to give any more information then I will.

1
  • Have you tried pd.to_datetime()? Commented Dec 11, 2016 at 17:20

3 Answers 3

3

As suggested by @MaxU, you can use pd.to_datetime() method to bring the values of the given column to the 'appropriate' format, like this:

df['datetime'] = pd.to_datetime(df.datetime)

You would have to do this on whatever columns you have that you need trasformed to the right dtype.

Alternatively, you can use parse_dates argument of pd.read_csv() method, like this:

df = pd.read_csv(path, parse_dates=[1,2,3])

where columns 1,2,3 are expected to contain data that can be interpreted as dates.

I hope this helps.

Sign up to request clarification or add additional context in comments.

Comments

2

convert a column to datetime using this approach

df["Date"] = pd.to_datetime(df["Date"])

If column has empty values then change error level to coerce to ignore errors: Details

df["Date"] = pd.to_datetime(df["Date"], errors='coerce')

After which you should be able to subtract two dates.

example:

import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')

consult this answer for more details:

Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes

If you want to directly load the column as datetime object while reading from csv, consider this example :

Pandas read csv dateint columns to datetime

2 Comments

This is the error I get local variable 'mask' referenced before assignment Curiously, I think it is to do with the fact that I have missing values. I've found that the line works on some columns but not on ones that have NaN values.
glad i could help :) @M.Throw
1

I found that the problem was to do with missing values within the column. Using coerce=True so df["Date"] = pd.to_datetime(df["Date"], coerce=True) solves the problem.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.