
I'm trying to parse a CSV file into a DataFrame because I need to do some analysis on the timestamps. The CSV file is well structured, and I can read it without a problem using pd.read_csv:

import pandas as pd
import datetime as dt

df = pd.read_csv('trip_data.csv', low_memory=False, parse_dates=['datetime'], infer_datetime_format=True)

However, even when I pass parse_dates and infer_datetime_format as arguments, I still end up with a DataFrame in which the timestamps from my file are not parsed:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8771828 entries, 0 to 8771827
Data columns (total 3 columns):
UserID                   int64
datetime                 object
amount                   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 1.1+ GB

So when I try to get the minimum date, e.g.:

print(df['datetime'].min())

I get an incorrect answer: I can see that the minimum timestamp in my df is 2018-01-01 00:08:26, but I get 2018-01-27 04:06:37 as the minimum. Am I missing anything, or is there another way to cast this column to datetime64?

Here's a peek at my CSV file:

UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-01-01 00:44:55,15.3
1,2018-01-01 00:08:26,8.3
1,2018-01-01 00:20:22,34.8
1,2018-01-01 00:09:18,16.55
1,2018-01-01 00:29:29,5.8
1,2018-01-01 00:38:08,12.35
1,2018-01-01 00:49:29,6.3
  • Your code works for me. What versions of Python and pandas are you using? You can use sys.version and pd.__version__ to find out. Commented Jul 11, 2019 at 17:19
  • I'm running on Google Colab. sys.version: 3.6.8 (default, Jan 14 2019, 11:02:34), pd.__version__: 0.24.2. Commented Jul 11, 2019 at 17:58
  • In that case all I can think of is trying it on your local machine. Commented Jul 11, 2019 at 18:13

2 Answers


You can convert your column to datetime manually

df['datetime'] = pd.to_datetime(df['datetime'])

and then

print(df['datetime'].min())
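
As a minimal sketch of the manual conversion, the snippet below reads a few rows copied from the question out of an in-memory string (a stand-in for trip_data.csv) and converts the column afterwards. The explicit format string is an assumption based on the sample rows shown in the question; drop it if your timestamps are laid out differently.

```python
import io

import pandas as pd

# A few rows copied from the question, held in memory instead of trip_data.csv.
csv_text = """UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-01-01 00:44:55,15.3
1,2018-01-01 00:08:26,8.3
"""

# Read the column as plain text first, then convert it explicitly.
df = pd.read_csv(io.StringIO(csv_text))
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S')

# The column should now report a datetime64 dtype, and min() compares real timestamps.
print(df['datetime'].dtype)
print(df['datetime'].min())
```

Checking the dtype right after the conversion is a quick way to confirm that the column really was parsed before relying on min() or max().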

1 Comment

It seems there was one invalid date that silently prevented my datetime column from being converted to the datetime format; when I converted manually, I got the error. Passing errors='coerce' to the conversion fixed the issue. Thanks!
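
A rough sketch of what this comment describes, assuming a single malformed value in the column. The bad timestamp below is made up for illustration and is not taken from the original file. The strict conversion raises, while errors='coerce' turns the unparseable value into NaT, which also makes it easy to find the offending rows.

```python
import io

import pandas as pd

# Hypothetical sample: the second row carries an impossible timestamp.
csv_text = """UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-13-45 99:99:99,15.3
1,2018-01-01 00:08:26,8.3
"""
df = pd.read_csv(io.StringIO(csv_text))

# The strict conversion surfaces the problem instead of hiding it.
try:
    pd.to_datetime(df['datetime'])
    strict_failed = False
except ValueError as exc:
    strict_failed = True
    print('strict parse failed:', exc)

# errors='coerce' replaces unparseable values with NaT.
parsed = pd.to_datetime(df['datetime'], errors='coerce')

# Rows that had a value but could not be parsed are the invalid dates.
bad_rows = df[parsed.isna() & df['datetime'].notna()]
print(bad_rows)

# min() skips NaT, so the result reflects only the valid timestamps.
df['datetime'] = parsed
print(df['datetime'].min())
```

Inspecting bad_rows before overwriting the column shows how many values were coerced, so invalid entries are not dropped from the analysis unnoticed.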

Without a peek at your data source, it's hard to give advice on how to fix this, but a good place to look might be the pandas documentation on parsing datetimes from CSV.

A first step might be to try: (parse_dates=True, infer_datetime_format=True)
