Python sorting CSV by date and time

Question

OK, so I have a CSV file in the following format:

   1 | Thu Oct 04 21:47:53 GMT+01:00 2018 | 35.3254
   2 | Sun Oct 07 09:32:11 GMT+01:00 2018 | 45.7824
   3 | Mon Oct 01 01:00:44 GMT+01:00 2018 | 94.1246

  ...

3023 | Sat Oct 23 01:00:44 GMT+01:00 2018 | 67.2007

I want to sort by date and time so I get something like:

...

456 | Oct 16 23:25:06 | 45.6547
457 | Oct 16 23:29:21 | 64.3453
458 | Oct 16 23:34:17 | 27.6841
459 | Oct 16 23:40:04 | 78.6547
460 | Oct 16 23:44:18 | 11.6547
461 | Oct 16 23:49:22 | 34.6547
462 | Oct 16 23:54:15 | 37.6547
463 | Oct 17 00:00:20 | 68.6547
464 | Oct 17 00:05:06 | 07.6547
465 | Oct 17 00:09:15 | 13.6547
466 | Oct 17 00:14:45 | 37.6547
467 | Oct 17 00:19:26 | 84.6547

...

The date and time are in an awkward format, so I tried the following:

import pandas as pd
from datetime import datetime

df = pd.read_csv(file, header=None, engine='c', delimiter=',')

for index, row in df.iterrows():
    result = sorted(df.iterrows(), key=lambda row: datetime.strptime((str(row[1]))[9:24], "%b %d %H:%M:%S"))

print(result)

(The [9:24] slice should give me just the "Oct 16 23:29:21" part of the string, for example.)

I am getting this error:

ValueError: time data 'ame: 0, dtype: ' does not match format '%b %d %H:%M:%S'

I think I am accessing the row properly, but I cannot seem to access the date value on its own (the second element of the row), so the sort is not working.

Any ideas would be much appreciated! Thanks.
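For reference, the error comes from what sorted(df.iterrows(), ...) passes to the key function: each item is an (index, Series) tuple, so row[1] is the whole row and str(row[1]) is its printed representation (that is where the "Name: 0, dtype:" text in the error comes from), not the date itself. A minimal sketch of the fix, assuming the date string sits in column 1 as in the sample; the in-memory DataFrame is a hypothetical stand-in for the CSV, and the full-string format (with %z, which accepts "+01:00" on Python 3.7+) is an assumption based on the sample rows:

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in for the CSV: column 0 = id, 1 = date string, 2 = value
df = pd.DataFrame(
    [
        [1, "Thu Oct 04 21:47:53 GMT+01:00 2018", 35.3254],
        [2, "Sun Oct 07 09:32:11 GMT+01:00 2018", 45.7824],
        [3, "Mon Oct 01 01:00:44 GMT+01:00 2018", 94.1246],
    ]
)

# Assumed format of the whole date string; "GMT" is matched literally
FMT = "%a %b %d %H:%M:%S GMT%z %Y"

# Each item is (index, row): item[1] is the row Series, item[1][1] its date column
result = sorted(df.iterrows(), key=lambda item: datetime.strptime(item[1][1], FMT))

for index, row in result:
    print(index, row[1], row[2])
```

Parsing the whole string, rather than slicing it, also keeps the year, so rows from different years would still sort correctly.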

Daniel Labbe · Accepted Answer · 2019-02-27 17:50:57Z

4

You can use the infer_datetime_format parameter of pd.to_datetime. Here is an example with your sample data:

>>> df['date'] = pd.to_datetime(df.date, infer_datetime_format=True)
>>> df.sort_values(by='date', ascending=True, inplace=True)
>>> df.date
2   2018-10-01 02:00:44
0   2018-10-04 22:47:53
1   2018-10-07 10:32:11
3   2018-10-23 02:00:44
Name: date, dtype: datetime64[ns]

From the pandas.to_datetime() documentation:

infer_datetime_format : boolean, default False

If True and no format is given, attempt to infer the format of the datetime strings, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.
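Note that infer_datetime_format was deprecated in pandas 2.0: a strict version of format inference is now the default, and passing the argument has no effect. A sketch that passes the format explicitly instead, assuming the date strings look exactly like the question's sample (the format string is an assumption, and the in-memory DataFrame stands in for the CSV). In the output above the times appear one hour later than the input strings; with an explicit %z the "+01:00" offset is kept as written:

```python
import pandas as pd

# Hypothetical sample mirroring the question's rows
df = pd.DataFrame(
    {
        "date": [
            "Thu Oct 04 21:47:53 GMT+01:00 2018",
            "Sun Oct 07 09:32:11 GMT+01:00 2018",
            "Mon Oct 01 01:00:44 GMT+01:00 2018",
        ],
        "value": [35.3254, 45.7824, 94.1246],
    }
)

# Explicit (assumed) format: weekday, month, day, time, literal "GMT", offset, year
df["date"] = pd.to_datetime(df["date"], format="%a %b %d %H:%M:%S GMT%z %Y")

# Sort chronologically; every row shares the +01:00 offset
df = df.sort_values(by="date")
print(df)
```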

edited Feb 27, 2019 at 17:50

answered Feb 27, 2019 at 17:38

Daniel Labbe


Rakesh · 2019-02-27 17:35:25Z

3

You can use parse_dates while reading the CSV to convert the date column to datetime objects.

Example:

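import pandas as pd

# Only two names for three fields: pandas then uses the leading (ID) field as the index
df = pd.read_csv(filename, names=["Date", "Col"], sep="|", parse_dates=["Date"])
# Sort rows chronologically, modifying df in place
df.sort_values(["Date"], inplace=True)
print(df)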
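If the file really uses " | " between fields (the question's own read_csv call uses delimiter=',', so the pipe layout is an assumption), a regular-expression separator also strips the spaces around each pipe, and a converter can parse each date while the file is read. A sketch using an in-memory sample in place of the file, with an assumed full-string date format:

```python
import io
from datetime import datetime

import pandas as pd

# In-memory stand-in for the CSV file, using the question's " | " layout
data = io.StringIO(
    "1 | Thu Oct 04 21:47:53 GMT+01:00 2018 | 35.3254\n"
    "2 | Sun Oct 07 09:32:11 GMT+01:00 2018 | 45.7824\n"
    "3 | Mon Oct 01 01:00:44 GMT+01:00 2018 | 94.1246\n"
)

# Assumed format of the date field
FMT = "%a %b %d %H:%M:%S GMT%z %Y"

df = pd.read_csv(
    data,
    sep=r"\s*\|\s*",  # regex separator: a pipe plus any surrounding spaces
    engine="python",  # regex separators require the Python parsing engine
    names=["ID", "Date", "Col"],
    converters={"Date": lambda s: datetime.strptime(s, FMT)},
)

# Sort rows chronologically by the parsed dates
df = df.sort_values("Date")
print(df)
```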

answered Feb 27, 2019 at 17:35

Rakesh


spattanaik75 · 2019-02-27 17:35:34Z

1

Try this date parser:

from dateutil.parser import parse

# fuzzy_with_tokens=True returns a (datetime, skipped_tokens) tuple; [0] is the datetime
print(parse("Thu Oct 04 21:47:53 GMT+01:00 2018", dayfirst=False, fuzzy_with_tokens=True)[0])
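The same parser can serve as a sort key for rows held in plain tuples (for example, rows read with the standard csv module). A minimal sketch with hypothetical sample rows; fuzzy=True (skip tokens dateutil does not recognise) is used instead of fuzzy_with_tokens because only the datetime is needed. Be aware that dateutil reads "GMT+01:00" in the POSIX sense, i.e. as UTC-01:00; since every row carries the same offset, the sort order is unaffected.

```python
from dateutil.parser import parse

# Hypothetical rows: (id, date string, value)
rows = [
    (1, "Thu Oct 04 21:47:53 GMT+01:00 2018", 35.3254),
    (2, "Sun Oct 07 09:32:11 GMT+01:00 2018", 45.7824),
    (3, "Mon Oct 01 01:00:44 GMT+01:00 2018", 94.1246),
]

# Parse the date field of each row and sort chronologically
rows.sort(key=lambda row: parse(row[1], fuzzy=True))

for row in rows:
    print(row)
```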

answered Feb 27, 2019 at 17:35

spattanaik75


Frenchy · 2019-02-27 17:57:01Z

1

Parse the date with pd.to_datetime and use strftime to get the shorter format before sorting the data:

import pandas as pd

df = pd.DataFrame({'Date': ['Thu Oct 04 21:47:53 GMT+01:00 2018','Sun Oct 07 09:32:11 GMT+01:00 2018']})
df['Clean_Date'] = df.Date.apply(lambda x: pd.to_datetime(x).strftime('%b %d %H:%M:%S'))

print(df)
                             Date       Clean_Date
0  Thu Oct 04 21:47:53 GMT+01:00 2018  Oct 04 21:47:53
1  Sun Oct 07 09:32:11 GMT+01:00 2018  Oct 07 09:32:11
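One caveat: Clean_Date is a string, so sorting on it compares month names alphabetically ("Nov" sorts before "Oct"), which only gives chronological order while all rows fall in the same month. A sketch that sorts on the parsed datetime and formats only for display, assuming an explicit format string that matches the sample (the Parsed column name is illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": [
            "Thu Oct 04 21:47:53 GMT+01:00 2018",
            "Sun Oct 07 09:32:11 GMT+01:00 2018",
            "Mon Oct 01 01:00:44 GMT+01:00 2018",
        ]
    }
)

# Parse with an assumed explicit format, then sort on the real datetime values
df["Parsed"] = pd.to_datetime(df["Date"], format="%a %b %d %H:%M:%S GMT%z %Y")
df = df.sort_values("Parsed")

# Format only for display, after sorting
df["Clean_Date"] = df["Parsed"].dt.strftime("%b %d %H:%M:%S")
print(df[["Clean_Date"]])
```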

edited Feb 27, 2019 at 17:57

answered Feb 27, 2019 at 17:35

Frenchy
