
OK, so I have a CSV file in this format:

   1 | Thu Oct 04 21:47:53 GMT+01:00 2018 | 35.3254
   2 | Sun Oct 07 09:32:11 GMT+01:00 2018 | 45.7824
   3 | Mon Oct 01 01:00:44 GMT+01:00 2018 | 94.1246

  ...

3023 | Sat Oct 23 01:00:44 GMT+01:00 2018 | 67.2007

I want to sort by date and time so I get something like:

...

456 | Oct 16 23:25:06 | 45.6547
457 | Oct 16 23:29:21 | 64.3453
458 | Oct 16 23:34:17 | 27.6841
459 | Oct 16 23:40:04 | 78.6547
460 | Oct 16 23:44:18 | 11.6547
461 | Oct 16 23:49:22 | 34.6547
462 | Oct 16 23:54:15 | 37.6547
463 | Oct 17 00:00:20 | 68.6547
464 | Oct 17 00:05:06 | 07.6547
465 | Oct 17 00:09:15 | 13.6547
466 | Oct 17 00:14:45 | 37.6547
467 | Oct 17 00:19:26 | 84.6547

...

The date and time are in a nasty format, so I have tried the following:

import pandas as pd
from datetime import datetime

df = pd.read_csv(file, header=None, engine='c', delimiter=',')

for index, row in df.iterrows():
    result = sorted(df.iterrows(), key=lambda row: datetime.strptime(str(row[1])[9:24], "%b %d %H:%M:%S"))

print(result)

(the [9:24] should allow me to slice the string to get just Oct 16 23:29:21, for example)

I am getting this error:

ValueError: time data 'ame: 0, dtype: ' does not match format '%b %d %H:%M:%S'

I think my problem is that I am accessing the row properly, but I cannot seem to access the date value on its own (the 2nd element of the row), so the sort is not working.
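To see where the odd string in the error comes from, here is a minimal sketch using a hypothetical two-row frame laid out like the sample data (it assumes the file was split into the three columns shown, with no header, so the columns are labelled 0, 1, 2). It shows that each item from `iterrows()` is an `(index, Series)` tuple, so `row[1]` in the sort key is the whole row and `str()` of it is the Series printout rather than the date.

```python
import pandas as pd

# Hypothetical frame laid out like the question's data: no header, so the
# columns are labelled 0, 1, 2 (assumes the file split into three columns).
df = pd.DataFrame([
    [1, "Thu Oct 04 21:47:53 GMT+01:00 2018", 35.3254],
    [2, "Sun Oct 07 09:32:11 GMT+01:00 2018", 45.7824],
])

# iterrows() yields (index, row) tuples, so inside the sort key the variable
# `row` is such a tuple and row[1] is the whole row as a Series.
idx, row = next(df.iterrows())
print(type(row))  # <class 'pandas.core.series.Series'>

# str() of a Series is its multi-line printout, ending in
# "Name: 0, dtype: object", so a slice like [9:24] cuts into that text.
print(str(row))

# The date itself is the second cell of the row.
date_str = row.iloc[1]
print(date_str)  # Thu Oct 04 21:47:53 GMT+01:00 2018
```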

Any ideas would be much appreciated. Thanks!

4 Answers


You can use the infer_datetime_format parameter of pd.to_datetime. Here is an example with your sample data:

>>> df['date'] = pd.to_datetime(df.date, infer_datetime_format=True)
>>> df.sort_values(by='date', ascending=True, inplace=True)
>>> df.date
2   2018-10-01 02:00:44
0   2018-10-04 22:47:53
1   2018-10-07 10:32:11
3   2018-10-23 02:00:44
Name: date, dtype: datetime64[ns]

From the pandas.to_datetime() documentation:

infer_datetime_format : boolean, default False

If True and no format is given, attempt to infer the format of the datetime strings, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.
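Since pandas 2.0, infer_datetime_format is deprecated (a strict form of format inference is now the default), so passing an explicit format is a sturdier option. The printed times in the output above are also one hour later than the strings in the question, which suggests the offset was applied with the opposite sign and the values converted to UTC. A minimal sketch, assuming every value follows the question's `Thu Oct 04 21:47:53 GMT+01:00 2018` layout, where the `format` string is my own construction and keeps the original wall-clock times with a +01:00 offset:

```python
import pandas as pd

# Sample date strings copied from the question.
dates = pd.Series([
    "Thu Oct 04 21:47:53 GMT+01:00 2018",
    "Sun Oct 07 09:32:11 GMT+01:00 2018",
    "Mon Oct 01 01:00:44 GMT+01:00 2018",
])

# Assumed format: %a weekday, %b month name, literal "GMT", %z for "+01:00".
parsed = pd.to_datetime(dates, format="%a %b %d %H:%M:%S GMT%z %Y")

# Sort chronologically; the index still shows each value's original row.
ordered = parsed.sort_values()
print(ordered)
```

Because the result is timezone-aware, rows with different offsets would still be compared on the same absolute timeline.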



You can use parse_dates while reading the CSV to convert the date column to datetime objects.

Example:

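import pandas as pd

# Assumes the columns are separated by "|"; with two names for three fields,
# pandas uses the leading field (the row number) as the index.
df = pd.read_csv(filename, names=["Date", "Col"], sep="|", parse_dates=["Date"])
df.sort_values(["Date"], inplace=True)
print(df)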
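A self-contained sketch of the same read-then-sort idea, with three rows from the question inlined via `io.StringIO`. It assumes the fields really are separated by ` | ` (the question's own `read_csv` call used `','`, so the real delimiter is uncertain), and the `Id` column name is hypothetical. A regex separator strips the spaces around each pipe but requires `engine="python"`; the dates are converted after reading with an explicit format rather than relying on inference:

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: three rows copied from the question,
# assuming " | " separates the fields.
sample = io.StringIO(
    "1 | Thu Oct 04 21:47:53 GMT+01:00 2018 | 35.3254\n"
    "2 | Sun Oct 07 09:32:11 GMT+01:00 2018 | 45.7824\n"
    "3 | Mon Oct 01 01:00:44 GMT+01:00 2018 | 94.1246\n"
)

# The regex separator also swallows the spaces around each pipe;
# regex separators need the Python parsing engine.
df = pd.read_csv(sample, sep=r"\s*\|\s*", engine="python",
                 header=None, names=["Id", "Date", "Col"])

# Convert with an explicit (assumed) format, then sort chronologically.
df["Date"] = pd.to_datetime(df["Date"], format="%a %b %d %H:%M:%S GMT%z %Y")
df = df.sort_values("Date").reset_index(drop=True)
print(df)
```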



Try dateutil's date parser:

from dateutil.parser import parse

# fuzzy_with_tokens=True returns a (datetime, ignored_tokens) tuple, hence [0]
print(parse('Thu Oct 04 21:47:53 GMT+01:00 2018', dayfirst=False, fuzzy_with_tokens=True)[0])
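To use the parser for the actual sort, it can serve directly as the sort key. A minimal sketch, assuming the rows have already been read into plain `(id, date, value)` tuples (the `rows` list is a hypothetical stand-in built from the question's sample). One caveat: dateutil reads `GMT+01:00` POSIX-style, as one hour *behind* UTC, so the parsed offset is -01:00; relative order is unaffected as long as every row carries the same offset.

```python
from dateutil.parser import parse

# Hypothetical rows standing in for the parsed CSV: (id, date string, value).
rows = [
    (1, "Thu Oct 04 21:47:53 GMT+01:00 2018", 35.3254),
    (2, "Sun Oct 07 09:32:11 GMT+01:00 2018", 45.7824),
    (3, "Mon Oct 01 01:00:44 GMT+01:00 2018", 94.1246),
]

# Sort the tuples by the datetime parsed from their second field.
ordered = sorted(rows, key=lambda row: parse(row[1]))
print([row[0] for row in ordered])  # [3, 1, 2]

# POSIX-style reading of "GMT+01:00": the offset comes out as -1 hour.
print(parse(rows[0][1]).utcoffset())
```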



Use strftime before sorting the data:

import pandas as pd

df = pd.DataFrame({'Date': ['Thu Oct 04 21:47:53 GMT+01:00 2018','Sun Oct 07 09:32:11 GMT+01:00 2018']})
df['Clean_Date'] = df.Date.apply(lambda x: pd.to_datetime(x).strftime('%b %d %H:%M:%S'))

print(df)
                             Date       Clean_Date
0  Thu Oct 04 21:47:53 GMT+01:00 2018  Oct 04 21:47:53
1  Sun Oct 07 09:32:11 GMT+01:00 2018  Oct 07 09:32:11
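The Clean_Date strings drop the year and start with the month name, so sorting them as text compares months alphabetically (September would land after October). A sketch that sorts on parsed timestamps first and formats only for display, reusing the column names from this answer; the explicit `format` string is my own assumption about the input layout:

```python
import pandas as pd

df = pd.DataFrame({"Date": [
    "Thu Oct 04 21:47:53 GMT+01:00 2018",
    "Sun Oct 07 09:32:11 GMT+01:00 2018",
    "Mon Oct 01 01:00:44 GMT+01:00 2018",
]})

# Sort on real (timezone-aware) timestamps first...
df["Parsed"] = pd.to_datetime(df["Date"], format="%a %b %d %H:%M:%S GMT%z %Y")
df = df.sort_values("Parsed")

# ...and only then turn them into display strings (wall-clock time kept).
df["Clean_Date"] = df["Parsed"].dt.strftime("%b %d %H:%M:%S")
print(df["Clean_Date"].tolist())

# Sorting the formatted strings directly orders month names alphabetically.
print(sorted(["Sep 30 23:59:59", "Oct 01 00:00:00"]))  # ['Oct 01 00:00:00', 'Sep 30 23:59:59']
```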
