I am trying to read a CSV file and cast one of the columns to datetime. However, some data points, e.g. 2019-01-03 12:00:00, are missing the milliseconds, while the rest of the data contains them, and I do not know why. This causes an error.

My question is two-fold:

  1. Since the current code below raises an error, how do I get around this and parse the datetime column?
  2. If I were to reproduce this CSV file, how can I ensure all datetime values have milliseconds?

Sorry. Not sure why the code is not displaying properly here.

from datetime import datetime
import pandas as pd

custom_date_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
df = pd.read_csv('abc.csv', parse_dates=['endTime'], date_parser=custom_date_parser)


    endTime
0   2019-01-02 09:40:22.668
1   2019-01-02 09:48:09.040
2   2019-01-02 09:54:54.209
3   2019-01-02 09:59:28.768
4   2019-01-02 10:06:33.820
5   2019-01-02 10:17:38.818
6   2019-01-02 10:30:26.999
7   2019-01-02 10:43:54.516
8   2019-01-02 11:04:26.652
9   2019-01-02 11:30:22.316
10  2019-01-02 11:59:59.751
11  2019-01-03 09:37:11.223
12  2019-01-03 09:49:06.226
13  2019-01-03 10:01:58.397
14  2019-01-03 10:15:20.918
15  2019-01-03 10:31:28.438
16  2019-01-03 10:52:26.130
17  2019-01-03 11:07:09.128
18  2019-01-03 11:22:00.907
19  2019-01-03 11:45:55.349
20  2019-01-03 12:00:00
21  2019-01-04 09:39:48.753
22  2019-01-04 09:48:06.856
23  2019-01-04 09:58:44.608
24  2019-01-04 10:10:49.498
25  2019-01-04 10:26:29.543
26  2019-01-04 10:39:36.750
27  2019-01-04 10:49:59.504
28  2019-01-04 11:00:02.138
29  2019-01-04 11:11:20.630
30  2019-01-04 11:27:59.402
31  2019-01-04 11:52:12.061
32  2019-01-04 11:59:59.879
33  2019-01-07 09:36:06.436
34  2019-01-07 09:44:07.126
35  2019-01-07 09:54:28.718
36  2019-01-07 10:05:54.130
37  2019-01-07 10:19:45.046
38  2019-01-07 10:38:15.991
39  2019-01-07 11:01:45.755
40  2019-01-07 11:17:39.586
41  2019-01-07 11:45:39.668
42  2019-01-07 12:00:00

The error msg is below:

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3298, in converter
    date_parser(*date_cols), errors="ignore", cache=cache_dates

  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')

TypeError: strptime() argument 1 must be str, not numpy.ndarray


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3309, in converter
    dayfirst=dayfirst,

  File "pandas\_libs\tslibs\parsing.pyx", line 589, in pandas._libs.tslibs.parsing.try_parse_dates

  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')

  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 577, in _strptime_datetime

  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 359, in _strptime

ValueError: time data '2019-01-03 12:00:00' does not match format '%Y-%m-%d %H:%M:%S.%f'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "<ipython-input-2-9b9600d4b508>", line 1, in <module>
    df_bars = pd.read_csv(f'C:\\Users\\someone\\Desktop\\CV\\2021\\data\\abc.csv',parse_dates=['endTime'],date_parser=custom_date_parser)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 468, in _read
    return parser.read(nrows)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 1057, in read
    index, columns, col_dict = self._engine.read(nrows)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 2113, in read
    names, data = self._do_date_conversions(names, data)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 1846, in _do_date_conversions
    keep_date_col=self.keep_date_col,

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3352, in _process_date_conversion
    data_dict[colspec] = converter(data_dict[colspec])

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3314, in converter
    return generic_parser(date_parser, *date_cols)

  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\date_converters.py", line 100, in generic_parser
    results[i] = parse_func(*args)

  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')

  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 577, in _strptime_datetime

  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 359, in _strptime

ValueError: time data '2019-01-03 12:00:00' does not match format '%Y-%m-%d %H:%M:%S.%f'
  • Looks like it is not just the last one--row 20 also has no milliseconds. Commented Aug 7, 2021 at 6:58
  • Yes, you are totally right. I have updated the wording. Commented Aug 7, 2021 at 6:59
  • Please provide a minimal and executable example. For more information see here. (It is probably no longer needed here, but it is always more convenient for the person who answers.) Commented Aug 7, 2021 at 7:03

3 Answers

You can try:

def custom_date_parser(x):
    # errors='coerce' turns any value that fails to parse into NaT instead of raising
    return pd.to_datetime(x, format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')

# Finally:
df = pd.read_csv('abc.csv', parse_dates=['endTime'], date_parser=custom_date_parser)

OR

Don't use date_parser at all and let pandas infer the format:

df = pd.read_csv('abc.csv', parse_dates=['endTime'])

Note: PEP 8 recommends against assigning a lambda expression to a name; use a def statement instead.

You can get a detailed explanation at: Is it pythonic: naming lambdas
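
How pandas treats a format string that does not match every row has changed between releases, so a version-independent alternative may be useful. The following is only a minimal sketch, not part of the original answer: it reads endTime as plain strings and then tries both layouts with datetime.strptime. The helper name parse_end_time and the in-memory csv_text are assumptions made for illustration; the two sample rows come from the question.

```python
import io
from datetime import datetime

import pandas as pd

# The two layouts that occur in the endTime column of the question.
FORMATS = ('%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S')


def parse_end_time(text):
    """Hypothetical helper: try each known format until one matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'unrecognised timestamp: {text!r}')


# Stand-in for abc.csv, using two rows from the question.
csv_text = 'endTime\n2019-01-03 11:45:55.349\n2019-01-03 12:00:00\n'

# Read endTime as plain strings first, then convert element by element.
df = pd.read_csv(io.StringIO(csv_text))
df['endTime'] = pd.to_datetime(df['endTime'].map(parse_end_time))
print(df.dtypes)
```

Because the conversion happens after read_csv, this sketch does not rely on date_parser at all. It parses one value at a time, so it is likely to be slower than a vectorised call on large files.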

8 Comments

Named lambda is not a recommended way, you can directly pass the lambda function to date_parser
@ThePyGuy sir for simplicity I assigned lambda to a variable
Then you can use normal python function with def
PEP 8 recommends not to use a named lambda. You can get a detailed explanation at Is it pythonic: naming lambdas
This looks better

If pd.to_datetime does not help you, you could also filter the rows by format and convert each group individually. See this answer for reference.
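
The filter-and-convert idea could look roughly like the sketch below. It assumes that a literal '.' appears only in values that carry a fractional part, which holds for the sample data in the question but is an assumption about the real file. The variable names are illustrative.

```python
import pandas as pd

# Three rows from the question: two with milliseconds, one without.
raw = pd.Series([
    '2019-01-03 11:45:55.349',
    '2019-01-03 12:00:00',
    '2019-01-04 09:39:48.753',
])

# Boolean mask: True where the string contains a fractional-seconds part.
has_fraction = raw.str.contains('.', regex=False)

# Convert each group with the format that matches it.
with_ms = pd.to_datetime(raw[has_fraction], format='%Y-%m-%d %H:%M:%S.%f')
without_ms = pd.to_datetime(raw[~has_fraction], format='%Y-%m-%d %H:%M:%S')

# Recombine and restore the original row order.
parsed = pd.concat([with_ms, without_ms]).sort_index()
print(parsed)
```

Each pd.to_datetime call sees only one layout, so neither call has to cope with mixed input.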

By default pandas adds .000 when the milliseconds are missing. What is the exact error you are seeing?

import pandas as pd
df = pd.DataFrame({'date': ['2016-6-10 09:40:22.668', 
                            '2016-7-1 19:45:30.532', 
                            '2013-10-12 4:5:1'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], format="%Y-%m-%d %H:%M:%S.%f")
print(df)

Output:

                     date  value
0 2016-06-10 09:40:22.668      2
1 2016-07-01 19:45:30.532      3
2 2013-10-12 04:05:01.000      4
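
For the second part of the question (making sure every value in a regenerated CSV has milliseconds), one possible approach is to format the column as text before writing. This is a sketch under the assumption that truncating to three fractional digits is acceptable. The sample frame and the variable names are illustrative, not taken from the original post.

```python
import io

import pandas as pd

# Two timestamps from the question, one of them with no fractional part.
df = pd.DataFrame({
    'endTime': [
        pd.Timestamp('2019-01-03 11:45:55.349'),
        pd.Timestamp('2019-01-03 12:00:00'),
    ],
})

out = df.copy()
# %f always emits six digits (microseconds); dropping the last three
# characters leaves exactly three digits (milliseconds) on every row.
out['endTime'] = out['endTime'].dt.strftime('%Y-%m-%d %H:%M:%S.%f').str[:-3]

# Write to an in-memory buffer here; a file path would work the same way.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

DataFrame.to_csv also accepts a date_format argument. Passing '%Y-%m-%d %H:%M:%S.%f' there would write six fractional digits on every row, which is equally uniform if microsecond precision is fine.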
