1

recently I faced a really weird csv file with 2 columns (with headers), one for dates and the second one for prices. The time format was "dd.mm.yyyy".

d = {'Date': [31.12.1991, 02.01.1992, 03.01.1992, 06.01.1992], 
     'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)
prices = pd.DataFrame(df['Prices'])
date = pd.DataFrame(df['Date'])
date = date.to_string(header=True)
date = df.to_datetime(utc=True, infer_datetime_format=True)
frame = date.join(values)
print(df)

I tried to make it work by isolating the date column and trying to transform it first into string with the to_string() function and then back to date with the to_datetime but it was no use.

Any suggestions? Thanks in advance

3 Answers 3

5

Interesting way to generalize for whole dataframe

Note This uses errors='ignore' in order to skip columns that might not be suitable for parsing as dates. However, the trade off is that if there is a column that is intended to be parsed as dates but has a bad date value, this approach will leave that column unaltered. The point is to make sure you don't have bad date values.

df.assign(
    **df.select_dtypes(exclude=[np.number]).apply(
        pd.to_datetime, errors='ignore', dayfirst=True
    )
)

        Date  Prices
0 1991-12-31    9.62
1 1992-01-02    9.50
2 1992-01-03    9.73
3 1992-01-06    9.45

Another example

df = pd.DataFrame(dict(
    A=1, B='B', C='6.7.2018', D=1-1j,
    E='1.2.2017', F=pd.Timestamp('2016-08-08')
), [0])

df

   A  B         C       D         E          F
0  1  B  6.7.2018  (1-1j)  1.2.2017 2016-08-08

df.assign(
    **df.select_dtypes(exclude=[np.number]).apply(
        pd.to_datetime, errors='ignore', dayfirst=True
    )
)

   A  B          C       D          E          F
0  1  B 2018-07-06  (1-1j) 2017-02-01 2016-08-08

Setup

borrowed from jezrael

d = {'Date': ['31.12.1991', '02.01.1992', '03.01.1992', '06.01.1992'], 
     'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)
Sign up to request clarification or add additional context in comments.

1 Comment

Hmmm, errors='ignore' is not good idea if bad datetimes, maybe you can pointed need all valid dates.
3

You could try to parse the dates when you read in the file. You can specify that the format has the day first instead of the month.

import pandas as pd
df = pd.read_csv('test.csv', parse_dates=['Date'], dayfirst=True)

print(df)
#        Date   Prices
#0 1991-12-31     9.62
#1 1992-01-02     9.50
#2 1992-01-03     9.73
#3 1992-01-06     9.45

df.dtypes
#Date      datetime64[ns]
#Prices           float64
#dtype: object

However, your data really need to be clean and properly formatted for this to work:

parse_dates:

  • If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv

Sample Data: test.csv

Date,Prices
31.12.1991,9.62
02.01.1992,9.5
03.01.1992,9.73
06.01.1992,9.45

1 Comment

Thanks a lot! I didn't consider that I could fix the issue by going back to the CSV =)
3

I believe need:

d = {'Date': ['31.12.1991', '02.01.1992', '03.01.1992', '06.01.1992'], 
     'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
print (df)
        Date  Prices
0 1991-12-31    9.62
1 1992-01-02    9.50
2 1992-01-03    9.73
3 1992-01-06    9.45

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.