I have a dataframe with a datetime index that contains data for every hour between 2019 and 2020. I import it from a CSV file as follows, keeping only the columns I want and giving them easier names (the names are changed for work reasons):

import pandas as pd

file = 'data.csv'
df = pd.read_csv(file, sep=";", header=0, na_values=['NA', ' ', '.'])
df['datetime'] = pd.to_datetime(df['datetime'])
df['week'] = df['datetime'].dt.isocalendar().week
df['month'] = df['datetime'].dt.month
df['hour'] = df['datetime'].dt.hour
df['day'] = df['datetime'].dt.day

df = df.set_index(['datetime'])

df = df.rename(columns={'data1':'d1','data2':'d2','data3':'d3','data4':'d4','data5':'d5','data6':'d6','data7':'d7','data8':'d8','data9':'d9','data10':'d10','data11':'d11','data12':'d12','data13':'d13','data14':'d14','data15':'d15','data16':'d16'})

df = df[['d1','d2','d3','d4','d5','d6','d7','d8','d9','d10','d11','d12','d13','d14','d15','d16','week','month','hour','day']]

When I type:

df['d4'][0:2800].min()

the answer is 995, which I know is correct because I checked it against the CSV file.

My problem is that during the import, some dates end up in the dataframe in the wrong order. I don't know why, but for example 2019-09-09 is followed by 2019-09-13 instead of 2019-09-10.

I tried to fix it by using

df=df.sort_index(ascending=True)

or

df=df.sort_index()

and it seems to work, since all the dates are now in the right order. But when I type

df['d4'][0:2800].min()

the answer is now 870, which is wrong.

It seems like df.sort_index() is mixing up my data. Am I doing something wrong?

  • What is the input format of df['datetime'] before you call pd.to_datetime(df['datetime'])? Commented Apr 29, 2021 at 15:04
  • Hello, it's a string. The format is '01/01/2019 00:00'. Commented Apr 29, 2021 at 15:33
  • So, what comes first, day or month? Commented Apr 29, 2021 at 15:48
  • OK, that's important; can you change your code to df['datetime']=pd.to_datetime(df['datetime'], dayfirst=True) and see if that helps? Commented Apr 30, 2021 at 4:52
  • It works, thanks a lot! Can't believe I didn't think about that... Thank you! Commented Apr 30, 2021 at 7:22

1 Answer

The point here is to make sure the date/time strings are parsed correctly into the datetime data type. By default, a string like '01/01/2019 00:00' is parsed as mm/dd/YYYY HH:MM; see the dayfirst parameter of pandas.to_datetime:

dayfirst : bool, default False

Depending on where you live, you might expect that the day comes first.

Also note that, in pandas versions before 2.0, this is evaluated for each element; e.g. '25-12-2019' is parsed as Dec 25th since there is no month 25, but '03-12-2019' from the same column becomes March 12th although Dec 3rd might be expected. That can create quite a mess. So if the day comes first in your date string:

df['datetime'] = pd.to_datetime(df['datetime'], dayfirst=True)
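To see the effect on a small scale, here is a minimal sketch. The two sample strings are made up for illustration; they only assume the 'dd/mm/YYYY HH:MM' layout mentioned in the comments.

```python
import pandas as pd

# Hypothetical sample values: 3 and 4 December 2019, written day first.
raw = pd.Series(['03/12/2019 00:00', '04/12/2019 00:00'])

# Default behaviour (dayfirst=False): the first number is read as the month.
month_first = pd.to_datetime(raw)

# With dayfirst=True the first number is read as the day.
day_first = pd.to_datetime(raw, dayfirst=True)

print(month_first.dt.month.tolist())  # months seen by the default parser
print(day_first.dt.month.tolist())    # months seen with dayfirst=True
```

With the default, two consecutive days end up a month apart, which is why the rows appear out of order and why sorting the index then moves them to the wrong place.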

You could also

  • provide the parsing directive explicitly via the format kwarg of to_datetime
  • specify a column to be parsed to datetime via the parse_dates kwarg of pandas.read_csv. There, you can also specify dayfirst=True
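Both alternatives might look roughly like this. It is only a sketch: the inline CSV text stands in for data.csv, and the format string '%d/%m/%Y %H:%M' is an assumption based on the '01/01/2019 00:00' example given in the comments.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv (semicolon-separated, day-first dates).
csv_text = "datetime;data1\n01/02/2019 00:00;995\n01/02/2019 01:00;870\n"

# Option 1: pass an explicit format to to_datetime.
df_a = pd.read_csv(io.StringIO(csv_text), sep=';')
df_a['datetime'] = pd.to_datetime(df_a['datetime'], format='%d/%m/%Y %H:%M')

# Option 2: let read_csv parse the column, telling it the day comes first.
df_b = pd.read_csv(io.StringIO(csv_text), sep=';',
                   parse_dates=['datetime'], dayfirst=True)

print(df_a['datetime'].dt.month.tolist())
print(df_b['datetime'].dt.month.tolist())
```

An explicit format has the advantage of failing loudly if a value does not match it, instead of silently being parsed some other way.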