How to sort datetime index dataframe

Question

I have a dataframe with a datetime index that contains data for every hour between 2019 and 2020. I import it from a CSV file as follows, keeping only the columns I want and giving them easier names (the names are changed for work reasons):

import pandas as pd

file = 'data.csv'
df = pd.read_csv(file,sep=";", header=0, na_values=['NA', ' ' , '.'])
df['datetime']=pd.to_datetime(df['datetime'])
df['week'] = df['datetime'].dt.isocalendar().week
df['month'] = df['datetime'].dt.month
df['hour']=df['datetime'].dt.hour
df['day']=df['datetime'].dt.day

df=df.set_index(['datetime'])
   

df=df.rename(columns={'data1':'d1','data2':'d2','data3':'d3','data4':'d4','data5':'d5','data6':'d6','data7':'d7','data8':'d8','data9':'d9','data10':'d10','data11':'d11','data12':'d12','data13':'d13','data14':'d14','data15':'d15','data16':'d16'})

df=df[['d1','d2','d3','d4','d5','d6','d7','d8','d9','d10','d11','d12','d13','d14','d15','d16','week','month','hour','day']]

When I type:

df['d4'][0:2800].min()

The result is 995, which I know is correct because I checked the CSV file.

Now my problem is that during import, some dates end up in the dataframe in the wrong order. I don't know why, but for example 2019-09-09 is followed by 2019-09-13 instead of 2019-09-10.

I tried to fix it by using

df=df.sort_index(ascending=True)

or

df=df.sort_index()

and it seems to work, as all the dates are now in the right order, but when I type

df['d4'][0:2800].min()

the result is now 870, which is wrong.

It seems like df.sort_index() is mixing up my data. Am I doing anything wrong?

What is the input format of df['datetime'] before you call pd.to_datetime(df['datetime'])?
– FObersteiner, Commented Apr 29, 2021 at 15:04
OK, that's important; can you change your code to df['datetime']=pd.to_datetime(df['datetime'], dayfirst=True) and see if that helps?
– FObersteiner, Commented Apr 30, 2021 at 4:52
It works, thanks a lot! Can't believe I didn't think about that... Thank you!
– SonnePer, Commented Apr 30, 2021 at 7:22

FObersteiner · Accepted Answer · 2021-04-30 14:07:17Z

The point here is to make sure the date/time strings are parsed correctly into the datetime data type. By default, a string like '01/01/2019 00:00' is parsed as mm/dd/YYYY HH:MM; see the dayfirst parameter of pandas.to_datetime:

dayfirst bool, default False

Depending on where you live, you might expect that the day comes first.

Also note that this is evaluated for each element: '25-12-2019' is parsed as Dec 25th since there is no month 25, but '03-12-2019' from the same column becomes March 12th although Dec 3rd might be expected. That can create quite a mess: the misparsed rows get the wrong timestamps, so sorting the index moves them to the wrong positions.

So if the day comes first in your date string:

df['datetime'] = pd.to_datetime(df['datetime'], dayfirst=True)
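
As a minimal sketch of the difference (the sample string below is an assumed value, not taken from the asker's file), the same text can yield two different timestamps depending on dayfirst:

```python
import pandas as pd

# Assumed sample value: an ambiguous date where both readings are valid.
s = "03/12/2019 00:00"

# Default (dayfirst=False): read as month/day/year, i.e. March 12th.
month_first = pd.to_datetime(s)

# dayfirst=True: read as day/month/year, i.e. December 3rd.
day_first = pd.to_datetime(s, dayfirst=True)

print(month_first)
print(day_first)
```
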

You could also:

provide the parsing directive explicitly via the format kwarg of to_datetime;
specify a column to be parsed to datetime via the parse_dates kwarg of pandas.read_csv. There, you can also specify dayfirst=True.
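
These two alternatives could look like the following sketch. The inline CSV text, its column names and the format string '%d/%m/%Y %H:%M' are assumptions standing in for the real file, whose exact layout is not shown in the question:

```python
import io
import pandas as pd

# Assumed stand-in for the real CSV: day-first dates, ';' separator,
# rows deliberately out of chronological order.
csv_text = (
    "datetime;data4\n"
    "13/09/2019 00:00;3\n"
    "09/09/2019 00:00;1\n"
    "10/09/2019 00:00;2\n"
)

# Option 1: let read_csv parse the column, telling it the day comes first.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    parse_dates=["datetime"],
    dayfirst=True,
)
df = df.set_index("datetime").sort_index()
print(df)

# Option 2: state the format explicitly, so nothing is guessed.
raw = pd.Series(["09/09/2019 00:00", "10/09/2019 00:00"])
explicit = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")
print(explicit)
```

An explicit format has the added benefit that a string not matching it raises an error instead of being silently misread.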
