Combine date column and time column into index in pandas data frame

Question

I have intraday time series data at 30-second intervals in a CSV file with the following format:

20120105, 080000,   1
20120105, 080030,   2
20120105, 080100,   3
20120105, 080130,   4
20120105, 080200,   5

How can I read it into a pandas data frame with these two different indexing schemes:

1. Combine date and time into a single datetime index

2. Use date as the primary index and time as the secondary index in a MultiIndex data frame

What are the pros and cons of these two schemes? Is one generally preferable to the other? In my case, I would like to do time-of-day analysis, but I am not sure which scheme will be more convenient for that purpose. Thanks in advance.

For time-of-day analysis you should be well covered using the at_time and between_time methods once you've created a proper DatetimeIndex.
– Wes McKinney, Commented Jan 13, 2013 at 0:14
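
A minimal sketch of the two methods mentioned in the comment, assuming a frame that already has a DatetimeIndex. The sample frame is rebuilt by hand with pd.date_range rather than read from the CSV, and the column name "value" is only illustrative.

```python
import pandas as pd

# Rebuild the question's five rows by hand (assumption: 30-second spacing,
# column name "value" chosen for illustration).
index = pd.date_range("2012-01-05 08:00:00", periods=5, freq=pd.Timedelta(seconds=30))
df = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=index)

# Rows whose time of day is exactly 08:01:00, on any date in the index.
at_0801 = df.at_time("08:01")

# Rows whose time of day falls in the window; both endpoints are included by default.
window = df.between_time("08:00:30", "08:01:30")

print(at_0801)
print(window)
```

Both methods ignore the date part of the index, so on a multi-day frame they would return the matching rows from every day.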

unutbu · Accepted Answer · 2013-01-13 01:47:40Z

Combine date and time into a single datetime index

df = pd.read_csv(io.BytesIO(text), parse_dates = [[0,1]], header = None, index_col = 0)
print(df)
#                      2
# 0_1                   
# 2012-01-05 08:00:00  1
# 2012-01-05 08:00:30  2
# 2012-01-05 08:01:00  3
# 2012-01-05 08:01:30  4
# 2012-01-05 08:02:00  5
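
As far as I know, the nested-list form of parse_dates used above has been deprecated in recent pandas releases (2.2 and later) in favour of calling pd.to_datetime on the loaded columns. The following is a sketch of that alternative, not the original answer's method. The column names "date", "time", "value" and the index name "timestamp" are assumptions, since the file has no header.

```python
import io
import pandas as pd

text = """\
20120105, 080000,   1
20120105, 080030,   2
20120105, 080100,   3
20120105, 080130,   4
20120105, 080200,   5"""

raw = pd.read_csv(
    io.StringIO(text),
    header=None,
    names=["date", "time", "value"],   # illustrative names, not in the file
    dtype={"date": str, "time": str},  # keep "080000" as text, not the integer 80000
    skipinitialspace=True,             # drop the blanks after each comma
)

# Concatenate the two text columns and parse them with an explicit format.
stamps = pd.to_datetime(raw["date"] + raw["time"], format="%Y%m%d%H%M%S")
df = raw.set_index(pd.DatetimeIndex(stamps, name="timestamp"))[["value"]]
print(df)
```

Reading the key columns as strings matters here: if the time column is parsed as an integer, the leading zero is lost and the fixed-width format no longer matches.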

Use date as the primary index and time as the secondary index in a multiindex dataframe

df2 = pd.read_csv(io.BytesIO(text), parse_dates = True, header = None, index_col = [0,1])
print(df2)
#                   2
# 0          1       
# 2012-01-05 80000  1
#            80030  2
#            80100  3
#            80130  4
#            80200  5

My naive inclination would be to prefer a single index over the multiindex.

As the Zen of Python asserts, "Flat is better than nested".
The datetime is one conceptual object. Treat it as such. (It is better to have one datetime object than multiple columns for the year, month, day, hour, minute, etc. Similarly, it is better to have one index rather than two.)

However, I am not very experienced with Pandas, and there could be some advantage to having the multiindex when doing time-of-day analysis.

I would try coding up some typical calculations both ways, and then see which one I liked better on the basis of ease of coding, readability, and performance.
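
As one example of such a comparison, here is a rough sketch of a mean-by-time-of-day calculation written against both layouts. The two-day sample data is hypothetical (it is not from the question), and the level names "date" and "time" are my own choice.

```python
import pandas as pd

# Hypothetical two-day sample on the same 30-second grid.
stamps = pd.to_datetime([
    "2012-01-05 08:00:00", "2012-01-05 08:00:30",
    "2012-01-06 08:00:00", "2012-01-06 08:00:30",
])
flat = pd.DataFrame({"value": [1, 2, 10, 20]}, index=stamps)

# Single DatetimeIndex: group on the time-of-day component of each timestamp.
mean_flat = flat.groupby(flat.index.time)["value"].mean()

# (date, time) MultiIndex built from the same timestamps.
nested = flat.copy()
nested.index = pd.MultiIndex.from_arrays(
    [flat.index.date, flat.index.time], names=["date", "time"]
)
mean_nested = nested.groupby(level="time")["value"].mean()

print(mean_flat)
print(mean_nested)
```

Both versions give the same averages. With the single index the time of day is derived on demand, while the MultiIndex stores it as a level, so which reads better probably depends on how often the date and time are needed separately.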

This was my setup to produce the results above.

import io
import pandas as pd

text = b'''\
20120105, 080000,   1
20120105, 080030,   2
20120105, 080100,   3
20120105, 080130,   4
20120105, 080200,   5'''

You can of course use

pd.read_csv(filename, ...)

instead of

pd.read_csv(io.BytesIO(text), ...)
