Combine date and time into a single datetime index
df = pd.read_csv(io.BytesIO(text), parse_dates=[[0, 1]], header=None, index_col=0)
print(df)
#                      2
# 0_1
# 2012-01-05 08:00:00  1
# 2012-01-05 08:00:30  2
# 2012-01-05 08:01:00  3
# 2012-01-05 08:01:30  4
# 2012-01-05 08:02:00  5
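On recent pandas versions the nested-list form parse_dates=[[0,1]] is, as far as I know, deprecated (since 2.2), so here is a minimal sketch that builds the same kind of index with an explicit pd.to_datetime call. The column names date, time and value and the index name datetime are my own labels, not anything in the file, and io.StringIO with a str literal stands in for the real file.

```python
import io

import pandas as pd

# Same sample rows as in the setup, as a str literal for io.StringIO.
text = """\
20120105, 080000, 1
20120105, 080030, 2
20120105, 080100, 3
20120105, 080130, 4
20120105, 080200, 5"""

# Read date and time as strings so the leading zero of "080000" survives;
# skipinitialspace drops the blank after each comma.
raw = pd.read_csv(
    io.StringIO(text),
    header=None,
    names=["date", "time", "value"],  # hypothetical column labels
    dtype={"date": str, "time": str},
    skipinitialspace=True,
)

# Concatenate the two string columns and parse them with an explicit format.
stamps = pd.to_datetime(raw["date"] + raw["time"], format="%Y%m%d%H%M%S")

# Keep only the data column and attach the combined timestamps as the index.
df = raw[["value"]].copy()
df.index = pd.DatetimeIndex(stamps, name="datetime")
print(df)
```

Passing an explicit format avoids any guessing about how the concatenated string should be interpreted.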
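Use date as the primary index and time as the secondary index in a MultiIndex DataFrame. Note that in the output below the time level stays as plain integers, so 080000 shows up as 80000.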
df2 = pd.read_csv(io.BytesIO(text), parse_dates=True, header=None, index_col=[0, 1])
print(df2)
#                   2
# 0          1
# 2012-01-05 80000  1
#            80030  2
#            80100  3
#            80130  4
#            80200  5
My naive inclination would be to prefer a single index over the multiindex.
- As the Zen of Python asserts, "Flat is better than nested".
- The datetime is one conceptual object. Treat it as such. (It is better to have one datetime object than multiple columns for the year, month, day, hour, minute, etc. Similarly, it is better to have one index rather than two.)
However, I am not very experienced with Pandas, and there could be some advantage to having the multiindex when doing time-of-day analysis.
I would try coding up some typical calculations both ways, and then see which one I liked better on the basis of ease of coding, readability, and performance.
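As one way of trying a typical calculation both ways, here is a rough sketch that sums the values per minute, once on a single DatetimeIndex and once on a (date, time) MultiIndex with integer HHMMSS times. The frames are built by hand rather than read from the CSV, and the names flat, nested, value, date and time are just illustrative assumptions.

```python
import pandas as pd

stamps = pd.to_datetime([
    "2012-01-05 08:00:00",
    "2012-01-05 08:00:30",
    "2012-01-05 08:01:00",
    "2012-01-05 08:01:30",
    "2012-01-05 08:02:00",
])
values = [1, 2, 3, 4, 5]

# Way 1: a single DatetimeIndex, so resample can bin by minute directly.
flat = pd.DataFrame({"value": values}, index=stamps)
per_minute_flat = flat["value"].resample("1min").sum()

# Way 2: a (date, time) MultiIndex where time is an integer like 80030.
nested = pd.DataFrame(
    {"value": values},
    index=pd.MultiIndex.from_arrays(
        [stamps.normalize(), [80000, 80030, 80100, 80130, 80200]],
        names=["date", "time"],
    ),
)
# Integer division by 100 drops the seconds: 80030 -> 800 (i.e. 08:00).
date_key = nested.index.get_level_values("date")
minute_key = nested.index.get_level_values("time") // 100
per_minute_nested = nested["value"].groupby([date_key, minute_key]).sum()

print(per_minute_flat)
print(per_minute_nested)
```

Both give the same sums, but the MultiIndex version needs hand-written arithmetic on the time level, which is the kind of difference I would look at when comparing the two layouts.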
This was my setup to produce the results above.
import io
import pandas as pd
# bytes literal: io.BytesIO requires bytes on Python 3
text = b'''\
20120105, 080000, 1
20120105, 080030, 2
20120105, 080100, 3
20120105, 080130, 4
20120105, 080200, 5'''
You can of course use
pd.read_csv(filename, ...)
instead of
pd.read_csv(io.BytesIO(text), ...)
The at_time and between_time methods are available once you've created a proper DatetimeIndex.
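A small sketch of those two methods on data shaped like the sample above. The frame is built directly instead of being read from CSV, and the column name value is an assumption of mine.

```python
import pandas as pd

# Five rows, 30 seconds apart, starting at 08:00:00 on 2012-01-05.
idx = pd.date_range(
    "2012-01-05 08:00:00", periods=5, freq=pd.Timedelta(seconds=30)
)
df = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=idx)

# Rows whose time of day is exactly 08:01:00 (on any date).
exact = df.at_time("08:01:00")

# Rows whose time of day falls in the window; both ends are included by default.
window = df.between_time("08:00:30", "08:01:30")

print(exact)
print(window)
```

Both methods look only at the time-of-day part of the index, so they keep working unchanged when the data spans several dates.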