I have my data as shown below. I need to combine the date and hour into a single column, and then draw a line plot of the 2015 data against its corresponding hours.
[image: screenshot of the data]
2 Answers
Try using the following to combine your MultiIndex into a single DatetimeIndex:
df.set_index(pd.to_datetime(df.index.get_level_values(0)) +
             pd.to_timedelta(df.index.get_level_values(1), unit='h'),
             inplace=True)
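As a minimal sketch of what this does, the snippet below builds a small frame with a (date, hour) MultiIndex and collapses it into a single DatetimeIndex. The sample values and the level names are assumptions made for illustration; the actual data from the question is not reproduced here.

```python
import pandas as pd

# Hypothetical sample data: a (date, hour) MultiIndex like the one in the question.
idx = pd.MultiIndex.from_tuples(
    [("2014-12-31", 23), ("2015-01-01", 0), ("2015-01-01", 1)],
    names=["date", "hour"],
)
df = pd.DataFrame({"msg_count": [5, 10, 20]}, index=idx)

# Level 0 holds the dates, level 1 the hour of day.
dates = pd.to_datetime(df.index.get_level_values(0))
hours = pd.to_timedelta(df.index.get_level_values(1), unit="h")

# Adding the two gives one timestamp per row.
df.index = dates + hours
print(df)
```

Assigning to df.index directly is equivalent to the set_index call above; it is split into steps here only to make each part visible.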
From the data you provided, there appear to be gaps; for example, there is no 'msg_count' value at 2015-01-01 09:00.
To fix this, you can use DataFrame.reindex with pandas.date_range and fill the missing entries with 0:
new_idx = pd.date_range(df.index.min(), df.index.max(), freq='h')
# reindex has no inplace option, so assign the result back
df = df.reindex(new_idx, fill_value=0)
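To see the gap filling in isolation, here is a small sketch on made-up data where one hour is missing. The timestamps and counts are assumptions chosen only to show the effect of fill_value.

```python
import pandas as pd

# Hypothetical hourly data with a gap: 2015-01-01 01:00 is missing.
df = pd.DataFrame(
    {"msg_count": [10, 20]},
    index=pd.to_datetime(["2015-01-01 00:00", "2015-01-01 02:00"]),
)

# Full hourly range spanning the data.
new_idx = pd.date_range(df.index.min(), df.index.max(), freq="h")

# reindex returns a new frame; the missing hour gets msg_count = 0.
df = df.reindex(new_idx, fill_value=0)
print(df)
```

Whether 0 is the right fill depends on the data: it suits message counts where a missing hour means no messages, but it would distort a quantity where a missing row means "not measured".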
To plot only the 2015 data, use:
df[df.index.year == 2015].plot()
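The year filter can be checked without plotting. The sketch below uses made-up rows on either side of the 2014/2015 boundary and also shows partial string indexing with .loc, which should select the same rows on a DatetimeIndex.

```python
import pandas as pd

# Hypothetical hourly data straddling the 2014/2015 boundary.
df = pd.DataFrame(
    {"msg_count": [5, 10, 20]},
    index=pd.to_datetime(
        ["2014-12-31 23:00", "2015-01-01 00:00", "2015-01-01 01:00"]
    ),
)

# Boolean mask on the index year keeps only rows from 2015.
df_2015 = df[df.index.year == 2015]

# Partial string indexing on a DatetimeIndex selects the same rows.
same_rows = df.loc["2015"]

print(df_2015)
```

Calling .plot() on either result draws msg_count against the timestamps.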
You could create a new column that contains the date in datetime format, and then use the matplotlib.pyplot.plot_date function to plot it.
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'date': ['2018-12-12', '2018-12-12', '2018-12-13'], 'hour': [22, 23, 0], 'msg_count': [10, 20, 30]})
df['datetime'] = df.apply(
lambda x: dt.datetime.strptime(x['date'], '%Y-%m-%d')
+ dt.timedelta(hours=x['hour']),
axis=1)
plt.plot_date(df['datetime'], df['msg_count'])
plt.show()
2 Comments
Malathy
Hi, thank you for the code snippet. I tried the same, but I am getting the error below: "AttributeError: ("'DataFrame' object has no attribute 'datetime'", 'occurred at index (2014-12-31, 23)')"
Felipe Gonzalez
It looks like your DataFrame was grouped by date and hour, am I correct? If so, the problem is that date and hour are part of the index rather than ordinary columns. You should run df = df.reset_index() before the apply part and the plot part.
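A rough sketch of that fix, assuming the frame came from a groupby on date and hour (the raw rows below are invented for illustration):

```python
import datetime as dt
import pandas as pd

# Hypothetical grouped result: date and hour live in the index, not in columns.
raw = pd.DataFrame({
    "date": ["2014-12-31", "2015-01-01", "2015-01-01"],
    "hour": [23, 0, 0],
    "msg_count": [5, 10, 20],
})
df = raw.groupby(["date", "hour"]).sum()

# Move the index levels back into ordinary columns.
df = df.reset_index()

# Now x['date'] and x['hour'] resolve inside apply.
df["datetime"] = df.apply(
    lambda x: dt.datetime.strptime(x["date"], "%Y-%m-%d")
    + dt.timedelta(hours=int(x["hour"])),  # int() guards against NumPy integers
    axis=1,
)
print(df)
```

After reset_index, the plotting line from the answer can be used unchanged on df['datetime'] and df['msg_count'].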