Pandas DataFrame and DateTimeIndex

Question

I would like to group rows by time, and I tried the following approach:

import pandas as pd

df = pd.DataFrame({'time': ["2001-01-01 10:20:30,000", 
                            "2001-01-01 10:20:31,000",
                            "2001-01-02 5:00:00,000"],
                    'val': [1, 2, 3]})

t = pd.DatetimeIndex(df.time)
df = df.groupby([t.day, t.hour, t.minute]).count()

The resulting dataframe is

                   time val
    time time time      
       1   10   20    2   2
       2    5    0    1   1

The output I expect (or something similar):

           time   count             
     1  1-10-20       2
     2    2-5-0       1

The plot I want: X-axis for minutes, Y-axis for count, ticks by day + hour (coarser than just minutes).

Questions:

1) Why does the index consist of three time columns, and how can I get an index with a single column containing elements like 1-10-20 and 2-5-0?

2) What is the best practice for getting only one column with the results of count() instead of the two columns time and val?

3) How can I plot this data (grouped by days/hours/minutes) with ticks in days and hours?

Given the example you provide what is the output you expect?
– vielkind, Commented Sep 25, 2018 at 18:16
Can you clarify about the plot you want? The other two questions are easier
– user3483203, Commented Sep 25, 2018 at 18:17

user3483203 · Accepted Answer · 2018-09-25 19:03:47Z

1

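To answer your first question: the index has three levels because you're grouping by three separate series (day, hour and minute), and each one becomes its own index level. If you really want them combined, group by a single strftime-formatted key: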

df.time = pd.to_datetime(df.time)

df.groupby([df.time.dt.strftime('%d-%H-%M')]).val.count()

time
01-10-20    2
02-05-00    1
Name: val, dtype: int64

The above also answers your second question: instead of calling count() on the whole DataFrame, which counts every remaining column, count a single series (here, val).
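If the unpadded labels from the question (1-10-20, 2-5-0) matter, a minimal sketch along the same lines follows. It assumes the timestamps always use a comma before the milliseconds, hence the explicit format string, and it builds the key from the integer day, hour and minute instead of relying on strftime's no-padding flags, which differ between platforms. The names key, label and counts are illustrative and not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2001-01-01 10:20:30,000",
             "2001-01-01 10:20:31,000",
             "2001-01-02 5:00:00,000"],
    "val": [1, 2, 3],
})

# Assumption: every timestamp has the form 'YYYY-MM-DD H:MM:SS,fff'.
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S,%f")

# Build an unpadded 'day-hour-minute' key from the integer components.
t = df["time"].dt
key = (t.day.astype(str) + "-" + t.hour.astype(str) + "-"
       + t.minute.astype(str)).rename("label")

# size() counts rows per group; the result has one 'count' column.
counts = df.groupby(key).size().rename("count").reset_index()
print(counts)  # label '1-10-20' has count 2, label '2-5-0' has count 1
```

Note that these labels are strings, so groups sort lexicographically (for example, "10-..." sorts before "2-..."). The zero-padded strftime key above does not have that problem.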

Finally, to plot, you can use the built-in plot functionality of pandas. Here is a larger example to demonstrate the ticks you want:

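import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 100 timestamps, 5 minutes apart ('5min' replaces the deprecated '5T' alias)
r = pd.date_range(start='2001-01-01', freq='5min', periods=100)
df = pd.DataFrame({'time': r, 'val': np.random.randint(1, 10, 100)})

out = df.groupby([df.time.dt.strftime('%d-%H-%M')]).val.count().reset_index()

# Label each bar with its day-hour prefix ('dd-HH')
ax = out.assign(label=out.time.str[:5]).plot(x='label', y='val', kind='bar')

seen_ticks = set()

# Hide every tick label that repeats a day-hour already shown
for idx, label in enumerate(ax.xaxis.get_ticklabels()):
    if label.get_text() in seen_ticks:
        label.set_visible(False)
    else:
        seen_ticks.add(label.get_text())
plt.tight_layout()
plt.show()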

This will show only one x-tick label per unique day/hour combination, while every minute-level bin still gets its own bar.
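For the case raised in the comments (one data point per minute, ticks only per day/hour), an alternative sketch is to keep real timestamps on the x-axis and let matplotlib's date locators place the ticks. This is not the approach used in the answer above. The sample data, the six-hour span and the names events and per_minute are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data: 500 events scattered over six hours.
rng = np.random.default_rng(0)
offsets = np.sort(rng.integers(0, 6 * 3600, size=500))
times = pd.Timestamp("2001-01-01") + pd.to_timedelta(offsets, unit="s")
events = pd.Series(1, index=pd.DatetimeIndex(times))

# One bin per minute; minutes without events get a count of 0.
per_minute = events.resample("1min").count()

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(per_minute.index, per_minute.to_numpy())

# Major ticks once per hour, labelled as day-hour; the data stays per minute.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%H"))
ax.set_xlabel("day-hour")
ax.set_ylabel("count")
fig.tight_layout()
# Call plt.show() or fig.savefig(...) here to view the result.
```

Because the x-axis holds datetimes and not strings, tick density is controlled by the locator alone, so a longer range could switch to mdates.DayLocator without touching the data.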

edited Sep 25, 2018 at 19:03

answered Sep 25, 2018 at 18:32

user3483203


5 Comments

Konstantin Over a year ago

Thank you very much for the answer! Can you please clarify how I can have bins for day/hour/minute, but ticks only for day/hour (coarser) because there are quite a lot of bins.

user3483203 Over a year ago

I don't know what that means unfortunately. If you can show an example plot that demonstrates the concept I can update the answer

Konstantin Over a year ago

For each hour there are 60 dots, and there are multiple hours. I want all the data points (one per minute) to be present on the graph, but the ticks on the X-axis should only be for days/hours, not minutes. So there are far fewer ticks than data points.

Konstantin Over a year ago

Something like this pandas.pydata.org/pandas-docs/stable/_images/… but with ticks for days/hours.

user3483203 Over a year ago

@Konstantin I demonstrated a way to show fewer x-ticks

Joon-Ho Son · Accepted Answer · 2018-09-25 18:28:49Z

0

1) Use pandas.DataFrame.from_dict(data) to create a dataframe from a dictionary. (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html)

2) This question isn't entirely clear, but I think what you want is

df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)

and then apply your count() aggregation.
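One possible reading of this suggestion, sketched with the question's data: once time is a DatetimeIndex, resample can bin by minute directly. The explicit format string and the filtering of empty bins are assumptions added here, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2001-01-01 10:20:30,000",
             "2001-01-01 10:20:31,000",
             "2001-01-02 5:00:00,000"],
    "val": [1, 2, 3],
})

# Assumption: a comma separates seconds from milliseconds in every row.
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S,%f")
df = df.set_index("time")

# resample creates a bin for every minute in the range, including empty ones.
counts = df["val"].resample("1min").count()

# Keep only the minutes that actually contain rows.
counts = counts[counts > 0]
print(counts)
```

The result is indexed by the start of each minute as a real timestamp, which is convenient for plotting with date-aware ticks.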

3) This question isn't clear to me.

answered Sep 25, 2018 at 18:28

Joon-Ho Son
