I would like to group rows by time, and I tried the following approach:

import pandas as pd

df = pd.DataFrame({'time': ["2001-01-01 10:20:30,000", 
                            "2001-01-01 10:20:31,000",
                            "2001-01-02 5:00:00,000"],
                    'val': [1, 2, 3]})

t = pd.DatetimeIndex(df.time)
df = df.groupby([t.day, t.hour, t.minute]).count()

The resulting dataframe is

                   time val
    time time time      
       1   10   20    2   2
       2    5    0    1   1

The output I expect (or something similar):

           time   count             
     1  1-10-20       2
     2    2-5-0       1

The plot I want: X-axis for minutes, Y-axis for count, ticks by day + hour (coarser than just minutes).

Questions:

1) Why does the index consist of three time columns, and how can I get an index with just a single column containing elements like 1-10-20 and 2-5-0?

2) What is the best practice for getting only one column with the results of count() instead of the two columns time and val?

3) How can I plot this data (grouped by days/hours/minutes) with ticks in days and hours?

  • Given the example you provide, what is the output you expect? Commented Sep 25, 2018 at 18:16
  • Can you clarify about the plot you want? The other two questions are easier. Commented Sep 25, 2018 at 18:17
  • @user3483203 I updated the question. Commented Sep 25, 2018 at 18:25

2 Answers

To answer your first question, it's because you're grouping by three separate series. If you really want them combined, group by a strftime:

df.time = pd.to_datetime(df.time)

df.groupby([df.time.dt.strftime('%d-%H-%M')]).val.count()

time
01-10-20    2
02-05-00    1
Name: val, dtype: int64

The above also answers your second question. Instead of counting the DataFrame, count a single series, your val series.
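A minimal sketch of the difference, assuming the sample data from the question with the millisecond suffix dropped for simplicity. The names multi and flat, and the renamed keys day, hour and minute, are illustrative only:

```python
import pandas as pd

# Sample data modeled on the question (timestamps simplified).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2001-01-01 10:20:30",
        "2001-01-01 10:20:31",
        "2001-01-02 05:00:00",
    ]),
    "val": [1, 2, 3],
})

# Three separate grouping keys produce a three-level MultiIndex.
multi = df.groupby([
    df["time"].dt.day.rename("day"),
    df["time"].dt.hour.rename("hour"),
    df["time"].dt.minute.rename("minute"),
])["val"].count()

# A single formatted string key produces a flat, single-level index.
flat = df.groupby(df["time"].dt.strftime("%d-%H-%M"))["val"].count()

print(multi.index.nlevels, flat.index.nlevels)
print(flat)
```

Selecting the val column before calling count() is what reduces the result to a single column of counts.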


Finally, to plot, you can use the built-in plotting functionality of pandas. Here is a larger example to demonstrate the ticks you want:

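import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 100 timestamps spaced 5 minutes apart, each with a random value
r = pd.date_range(start='2001-01-01', freq='5min', periods=100)
df = pd.DataFrame({'time': r, 'val': np.random.randint(1, 10, 100)})

out = df.groupby([df.time.dt.strftime('%d-%H-%M')]).val.count().reset_index()

# Label each bar with day-hour only (the first five characters of 'dd-HH-MM')
ax = out.assign(label=out.time.str[:5]).plot(x='label', y='val', kind='bar')

seen_ticks = set()

# Hide every tick label that repeats a day-hour value already shown
for label in ax.xaxis.get_ticklabels():
    if label.get_text() in seen_ticks:
        label.set_visible(False)
    else:
        seen_ticks.add(label.get_text())
plt.tight_layout()
plt.show()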

This shows only one x-tick label per unique day/hour combination, while every bin is still drawn.
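For the plot described in the question (one point per minute, ticks only by day and hour), another possible approach is to keep a real datetime axis and let matplotlib's date locators place the ticks. This is only a sketch under assumptions: the data is randomly generated, and the 6-hour tick interval, the tick format and the output filename counts_per_minute.png are arbitrary choices, not part of the original answer.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 500 events at random seconds within a two-day window.
rng = np.random.default_rng(0)
offsets = np.sort(rng.integers(0, 2 * 24 * 3600, size=500))
times = pd.Timestamp("2001-01-01") + pd.to_timedelta(offsets, unit="s")
df = pd.DataFrame({"time": times, "val": 1})

# One bin per minute; minutes without events get a count of 0.
counts = df.set_index("time")["val"].resample("min").count()

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(counts.index, counts.values, marker=".", linestyle="none")

# Major ticks every 6 hours, labeled with day and hour only.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=6))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %H:00"))
ax.set_xlabel("day and hour")
ax.set_ylabel("count per minute")
fig.autofmt_xdate()

# Save to a file; in an interactive session plt.show() could be used instead.
fig.savefig("counts_per_minute.png")
```

Because the x-axis holds actual timestamps rather than string labels, the number of ticks is independent of the number of bins.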


5 Comments

  • Thank you very much for the answer! Can you please clarify how I can have bins for day/hour/minute but ticks only for day/hour (coarser), because there are quite a lot of bins?
  • I don't know what that means, unfortunately. If you can show an example plot that demonstrates the concept, I can update the answer.
  • For each hour there are 60 dots, and there are multiple hours. I want all the data points (one per minute) to be present on the graph, but the ticks on the X-axis should only be for days/hours, not minutes. So there are far fewer ticks than data points.
  • Something like this pandas.pydata.org/pandas-docs/stable/_images/… but with ticks for days/hours.
  • @Konstantin demonstrated a way to show fewer x-ticks.

1) Use pandas.DataFrame.from_dict(data) to create a dataframe from a dictionary. (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html)

2) This question isn't entirely clear, but I think what you want is

df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)

and then apply your count() aggregation.
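One way this suggestion might look in full is sketched below. It assumes that "apply your count() aggregation" means counting rows per minute, and it uses the question's sample data with the millisecond suffix dropped. Flooring the index to the minute is my choice for illustration, not something this answer specifies:

```python
import pandas as pd

# Sample data modeled on the question (timestamps simplified).
df = pd.DataFrame({
    "time": ["2001-01-01 10:20:30", "2001-01-01 10:20:31", "2001-01-02 05:00:00"],
    "val": [1, 2, 3],
})
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")

# Floor each timestamp to the start of its minute, then count rows per minute.
per_minute = df.groupby(df.index.floor("min"))["val"].count()
print(per_minute)
```

The result keeps a real DatetimeIndex, so only minutes that actually contain rows appear, and the index can still be formatted or plotted as dates later.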

3) This question isn't clear to me.
