2

I have a crime dataset where every row is one recorded offence that is to be used in an ARIMA time series model.

    Date

0   2015-09-05
1   2015-09-05
2   2015-07-08
3   2017-09-05
4   2018-09-05
4   2018-09-05

I would like to group by data, so that offences that occurred on the same day are aggregated.

    Date          Count

0   2015-09-05    2
1   2015-07-08    1
2   2017-09-05    1
3   2018-09-05    2

I'm struggling because I'm trying to both group by weeks per year, and because I'm not aggregating the contents of a column, I'm trying to count how many rows are grouped into it.

Thank you.

1

2 Answers 2

1

If your dataset is a dataframe, you can use:

df.assign(Count=1).groupby('Date')['Count'].count()

If it's a series:

series.to_frame().assign(Count=1).groupby('Date')['Count'].count()

For example:

df = pd.DataFrame({'Date':['2015-09-05',
                           '2015-09-05',
                           '2015-07-08',
                           '2017-09-05',
                           '2018-09-05',
                           '2018-09-05']})
df.assign(Count=1).groupby('Date')['Count'].count().reset_index()

Returns:

         Date  Count
0  2015-07-08      1
1  2015-09-05      2
2  2017-09-05      1
3  2018-09-05      2
Sign up to request clarification or add additional context in comments.

2 Comments

This is great. Would there be any way to name the new count column with a 'Count' header?
Sure, add .reset_index() at the end! Or check my edit. If it helps you solve the issue, accepting the answer will mark it as solved and if you feel like it, upvoting it will also reward me with some score rep!
1

One way to do it is to use Python rather than pandas for the heavy lifting:

import pandas as pd
import datetime
df = pd.DataFrame([datetime.datetime.strptime(x, "%Y-%m-%d").date() for x in ['2015-09-05', '2015-09-05', '2015-07-08', '2017-09-05', '2018-09-05', '2018-09-05']], columns=['Date'])

from collections import Counter
c = Counter(list(df['Date']))
df2 = pd.DataFrame(zip(list(c.keys()), list(c.values())), columns=['Date', 'Count'])
print(df2)

Output:

         Date  Count
0  2015-09-05      2
1  2015-07-08      1
2  2017-09-05      1
3  2018-09-05      2

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.