    yearCount = df[['antibiotic', 'order_date', 'antiYearCount']]

    yearGroups = yearCount.groupby('order_date')

    for year in yearGroups:
        yearCount['antiYearCount'] = year.groupby('antibiotic')['antibiotic'].transform(pd.Series.value_counts)

In this case, yearCount is a DataFrame containing 'order_date', 'antibiotic' and 'antiYearCount'. I have cleaned 'order_date' so that it contains only the year of the order. I want to group yearCount by the years in 'order_date', count the number of times each 'antibiotic' appears in each "year group", and then assign that count to yearCount's 'antiYearCount' column.
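The loop above fails because iterating over a GroupBy object does not yield DataFrames: each item is a `(key, sub-DataFrame)` tuple, so `year.groupby(...)` is called on a tuple. A minimal sketch on hypothetical sample data (years already extracted, as described) showing what the loop variable actually contains:

```python
import pandas as pd

# Hypothetical sample data: order_date already reduced to the year
df = pd.DataFrame({'antibiotic': ['a', 'c', 'c', 'b', 'b', 'b'],
                   'order_date': [2012, 2012, 2012, 2013, 2013, 2013]})

# Iterating over a GroupBy yields (key, sub-DataFrame) tuples, not DataFrames,
# so calling .groupby on the loop variable itself raises AttributeError.
pairs = list(df.groupby('order_date'))
for year, group in pairs:
    print(year, group['antibiotic'].tolist())
```

Unpacking the tuple as `year, group` gives access to the per-year sub-DataFrame.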

1 Answer


I think you need to add the order_date column to the groupby keys. It is then also possible to use size instead of pd.Series.value_counts for the same output:

import pandas as pd

df = pd.DataFrame({'antibiotic':list('accbbb'),
                   'antiYearCount':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'order_date': pd.to_datetime(['2012-01-01']*3+['2012-01-02']*3)})

print (df)
   C  D  E  antiYearCount antibiotic order_date
0  7  1  5              4          a 2012-01-01
1  8  3  3              5          c 2012-01-01
2  9  5  6              4          c 2012-01-01
3  4  7  9              5          b 2012-01-02
4  2  1  2              5          b 2012-01-02
5  3  0  4              4          b 2012-01-02

#copy() to avoid SettingWithCopyWarning
#https://stackoverflow.com/a/45035966/2901002
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']].copy()
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
                                      .transform('size')
print (yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3

yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
                                      .transform(pd.Series.value_counts)
print (yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3
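The reason `transform` is used rather than calling `size()` directly: `groupby(...).size()` returns one value per group, while `transform('size')` broadcasts each group's count back to every row of the original frame, aligned on its index, which is what allows it to be assigned as a column. A small sketch on hypothetical data with integer years:

```python
import pandas as pd

# Hypothetical sample data: order_date already reduced to the year
df = pd.DataFrame({'antibiotic': list('accbbb'),
                   'order_date': [2012] * 3 + [2013] * 3})

keys = ['order_date', 'antibiotic']

# size(): one row per (order_date, antibiotic) group
per_group = df.groupby(keys).size()

# transform('size'): the same counts repeated for every original row,
# aligned on df's index, so it can be assigned as a new column
df['antiYearCount'] = df.groupby(keys)['antibiotic'].transform('size')
print(per_group)
print(df)
```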

3 Comments

jezrael has a great answer; I was able to fix it in the meantime. Iterating over a groupby yields tuples: in this case year[0] is the year being iterated on and year[1] is the DataFrame whose order_date values equal year[0]. By doing the same groupby function on year[1]
Hmm, I think the loop is not necessary, so could your solution be changed to mine?
Right! It was running in under a second on 1.7M rows, so I didn't consider it. Thanks so much!
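The loop-based fix described in the first comment is cut off, so the following is only a hypothetical reconstruction: unpack each `(year, sub-DataFrame)` pair, count within the sub-DataFrame, and write the counts back by index label. It uses `transform('size')` in place of the question's `pd.Series.value_counts`, and compares the result against the vectorized version from the answer on made-up data.

```python
import pandas as pd

# Hypothetical sample data: order_date already reduced to the year
df = pd.DataFrame({'antibiotic': list('accbbb'),
                   'order_date': [2012] * 3 + [2013] * 3})
yearCount = df[['antibiotic', 'order_date']].copy()
yearCount['antiYearCount'] = 0

# Loop version: count within each year's sub-DataFrame and write the
# counts back using that sub-DataFrame's original index labels.
for year, group in yearCount.groupby('order_date'):
    counts = group.groupby('antibiotic')['antibiotic'].transform('size')
    yearCount.loc[counts.index, 'antiYearCount'] = counts

# Vectorized version from the answer, for comparison
vectorized = yearCount.groupby(['order_date', 'antibiotic'])['antibiotic'].transform('size')
print(yearCount)
```

Both give the same column; the single grouped `transform` avoids the Python-level loop over years.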
