I just start with pandas and I would like to know how to count the number of document(unique) per year per company
My data are : df
year document_id company
0 1999 3 Orange
1 1999 5 Orange
2 1999 3 Orange
3 2001 41 Banana
4 2001 21 Strawberry
5 2001 18 Strawberry
6 2002 44 Orange
At the end, I would like to have a new dataframe like this
year document_id company nbDocument
0 1999 [3,5] Orange 2
1 2001 [21] Banana 1
2 2001 [21,18] Strawberry 2
3 2002 [44] Orange 1
I tried :
count2 = apyData.groupby(['year','company']).agg({'document_id': pd.Series.value_counts})
But with groupby operation, I'm not able to have this kind of structure and count unique value for Orange in 1999 for example, is there a way to do this ?
Thx
document_idofBananabe[41]?