I have a dataframe of English words with French translations, here is a simplified version:
| English | French |
|---|---|
| to provide | fournir |
| to provide | fournir |
| to provide | offrir, fournir |
| to provide | offrir |
| to provide | fournir |
| to provide | offrir |
I would like to create a pivot table which counts the occurences of each translation, like so:
| English | French translation | Number of times |
|---|---|---|
| to provide | fournir | 3 |
| to provide | offrir, fournir | 1 |
| to provide | offrir | 2 |
I have tried using the code below, but (1) it doesn't retain the French translations, and (2) there is only one row for each English word, whereas I would expect several rows, one for each given French translation.
df_eng_word_trans = pd.pivot_table(df_user_vocabulary,index='english', values='french',aggfunc=pd.Series.nunique)
| English | French |
|---|---|
| to provide | 712 |
| to argue | 521 |
In this example from my real data, there is only ONE row for "to provide", whereas I want several, with the count for each different French translation. I have also tried the following code:
df_eng_word_trans = pd.pivot_table(df_user_vocabulary,index=['english','french'], values='french', aggfunc=pd.Series.nunique)
But I get the following error:
ValueError: Grouper for 'french' not 1-dimensional
As a bonus, I also need to expand the French translations where there are two (or more) words and add them to the count, like so, but I think this is stage two:
| English | French translation | Number of times |
|---|---|---|
| to provide | fournir | 4 |
| to provide | offrir | 3 |
Thank you for your help.