I have this DataFrame:
import pandas as pd

df = pd.DataFrame({
    "col1": ["Kev", "Kev", "Fr"],
    "col2": ["Red; Purple", "Yellow; Purple; Red", "Red; Yellow"],
}, index=["1", "2", "3"])
It looks like this:
   col1  col2
1  Kev   Red; Purple
2  Kev   Yellow; Purple; Red
3  Fr    Red; Yellow
I want to count how many times each item in col2 appears for each value of col1. In this case, the final DataFrame should look like this:
   col1  col2    count
1  Kev   Red     2
2  Kev   Purple  2
3  Kev   Yellow  1
4  Fr    Red     1
5  Fr    Yellow  1
I tried using explode:
df2 = (df.explode(df.columns.tolist())
         # split on "; " (with the space) so items do not keep a leading blank
         .apply(lambda col: col.str.split('; '))
         .explode('col1')
         .explode('col2'))
But that only gives me the col1 and col2 columns of my desired DataFrame, not the count. If I use crosstab on df2, I get a very different result.
I managed to get the desired output with two nested for loops, but my DataFrame is so big that the function takes almost a minute to run. I would like to avoid this solution.
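For reference, here is a minimal sketch of the kind of loop-free result I am after, assuming that only col2 needs splitting and that items are separated by a semicolon with optional surrounding spaces. The names exploded and counts are just placeholders of mine, and I have not timed this on the real data:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Kev", "Kev", "Fr"],
    "col2": ["Red; Purple", "Yellow; Purple; Red", "Red; Yellow"],
}, index=["1", "2", "3"])

# Turn each col2 string into a list, then give every item its own row.
exploded = df.assign(col2=df["col2"].str.split(";")).explode("col2")

# Strip the whitespace left around each item by the split.
exploded["col2"] = exploded["col2"].str.strip()

# Count the rows for each (col1, col2) pair; sort=False keeps the groups
# in the order in which they first appear.
counts = (exploded.groupby(["col1", "col2"], sort=False)
                  .size()
                  .reset_index(name="count"))

print(counts)
```

The result has a fresh 0-based index; something like counts.index = counts.index + 1 would shift it to start at 1 as in the table above, if that matters.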