I have a pandas DataFrame with a column whose cells each contain a list (or np.array) of integers. Imagine something like this:

import random
import pandas as pd

df = pd.DataFrame(data=[[[random.randint(1,7) for _ in range(10)] for _ in range(5)]], index=["col1"])
df = df.transpose()

which results in a DataFrame like this:

                              col1
0   [7, 7, 6, 7, 6, 5, 5, 1, 7, 4]
1   [4, 7, 5, 5, 6, 6, 5, 4, 7, 5]
2   [7, 2, 7, 7, 2, 7, 6, 7, 1, 2]
3   [5, 7, 1, 2, 6, 5, 4, 3, 5, 2]
4   [2, 3, 2, 6, 3, 3, 1, 1, 7, 7]

I want to expand this into a DataFrame with columns ["col1", ..., "col7"] that holds, for each row, the number of occurrences of each value from 1 to 7.

The desired result is an extended DataFrame containing integer values only:

   col1  col2  col3  col4  col5  col6  col7
0     1     0     0     1     2     2     4
1     0     0     0     2     3     2     2
2     1     3     0     0     0     1     5

My approach so far is pretty hard-coded: I create col1, ..., col7 filled with 0 and then use iterrows() to count the occurrences. This works, but it is quite a lot of code and I'm sure there is a more elegant way to do it. Maybe something with .value_counts() for each array in a row?

Maybe someone can help me find it. Thanks.

1 Answer

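from collections import Counter
import numpy as np
import pandas as pd

np.random.seed(2022)

# Note: np.random.randint excludes the upper bound, so the values here range from 1 to 6
df = pd.DataFrame(data=[[[np.random.randint(1,7) for _ in range(10)] for _ in range(5)]], 
                  index=["col1"])
df = df.transpose()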

You can use Series.explode with SeriesGroupBy.value_counts and then reshape with Series.unstack:

df1 = (df['col1'].explode()
                 .groupby(level=0)
                 .value_counts()
                 .unstack(fill_value=0)
                 .add_prefix('col')
                 .rename_axis(None, axis=1))
print(df1)
   col1  col2  col3  col4  col5  col6
0     4     2     1     0     1     2
1     3     2     0     4     0     1
2     3     1     3     2     0     1
3     1     1     3     0     1     4
4     1     1     1     1     3     3
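
To make the chain above easier to follow, here is a minimal sketch that runs the same steps one at a time on a tiny hand-written frame. The sample data and the intermediate names (exploded, per_row, wide, result) are illustrative only.

```python
import pandas as pd

# Tiny illustrative input: two rows, each holding a list of integers
df = pd.DataFrame({"col1": [[1, 1, 2], [2, 3, 3]]})

# Step 1: one list element per row; the original index label is repeated
exploded = df["col1"].explode()

# Step 2: count each value within each original row (grouping by the index)
per_row = exploded.groupby(level=0).value_counts()

# Step 3: move the values into columns, filling missing combinations with 0
wide = per_row.unstack(fill_value=0)

# Step 4: rename the columns to col1, col2, ... and drop the columns' axis name
result = wide.add_prefix("col").rename_axis(None, axis=1)
print(result)
```

Note that a column only appears if its value occurs in at least one row, which is why the output above stops at col6.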

Or use a list comprehension with Counter and the DataFrame constructor:

df1 = (pd.DataFrame([Counter(x) for x in df['col1']])
         .sort_index(axis=1)
         .fillna(0)
         .astype(int)
         .add_prefix('col'))
print(df1)
   col1  col2  col3  col4  col5  col6
0     4     2     1     0     1     2
1     3     2     0     4     0     1
2     3     1     3     2     0     1
3     1     1     3     0     1     4
4     1     1     1     1     3     3