How to count the number of values in a 1D array in one column of a Pandas DataFrame

Question

I have a data frame in which, for one of the columns, some of the rows contain a 1D array. For example:

How can I count the number of values (separated by commas) in the arrays in column data for each row, and show the counts in a new column of a new data frame, as in fig 2:

Welcome to Stack Overflow! Please include any relevant information as text directly into your question, do not link or embed external images of source code or data. Images make it difficult to efficiently assist you as they cannot be copied and offer poor usability as they cannot be searched. See: Why not upload images of code/errors when asking a question?
– Henry Ecker ♦, Commented Jun 13, 2021 at 16:00
If you need assistance formatting a small sample of your DataFrame as a copyable piece of code for SO see How to make good reproducible pandas examples.
– Henry Ecker ♦, Commented Jun 13, 2021 at 16:00

SeaBean · Accepted Answer · 2021-06-13 18:26:48Z

You can use .str.len() to get the item count of the lists in column data, then use .groupby() with .sum() to add up the counts for each name, as follows:

df_out = (df['data'].str.len()
                    .groupby(df['name'], sort=False).sum()
         ).to_frame(name='data_count').reset_index()

Result:

print(df_out)


     name  data_count
0    john           6
1  amanda           0
2    sara           5
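For reference, here is a self-contained sketch of the same approach, assuming the column really holds Python lists. The sample frame is made up for illustration (it is not the OP's actual data), and the missing entry is represented as an empty list:

```python
import pandas as pd

# Hypothetical sample data: every cell of 'data' is a real Python list.
df = pd.DataFrame({
    'name': ['john', 'amanda', 'sara', 'john'],
    'data': [
        ['a4G', 'bweQ', 'fp_dE4'],
        [],
        ['H2dw45', 'IfC4', 'bAf23g', 'Lkfr54-op', 'a3dLa'],
        ['Tr45b', 'kM30', 'riU91'],
    ],
})

# For list-like cells, .str.len() returns the number of elements per row.
per_row = df['data'].str.len()

# Sum the per-row counts for each name, keeping first-seen order.
df_out = (per_row.groupby(df['name'], sort=False).sum()
          .to_frame(name='data_count').reset_index())
print(df_out)
```

If one count per row is wanted instead of one per name, per_row can be assigned directly to a new column.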

Edit

If the column data holds strings that merely look like arrays/lists, rather than real 1D arrays as described in the question, you can run the following code to convert the column into real lists first:

df['data'] = (df['data'].str.strip('[]')
                        .str.replace("'", "")
                        .str.replace('"', '')
                        .replace('', np.nan)
                        .str.split(',')
                        .fillna({i: [] for i in df.index}))
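If only the counts are needed, converting the column may not be necessary. The following is a minimal sketch of an alternative that counts the items straight from the bracketed strings; count_items is a hypothetical helper name, and the sketch assumes items never contain a comma themselves:

```python
import pandas as pd

def count_items(cell):
    """Count comma-separated items in a string such as '[a, b, c]'."""
    # Missing values (NaN/None) and any other non-string count as zero items.
    if not isinstance(cell, str):
        return 0
    inner = cell.strip().strip('[]').strip()
    if not inner:
        return 0  # '[]' or an empty string
    return len(inner.split(','))

# Sample strings taken from the data posted in the comments.
df = pd.DataFrame({
    'name': ['john', 'amanda', 'sara', 'john'],
    'data': ['[a4G, bweQ, fp_dE4]', None,
             '[H2dw45, IfC4, bAf23g, Lkfr54-op, a3dLa]',
             '[Tr45b, kM30, riU91]'],
})

# One count per row, stored in a new column.
df['data_count'] = df['data'].map(count_items)
print(df[['name', 'data_count']])
```

This gives a per-row count; grouping by name and summing, as in the answer, would then aggregate it per person.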

Test Run

Test Data Setup

import numpy as np
import pandas as pd

nan = np.nan
# dict dumped by df.to_dict(), as provided by the OP in the comments:
data = {'name': {0: 'john', 1: 'amanda', 2: 'sara', 3: 'john'}, 'data': {0: '[a4G, bweQ, fp_dE4]', 1: nan, 2: '[H2dw45, IfC4, bAf23g, Lkfr54-op, a3dLa]', 3: '[Tr45b, kM30, riU91]'}}
df = pd.DataFrame(data)

df['data'] = (df['data'].str.strip('[]')
                        .str.replace("'", "")
                        .str.replace('"', '')
                        .replace('', np.nan)
                        .str.split(',')
                        .fillna({i: [] for i in df.index}))

Run the solution code

df_out = (df['data'].str.len()
                    .groupby(df['name'], sort=False).sum()
         ).to_frame(name='data_count').reset_index()

Result:

print(df_out)


     name  data_count
0    john           6
1  amanda           0
2    sara           5
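The comments under this answer report character counts instead of item counts, which is what .str.len() produces when the cells are strings. A quick way to check what a column actually holds is sketched below, using a made-up two-row frame built from the data posted in the comments:

```python
import numpy as np
import pandas as pd

# Two rows in the string form reported in the comments.
df = pd.DataFrame({
    'name': ['john', 'amanda'],
    'data': ['[a4G, bweQ, fp_dE4]', np.nan],
})

# Names of the Python types found among the non-missing cells.
kinds = df['data'].dropna().map(lambda v: type(v).__name__).unique()
print(list(kinds))

# On string cells, .str.len() measures characters, not items.
char_len = df['data'].str.len()
print(char_len)
```

If the check shows str rather than list, the column needs the conversion step from the Edit section before the counting code gives item counts.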

edited Jun 13, 2021 at 18:26

answered Jun 13, 2021 at 16:18


8 Comments

WhoIsKi Over a year ago

Thank you, but for my original data the result of this code is the number of characters. Any idea why?

WhoIsKi Over a year ago

When I read the sample .xlsx file as a pandas data frame, the result is the number of characters: john: 51.0, amanda: 0.0, sara: 50.0

SeaBean Over a year ago

@Kla Your so-called 1-D array is probably not a real array/list but a long string with square brackets at both ends and substrings inside. Could you show me the contents of the dataframe, e.g. by using df.to_dict()?

WhoIsKi Over a year ago

Yes, I think you're right. The contents of that column are strings separated by commas, and each set of strings is enclosed in square brackets.

WhoIsKi Over a year ago

Could you create an Excel file with this example data? {'name': {0: 'john', 1: 'amanda', 2: 'sara', 3: 'john'}, 'data': {0: '[a4G, bweQ, fp_dE4]', 1: nan, 2: '[H2dw45, IfC4, bAf23g, Lkfr54-op, a3dLa]', 3: '[Tr45b, kM30, riU91]'}}
