1

I have a data frame and in some of the rows, for one of the columns, I have a 1D array. for example: fig1

how I can count the number of values in the arrays in column data (separated by comma) for each row and show the number of them in a new column of new data frame same as fig 2:

fig2

4
  • Welcome to Stack Overflow! Please include any relevant information as text directly into your question, do not link or embed external images of source code or data. Images make it difficult to efficiently assist you as they cannot be copied and offer poor usability as they cannot be searched. See: Why not upload images of code/errors when asking a question? Commented Jun 13, 2021 at 16:00
  • If you need assistance formatting a small sample of your DataFrame as a copyable piece of code for SO see How to make good reproducible pandas examples. Commented Jun 13, 2021 at 16:00
  • Also why is John 7 and not 6? Commented Jun 13, 2021 at 16:01
  • @HenryEcker thanks! You're right! 6 is true. Commented Jun 13, 2021 at 16:04

1 Answer 1

1

You can use .str.len() to get the item count in lists in column data and then use .groupby() to aggregate the count of same name using .sum(), as follows:

df_out = (df['data'].str.len()
                    .groupby(df['name'], sort=False).sum()
         ).to_frame(name='data_count').reset_index()

Result:

print(df_out)


     name  data_count
0    john           6
1  amanda           0
2    sara           5

Edit

If the column data consists of strings looking like arrays/lists, instead of the 1D array as mentioned in the question, you can run the following code to convert the column into real arrays/lists first:

df['data'] = df['data'].str.strip('[]').str.replace("'", "").str.replace('"', '').replace('', np.nan).str.split(',').fillna({i: [] for i in df.index})

Test Run

Test Data Setup

nan = np.nan
# dict of dataframe dump by df.to_dict() as provided by OP in the comment:
data = {'name': {0: 'john', 1: 'amanda', 2: 'sara', 3: 'john'}, 'data': {0: '[a4G, bweQ, fp_dE4]', 1: nan, 2: '[H2dw45, IfC4, bAf23g, Lkfr54-op, a3dLa]', 3: '[Tr45b, kM30, riU91]'}}
df = pd.DataFrame(data)

df['data'] = df['data'].str.strip('[]').str.replace("'", "").str.replace('"', '').replace('', np.nan).str.split(',').fillna({i: [] for i in df.index})

Run solution codes

df_out = (df['data'].str.len()
                    .groupby(df['name'], sort=False).sum()
         ).to_frame(name='data_count').reset_index()

Result:

print(df_out)


     name  data_count
0    john           6
1  amanda           0
2    sara           5
Sign up to request clarification or add additional context in comments.

8 Comments

thank you but for my original data, the result of this code is the number of characters. what's your idea?
when I read the sample .xlsx file as a pandas data frame, the result is the number of characters. john : 51.0 , amanda : 0.0 , sara : 50.0
@Kla Probably your so-called 1-D array is not a real array/list, but string with square brackets at the ends with other substrings inside the long strings. Could you show me the contents of the dataframe by using e.g. df.to_dict() ?
yes, I think you're right. the contents of that column are some strings that separated by a comma. each set of strings is in square brackets.
could you create an excel file with these example data? {'name': {0: 'john', 1: 'amanda', 2: 'sara', 3: 'john'}, 'data': {0: '[a4G, bweQ, fp_dE4]', 1: nan, 2: '[H2dw45, IfC4, bAf23g, Lkfr54-op, a3dLa]', 3: '[Tr45b, kM30, riU91]'}}
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.