0

I found an answer to my question in another request on this site, however the answer provided doesn't work for me so I'm asking in a different request. I will use the same data and show results I'm getting.

So basically, I have a dataframe that one column has repeated values that I want to group in a single row, and I want to concatenate the values of another column as the value of the combined rows.

Here's the example:

data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9], ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]])

df = pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])

The output is as expected:

    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1
1   John        Smith       7
2   Eric        Adams       9
3   Jane        Doe         14
4   Jane        Doe         16
5   John        Smith       19

This is what I want to get once I group on values for FIRST_NM and LAST_NM, and concatenate the values of PAGE_NUM:

    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1,7,19
1   Eric        Adams       9
2   Jane        Doe         14,16

The proposed solution, and it makes a lot of sense to me, is to use this:

df.groupby(['FIRST_NM', 'LAST_NM']).apply(lambda group: ','.join(group['PAGE_NUM']))

However and unfortunately, this seems to have no effect at all, I'm getting the exact same results:

    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1
1   John        Smith       7
2   Eric        Adams       9
3   Jane        Doe         14
4   Jane        Doe         16
5   John        Smith       19

Anyone can help pointing out what I'm doing wrong?

Many thanks in advance!

JF

1 Answer 1

1

try via groupby() and agg():

out=df.groupby(['FIRST_NM','LAST_NM'],sort=False,as_index=False).agg({'PAGE_NUM':','.join})

If needed unique value then use:

out=df.groupby(['FIRST_NM','LAST_NM'],sort=False,as_index=False).agg({'PAGE_NUM':set})
out['PAGE_NUM']=out['PAGE_NUM'].astype(str).str[1:-1].str.replace("'",'')

Output of out:

    FIRST_NM    LAST_NM     PAGE_NUM
0   John        Smith       1,7,19
1   Eric        Adams       9
2   Jane        Doe         14,16
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.