How can I "concat" rows by same value in a column in Pandas?

Question

I would like to combine the rows of a dataframe that share the same value in one column into a single row, collecting the differing values of the other columns, and get the edited dataframe back.

Input Data :

ID  F_Name  L_Name  Address SSN     Phone
123 Sam     Doe     123     12345   111-111-1111
123 Sam     Doe     123     12345   222-222-2222
123 Sam     Doe     abc345  12345   111-111-1111
123 Sam     Doe     abc345  12345   222-222-2222
456 Naveen  Gupta   456     45678   333-333-3333
456 Manish  Gupta   456     45678   333-333-3333

Expected Output Data :

myschema = {
"ID":"123"
"F_Name":"Sam"
"L_Name":"Doe"
"Address":"[123, abc345]"
"Phone":"[111-111-1111,222-222-2222]"
"SSN":"12345"
}
{
"ID":"456"
"F_Name":"[Naveen, Manish]"
"L_Name":"Gupta"
"Address":"456"
"Phone":"[333-333-3333]"
"SSN":"45678"
}

Code Tried :

import pandas as pd

df = pd.read_csv('data.csv')
print(df)

Please specify an expected output and the actual output you are getting. The 'code' you have tried is loading a dataframe and printing it. It does not relate to your question.
– pu239, Commented Aug 5, 2021 at 6:54
@samarth I have edited my question. I have just loaded the df and I don't know how to achieve the output in pandas.
– Naveen Gupta, Commented Aug 5, 2021 at 7:09
Note that if you are willing to have numpy arrays instead of lists, it's more concise to just aggregate pd.unique directly: myschema = df.groupby('ID', as_index=False).agg(pd.unique).to_dict(orient='records')
– tdy, Commented Aug 5, 2021 at 7:42

Anurag Dabas · Accepted Answer · 2021-08-05 08:24:02Z


Try groupby() + agg():

# one distinct value -> scalar, otherwise a list (set() does not preserve order)
myschema = (df.groupby('ID', as_index=False)
        .agg(lambda x: list(set(x))[0] if len(set(x)) == 1 else list(set(x)))
        .to_dict('records'))

OR

If order is important, aggregate with pd.unique() instead:

# pd.unique() keeps values in order of first appearance
myschema = (df.groupby('ID', as_index=False)
    .agg(lambda x: pd.unique(x)[0] if len(pd.unique(x)) == 1 else pd.unique(x).tolist())
    .to_dict('records'))

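In the code above we group the dataframe on 'ID' and then aggregate every remaining column by finding its unique values within each group: the first version builds a set and converts it to a list, the second uses pd.unique(), which keeps the order of first appearance. If a column has only one unique value in a group, we keep that single value (the element at position 0); otherwise we keep the whole list. Finally, to_dict() converts the aggregated result to a list of dictionaries, one per ID.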

output of myschema:

[{'ID': 123,
  'F_Name': 'Sam',
  'L_Name': 'Doe',
  'Address': ['abc345', '123'],
  'SSN': 12345,
  'Phone': ['222-222-2222', '111-111-1111']},
 {'ID': 456,
  'F_Name': ['Naveen', 'Manish'],
  'L_Name': 'Gupta',
  'Address': '456',
  'SSN': 45678,
  'Phone': '333-333-3333'}]
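
For anyone who wants to reproduce this without data.csv, here is a minimal, self-contained sketch of the order-preserving variant. The dataframe is typed in by hand from the sample rows in the question (with Address kept as strings, which is an assumption about how the CSV would be read), and `collapse` is a hypothetical helper name introduced only for this example. The list order may therefore differ from the set()-based output shown above.

```python
import pandas as pd

# Sample rows copied from the question; Address is assumed to be read as text.
df = pd.DataFrame({
    "ID": [123, 123, 123, 123, 456, 456],
    "F_Name": ["Sam", "Sam", "Sam", "Sam", "Naveen", "Manish"],
    "L_Name": ["Doe", "Doe", "Doe", "Doe", "Gupta", "Gupta"],
    "Address": ["123", "123", "abc345", "abc345", "456", "456"],
    "SSN": [12345, 12345, 12345, 12345, 45678, 45678],
    "Phone": ["111-111-1111", "222-222-2222", "111-111-1111",
              "222-222-2222", "333-333-3333", "333-333-3333"],
})


def collapse(values):
    """Hypothetical helper: a scalar for one distinct value, else a list."""
    # Series.unique() keeps values in order of first appearance.
    distinct = values.unique().tolist()
    return distinct[0] if len(distinct) == 1 else distinct


myschema = (
    df.groupby("ID", as_index=False)
      .agg(collapse)          # applied to each non-key column within each ID
      .to_dict("records")     # one dictionary per ID
)

for record in myschema:
    print(record)
```

Computing the unique values once in a named function avoids calling pd.unique() three times per column, which the one-line lambda does.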

edited Aug 5, 2021 at 8:24

answered Aug 5, 2021 at 6:53

Anurag Dabas


12 Comments

Naveen Gupta Over a year ago

The code is working fine. Can you give some details about the code? That would be helpful.

Naveen Gupta Over a year ago

I was checking the code with a new ID as well, but it is not working for that. I have edited my input data. Do you have any suggestions on that?

Anurag Dabas Over a year ago

@NaveenGupta added details..kindly have a look :)

Naveen Gupta Over a year ago

Thank you so much. Actually, I need to group on the ID column only. I have updated my input and output datasets. Let me know if you have anything to say on that.

tdy Over a year ago

you can change the grouping to only have ID and remove the [0] at the end: df.groupby('ID', as_index=False).agg(lambda x: list(set(x))).to_dict('records')
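
A small sketch of that always-a-list variant, assuming a cut-down frame with only three of the question's columns. It swaps set() for dict.fromkeys(), which is a substitution made here so the list order is deterministic, not part of the comment above.

```python
import pandas as pd

# A subset of the question's rows and columns, for illustration only.
df = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "F_Name": ["Sam", "Sam", "Naveen", "Manish"],
    "Phone": ["111-111-1111", "222-222-2222", "333-333-3333", "333-333-3333"],
})

# Every non-key column becomes a list of distinct values, even when there is
# only one; dict.fromkeys drops duplicates and keeps first-appearance order.
records = (
    df.groupby("ID", as_index=False)
      .agg(lambda x: list(dict.fromkeys(x)))
      .to_dict("records")
)
print(records)
```

Because every field is a list, downstream code does not need to check whether a value is a scalar or a list.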
