
I would like to combine the values from several rows of a dataframe into a single row, grouped by one column, and get the resulting dataframe back.

Input Data :

ID  F_Name  L_Name  Address SSN     Phone
123 Sam     Doe     123     12345   111-111-1111
123 Sam     Doe     123     12345   222-222-2222
123 Sam     Doe     abc345  12345   111-111-1111
123 Sam     Doe     abc345  12345   222-222-2222
456 Naveen  Gupta   456     45678   333-333-3333
456 Manish  Gupta   456     45678   333-333-3333

Expected Output Data :

myschema = {
"ID":"123"
"F_Name":"Sam"
"L_Name":"Doe"
"Address":"[123, abc345]"
"Phone":"[111-111-1111,222-222-2222]"
"SSN":"12345"
}
{
"ID":"456"
"F_Name":"[Naveen, Manish]"
"L_Name":"Gupta"
"Address":"456"
"Phone":"[333-333-3333]"
"SSN":"45678"
}

Code Tried :

import pandas as pd

df = pd.read_csv('data.csv')
print(df)
  • Please specify an expected output and the actual output you are getting. The 'code' you have tried is loading a dataframe and printing it. It does not relate to your question. Commented Aug 5, 2021 at 6:54
  • @samarth I have edited my question. So far I have only loaded the df, and I don't know how to achieve the output in pandas. Commented Aug 5, 2021 at 7:09
  • note that if you are willing to have numpy arrays instead of lists, it's more concise to just aggregate pd.unique directly: myschema = df.groupby('ID', as_index=False).agg(pd.unique).to_dict(orient='records') Commented Aug 5, 2021 at 7:42

1 Answer


Try groupby() + agg():

myschema = (df.groupby('ID', as_index=False)
        .agg(lambda x: list(set(x))[0] if len(set(x)) == 1 else list(set(x)))
        .to_dict('records'))

OR

If order is important, aggregate with pd.unique() instead:

myschema = (df.groupby('ID', as_index=False)
    .agg(lambda x: pd.unique(x)[0] if len(pd.unique(x)) == 1 else pd.unique(x).tolist())
    .to_dict('records'))

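In the code above we group the dataframe on the 'ID' column and aggregate every other column within each group. The aggregation collects the unique values of the column (through set() in the first version, through pd.unique() in the second). If there is only one unique value, we take the value at position 0 so that the result is a plain scalar; otherwise we keep the whole list. Finally, to_dict('records') converts the aggregated dataframe into a list of dictionaries, one per ID.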

Output of myschema (with the set-based version, the order of the values inside each list is not guaranteed):

[{'ID': 123,
  'F_Name': 'Sam',
  'L_Name': 'Doe',
  'Address': ['abc345', '123'],
  'SSN': 12345,
  'Phone': ['222-222-2222', '111-111-1111']},
 {'ID': 456,
  'F_Name': ['Naveen', 'Manish'],
  'L_Name': 'Gupta',
  'Address': '456',
  'SSN': 45678,
  'Phone': '333-333-3333'}]
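
For anyone who wants to try this without a data.csv file, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and uses a named helper instead of the lambda. The helper name collapse is my own, and it uses Series.drop_duplicates() (which keeps first occurrences) rather than pd.unique(), so treat it as one possible variant of the order-preserving approach rather than the exact code above.

```python
import pandas as pd

# Sample data typed in from the question (assumption: Address is text, ID and SSN are integers).
df = pd.DataFrame({
    "ID": [123, 123, 123, 123, 456, 456],
    "F_Name": ["Sam", "Sam", "Sam", "Sam", "Naveen", "Manish"],
    "L_Name": ["Doe", "Doe", "Doe", "Doe", "Gupta", "Gupta"],
    "Address": ["123", "123", "abc345", "abc345", "456", "456"],
    "SSN": [12345, 12345, 12345, 12345, 45678, 45678],
    "Phone": ["111-111-1111", "222-222-2222", "111-111-1111",
              "222-222-2222", "333-333-3333", "333-333-3333"],
})


def collapse(values):
    """Hypothetical helper: return the single unique value of a group,
    or a list of the unique values in first-seen order."""
    unique = values.drop_duplicates().tolist()
    return unique[0] if len(unique) == 1 else unique


# Group on ID only, collapse every other column, and emit one dict per ID.
myschema = df.groupby("ID", as_index=False).agg(collapse).to_dict("records")

for record in myschema:
    print(record)
```

Because the helper deduplicates in first-seen order, the lists come out as ['123', 'abc345'] and ['Naveen', 'Manish'] rather than in the arbitrary order a set would give.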

12 Comments

The code is working fine. Can you give some details about the code? That would be helpful.
I also tested the code with a new ID, but it is not working for that. I have edited my input data. Do you have any suggestions on that?
@NaveenGupta added details, kindly have a look :)
Thank you so much. Actually I need to group on the ID column only. I have updated my input and output datasets. Let me know if you have something to say on that.
You can change the grouping to only have ID and remove the [0] at the end: df.groupby('ID', as_index=False).agg(lambda x: list(set(x))).to_dict('records')