How to create linked array of dataframe column based on key using Pandas?

Question

I have a dataframe df like the one below. For each value of df['code'], I need to build a comma-separated list of the df['Id'] values that share that code.

Input (df)

Id    code  description                start          end          lat    lon
23-A   45   Fault located at Mumbai   2021-03-21      2021-03-28   19.07  72.08
35-B   24   Fault located at Chennai  2021-02-24      2021-02-26   13.02  80.27
37-B   28   Fault located at Chennai  2021-02-24      2021-02-26   13.02  80.07
41-A   45   Fault located at Mumbai   2021-03-21      2021-03-28   
38-B   24   Fault located at Chennai  2021-02-24      2021-02-26   13.02  80.07
27-A   45   Fault located at Mumbai   2021-03-21      2021-03-28   19.07  72.08
78-B   56   Fault located at Chennai  2021-02-24      2021-02-26  
21-C   46   Fault located at Mumbai   2021-04-21      2021-04-28

Expected Output

 linkedId          code  description                start          end          lat  lon  
   23-A,41-A,27-A     45   Fault located at Mumbai   2021-03-21    2021-03-28   19.07  72.08
    35-B,38-B         24   Fault located at Chennai  2021-02-24    2021-02-26   13.02  80.07
    37-B              28   Fault located at Chennai  2021-02-24    2021-02-26   13.02  80.07
    78-B              56   Fault located at Chennai  2021-02-24    2021-02-26  
    21-C              46   Fault located at Mumbai   2021-04-21    2021-04-28

How can this be done in pandas?

@QuangHoang, I am getting an error: "sequence item 0: expected str instance, int found"
– aeapen, Commented May 12, 2021 at 5:05
Not all Id values are strings, unlike in your sample data. You can try lambda x: ','.join(x.astype(str)) instead of ','.join.
– Quang Hoang, Commented May 12, 2021 at 5:08
@QuangHoang, it works now. But how do I attach the rest of the dataframe's columns to these linked Ids?
– aeapen, Commented May 12, 2021 at 5:13
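
The two points raised in these comments (non-string Id values, and carrying the other columns along) can be sketched in one step with a named aggregation. This is a minimal sketch, not the commenters' exact code: the small frame is a cut-down stand-in for the question's data, one Id is deliberately an int, and taking "first" for the remaining columns is an assumption about what the asker wants.

```python
import pandas as pd

# Cut-down stand-in for the question's data (assumption); one Id is an int
# on purpose, to mirror the "expected str instance, int found" situation.
df = pd.DataFrame({
    "Id": ["23-A", "35-B", "41-A", 38],
    "code": [45, 24, 45, 24],
    "description": ["Fault located at Mumbai", "Fault located at Chennai",
                    "Fault located at Mumbai", "Fault located at Chennai"],
    "lat": [19.07, 13.02, None, 13.02],
})

linked = (
    df.groupby("code", sort=False, as_index=False)  # keep first-seen order of codes
      .agg(
          # Cast to str before joining so mixed-type Ids do not raise.
          linkedId=("Id", lambda s: ",".join(s.astype(str))),
          # "first" takes the first non-null value in each group.
          description=("description", "first"),
          lat=("lat", "first"),
      )
)
print(linked)
```

Note that "first" skips missing values, so a group whose first row has no lat still gets a lat if a later row in the group has one.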

Nk03 · Accepted Answer · 2021-05-12 05:49:16Z

1

Try:

result = (
    df.assign(Id=df.groupby('code')['Id']
              .transform(','.join))
    .drop_duplicates(subset='code')
    .rename(columns={'Id': 'linkedId'})
)

edited May 12, 2021 at 5:49

answered May 12, 2021 at 5:13

Nk03


5 Comments

Nk03 Over a year ago

Can you explain what's not working? @aeapen

aeapen Over a year ago

I am missing a few df['code'] values when applying your code, but the ones that are linked are correct. However, df.groupby('code').agg({'Id':lambda x: ','.join(x.astype(str))}) gives me the correct linkage between Id and code.

Nk03 Over a year ago

I'm just dropping the rows based on the 'Id' column after that. You can choose how you wanna filter after getting the correct linkage.

aeapen Over a year ago

When I removed .drop_duplicates(subset='Id'), the linkage happens correctly. Ideally I would keep the first record of each unique code after the linkage.

aeapen Over a year ago

It should be `drop_duplicates(subset='code')`. The rest is all good.
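
For reference, the approach from this answer with the fix agreed in the comments (dropping duplicates on 'code') can be run end to end as below. It is a sketch under assumptions: the frame reproduces only the Id, code and start columns from the question, and the astype(str) cast is added here as a precaution against non-string Ids rather than being part of the answer.

```python
import pandas as pd

# Id, code and start values copied from the question's input table.
df = pd.DataFrame({
    "Id": ["23-A", "35-B", "37-B", "41-A", "38-B", "27-A", "78-B", "21-C"],
    "code": [45, 24, 28, 45, 24, 45, 56, 46],
    "start": ["2021-03-21", "2021-02-24", "2021-02-24", "2021-03-21",
              "2021-02-24", "2021-03-21", "2021-02-24", "2021-04-21"],
})

result = (
    # Write the joined Ids of each code group back onto every row of the group.
    df.assign(Id=df["Id"].astype(str).groupby(df["code"]).transform(",".join))
      # Keep only the first row of each code; its other columns are kept as-is.
      .drop_duplicates(subset="code")
      .rename(columns={"Id": "linkedId"})
      .reset_index(drop=True)
)
print(result)
```

Because drop_duplicates keeps the first row per code, the non-Id columns come from that first row, including any missing values it has.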

Fahad Vadakkumpadatah · Accepted Answer · 2021-05-12 05:19:17Z

0

df.groupby('code').sum()

Result:

    Id
code    
24  35-B38-B
28  37-B
45  23-A41-A27-A
46  21-C
56  78-B
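
As the result shows, sum concatenates the Id strings with no separator. A small sketch of the difference, using a made-up four-row frame (an assumption, not the asker's data), compares it with joining on a comma:

```python
import pandas as pd

# Made-up four-row frame for illustration only.
df = pd.DataFrame({
    "Id": ["23-A", "35-B", "41-A", "38-B"],
    "code": [45, 24, 45, 24],
})

# sum on strings concatenates them back to back.
concatenated = df.groupby("code")["Id"].sum()

# Aggregating with ','.join inserts the separator the question asks for.
joined = df.groupby("code")["Id"].agg(",".join)

print(concatenated)
print(joined)
```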

answered May 12, 2021 at 5:19

Fahad Vadakkumpadatah


wwnde · Accepted Answer · 2021-05-12 05:27:46Z

Use groupby with transform, str.cat and drop_duplicates:

df = df.assign(
    linkedid=df.groupby(['description', 'start', 'end'])['Id']
               .transform(lambda x: x.str.cat(sep=','))
).drop_duplicates(subset=['linkedid'])

     Id  code               description       start         end    lat    lon        linkedid
0  23-A    45   Fault located at Mumbai  2021-03-21  2021-03-28  19.07  72.08       23-A,41-A
1  35-B    24  Fault located at Chennai  2021-02-24  2021-02-26  13.02  80.27  35-B,37-B,78-B
4  38-B    24  Fault located at Chennai  2021-02-24  2021-02-26  13.02  80.07            38-B
5  27-A    45   Fault located at Mumbai  2021-03-21  2021-03-28  19.07  72.08            27-A
7  21-C    46   Fault located at Mumbai  2021-04-21  2021-04-28                          21-C
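
In the output above, rows 0 and 5 show the same description, start and end yet end up with different linkedid values. One possible cause, which is an assumption and not confirmed anywhere in the thread, is stray whitespace in the key columns, so that values that look equal are grouped separately. The sketch below uses hypothetical two-row data to show the effect and one way to normalise the keys before grouping.

```python
import pandas as pd

# Hypothetical data: the second row's keys carry stray spaces.
df = pd.DataFrame({
    "Id": ["23-A", "27-A"],
    "description": ["Fault located at Mumbai", " Fault located at Mumbai"],
    "start": ["2021-03-21", "2021-03-21 "],
    "end": ["2021-03-28", "2021-03-28"],
})

keys = ["description", "start", "end"]

# Without cleaning, the two rows fall into separate groups.
raw = df.groupby(keys)["Id"].transform(lambda s: s.str.cat(sep=","))

# Strip whitespace from the key columns first, then group again.
clean = df.assign(**{k: df[k].str.strip() for k in keys})
linked = clean.groupby(keys)["Id"].transform(lambda s: s.str.cat(sep=","))

print(raw.tolist())
print(linked.tolist())
```

If the keys are already clean, the stripping step changes nothing, so it is cheap to apply defensively.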
