
I have a DataFrame df like the one below. For each value of df['code'], I need to combine the matching df['Id'] values into a single linked, comma-separated list.

Input (df)

Id    code  description                start          end          lat    lon
23-A   45   Fault located at Mumbai   2021-03-21      2021-03-28   19.07  72.08
35-B   24   Fault located at Chennai  2021-02-24      2021-02-26   13.02  80.27
37-B   28   Fault located at Chennai  2021-02-24      2021-02-26   13.02  80.07
41-A   45   Fault located at Mumbai   2021-03-21      2021-03-28   
38-B   24   Fault located at Chennai  2021-02-24      2021-02-26   13.02  80.07
27-A   45   Fault located at Mumbai   2021-03-21      2021-03-28   19.07  72.08
78-B   56   Fault located at Chennai  2021-02-24      2021-02-26  
21-C   46   Fault located at Mumbai   2021-04-21      2021-04-28   

Expected Output

linkedId        code  description               start       end         lat    lon
23-A,41-A,27-A  45    Fault located at Mumbai   2021-03-21  2021-03-28  19.07  72.08
35-B,38-B       24    Fault located at Chennai  2021-02-24  2021-02-26  13.02  80.07
37-B            28    Fault located at Chennai  2021-02-24  2021-02-26  13.02  80.07
78-B            56    Fault located at Chennai  2021-02-24  2021-02-26
21-C            46    Fault located at Mumbai   2021-04-21  2021-04-28

How can this be done in pandas?

  • Something like df.groupby('code').agg({'Id': ','.join})? Commented May 12, 2021 at 5:01
  • @QuangHoang, I'm getting the error: sequence item 0: expected str instance, int found. Commented May 12, 2021 at 5:05
  • Not all of your Ids are strings, as they are in your sample data. You can try lambda x: ','.join(x.astype(str)) instead of ','.join. Commented May 12, 2021 at 5:08
  • @QuangHoang, it works now. But how do I attach the rest of the DataFrame columns to these linked Ids? Commented May 12, 2021 at 5:13
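A minimal sketch for the follow-up question (keeping the other columns next to the linked Ids), assuming the first value of each remaining column per code is acceptable. The DataFrame below is a hypothetical reconstruction of part of the sample data, with start/end left out and the blank lat/lon cells read as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of part of the question's sample data
# (start/end omitted for brevity; blank lat/lon cells become NaN).
df = pd.DataFrame({
    'Id': ['23-A', '35-B', '37-B', '41-A', '38-B', '27-A', '78-B', '21-C'],
    'code': [45, 24, 28, 45, 24, 45, 56, 46],
    'description': ['Fault located at Mumbai', 'Fault located at Chennai',
                    'Fault located at Chennai', 'Fault located at Mumbai',
                    'Fault located at Chennai', 'Fault located at Mumbai',
                    'Fault located at Chennai', 'Fault located at Mumbai'],
    'lat': [19.07, 13.02, 13.02, np.nan, 13.02, 19.07, np.nan, np.nan],
    'lon': [72.08, 80.27, 80.07, np.nan, 80.07, 72.08, np.nan, np.nan],
})

# Take the first non-missing value of every other column in each code group...
agg_spec = {col: 'first' for col in df.columns if col not in ('Id', 'code')}
# ...and join all Ids of the group, casting to str so numeric Ids don't fail.
agg_spec['Id'] = lambda x: ','.join(x.astype(str))

result = (
    df.groupby('code', sort=False)   # sort=False keeps first-appearance order
      .agg(agg_spec)
      .rename(columns={'Id': 'linkedId'})
      .reset_index()
)
print(result)
```

Here 'first' picks the first non-missing value in each group, so code 24 gets lon 80.27 from 35-B; the expected output shows 80.07, so a different reducer (for example 'last') would be needed if another row should win.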

3 Answers


Try:

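result = (
    df.assign(Id=df.groupby('code')['Id']
              # cast to str so numeric Ids don't break the join
              .transform(lambda x: ','.join(x.astype(str))))
    .drop_duplicates(subset='code')  # keep the first row of each code
    .rename(columns={'Id': 'linkedId'})
)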
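A runnable sketch of the transform approach above on hypothetical data: the first six rows follow the question, while the two code-99 rows use integer Ids (invented for illustration) to show why the Ids are cast to str before joining.

```python
import pandas as pd

# Hypothetical data: six rows from the question plus two invented code-99 rows
# with integer Ids, which would make a plain ','.join raise a TypeError.
df = pd.DataFrame({
    'Id': ['23-A', '35-B', '37-B', '41-A', '38-B', '27-A', 101, 102],
    'code': [45, 24, 28, 45, 24, 45, 99, 99],
})

result = (
    df.assign(Id=df.groupby('code')['Id']
              # the joined string is broadcast back to every row of its group
              .transform(lambda x: ','.join(x.astype(str))))
    .drop_duplicates(subset='code')  # first row of each code survives
    .rename(columns={'Id': 'linkedId'})
)
print(result)
```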

5 Comments

Can you explain what's not working? @aeapen
Some codes are missing when I apply your code, but the ones that are linked are correct. df.groupby('code').agg({'Id': lambda x: ','.join(x.astype(str))}) gives me the correct linkage between Id and code.
I'm just dropping duplicate rows based on the 'Id' column after that. You can choose how you want to filter once you have the correct linkage.
When I removed .drop_duplicates(subset='Id'), the linkage came out correctly. Ideally, I would keep the first record of each unique code after the linkage.
It should be drop_duplicates(subset='code'). The rest is all good.
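The default keep='first' of drop_duplicates already retains the first record of each code, which is what the last comments ask for. A small sketch on the question's Id/code pairs, contrasting it with keep='last':

```python
import pandas as pd

# Id/code pairs taken from the question's sample data.
df = pd.DataFrame({
    'Id': ['23-A', '35-B', '37-B', '41-A', '38-B', '27-A', '78-B', '21-C'],
    'code': [45, 24, 28, 45, 24, 45, 56, 46],
})

# keep='first' is the default: the earliest row for each code survives.
first_rows = df.drop_duplicates(subset='code')
# keep='last' would instead retain the final row of each code.
last_rows = df.drop_duplicates(subset='code', keep='last')
print(first_rows)
print(last_rows)
```

Both results keep the original row order and index labels, so the surviving rows can still be matched back to df.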

df.groupby('code')[['Id']].sum()  # string Ids are concatenated with no separator

Result:

    Id
code    
24  35-B38-B
28  37-B
45  23-A41-A27-A
46  21-C
56  78-B
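This result has no separator because summing object (string) values concatenates them. A sketch contrasting sum with a comma join, on the question's Id/code pairs:

```python
import pandas as pd

# Id/code pairs taken from the question's sample data.
df = pd.DataFrame({
    'Id': ['23-A', '35-B', '37-B', '41-A', '38-B', '27-A', '78-B', '21-C'],
    'code': [45, 24, 28, 45, 24, 45, 56, 46],
})

# Summing strings concatenates them directly: '35-B' + '38-B' -> '35-B38-B'.
summed = df.groupby('code')['Id'].sum()
# Joining with ',' gives the comma-separated form the question asks for.
joined = df.groupby('code')['Id'].agg(','.join)
print(summed)
print(joined)
```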

Comments


Use groupby with transform, str.cat, and drop_duplicates:

df = df.assign(linkedid=df.groupby(['description', 'start', 'end'])['Id']
               .transform(lambda x: x.str.cat(sep=','))).drop_duplicates(subset=['linkedid'])




     Id  code               description       start         end    lat    lon        linkedid
0  23-A    45   Fault located at Mumbai  2021-03-21  2021-03-28  19.07  72.08       23-A,41-A
1  35-B    24  Fault located at Chennai  2021-02-24  2021-02-26  13.02  80.27  35-B,37-B,78-B
4  38-B    24  Fault located at Chennai  2021-02-24  2021-02-26  13.02  80.07            38-B
5  27-A    45   Fault located at Mumbai  2021-03-21  2021-03-28  19.07  72.08            27-A
7  21-C    46   Fault located at Mumbai  2021-04-21  2021-04-28                          21-C
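A sketch of the same transform/str.cat idea, but grouping by code (the key the question asks about) rather than description/start/end, and de-duplicating on code; that swap is an adjustment, not part of the answer above. The data is the question's Id/code pairs.

```python
import pandas as pd

# Id/code pairs taken from the question's sample data.
df = pd.DataFrame({
    'Id': ['23-A', '35-B', '37-B', '41-A', '38-B', '27-A', '78-B', '21-C'],
    'code': [45, 24, 28, 45, 24, 45, 56, 46],
})

# str.cat joins the string values of each group with the given separator;
# astype(str) guards against numeric Ids, which .str would reject or treat as missing.
linked = df.groupby('code')['Id'].transform(lambda x: x.astype(str).str.cat(sep=','))
result = df.assign(linkedid=linked).drop_duplicates(subset=['code'])
print(result)
```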

Comments
