1

i have a pandas dataframe like below table. for each SITEID in the first column, i've same value for other columns like Priority, Region and Vendor but not same in the History column.

SITEID  Priority    Region  Vendor                          HISTORY
======  =========   ======  ======= =================================================================
E1149       P3        R10     NSN       09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31
=====================================================================================================
E1149       P3        R10     NSN       09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault: 00: 16
=====================================================================================================
E1149       P3        R10     NSN       09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02
=====================================================================================================
E1150       P3        R10     E//       09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13
=====================================================================================================
E1150       P3        R10     E//       09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
=====================================================================================================

at first i want to merge the first four columns (SITEID, Priority, Region and Vendor) per each siteID and then put all the relevant records in the History column against it like below:

SITEID  Priority    Region  Vendor                          HISTORY
======  =========   ======  ======= =================================================================
E1149       P3        R10     NSN       09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31
                                        09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault:00: 16
                                        09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02
=====================================================================================================
E1150       P3        R10     E//       09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13
                                        09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
=====================================================================================================

what is the most efficient way to do this in the excel output file using xlswriter etc? i tried many solutions like swaplevel but no result.

0

2 Answers 2

1

You can try a simple groupby and agg using .join with delimiter \n:

cols = ['SITEID', 'Priority', 'Region', 'Vendor']
df_merged = df.groupby(cols, as_index=False).agg('\n'.join)

Then save this merged dataframe to excel as:

df_merged.to_excel('file.xlsx')

Result:

enter image description here

Sign up to request clarification or add additional context in comments.

1 Comment

Great idea and solution @Shubham Sharma this is exactly what i needed to have in the excel output file. thank you! :)
1

You can simply look for duplicate rows for the relevant columns with .loc. There are two parts to loc: 1. Rows and 2. columns:

  1. For rows, you can use .duplicated() and specify the columns to look for duplicates by passing subset=([]). This will return True for duplicate rows.
  2. For columns, you can use just pass the columns for the values that you want to change to blanks with ,['SITEID','Priority','Region','Vendor']]
  3. Finally, set these specified rows and columns to blank with == ''

df.loc[df.duplicated(subset=(['SITEID','Priority','Region','Vendor'])),['SITEID','Priority','Region','Vendor']] = ''
df
Out[1]: 
  SITEID Priority Region Vendor  \
0  E1149       P3    R10    NSN   
1                                 
2                                 
3  E1150       P3    R10    E//   
4                                 

                                             HISTORY  
0  09-09 : ZRBSCN8, LUE1149 : Connector Faulty : ...  
1  09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabl...  
2  09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault :...  
3  09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear :...  
4  09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18

df.to_csv('test.csv', index=False)

enter image description here

3 Comments

thanks David, great idea! the only issue is when i want to draw a border in the excel output file using xlswriter format, the line will be draw between the History column rows correctly as they're still in the different rows logically. any idea to can fix this excel formatting?
@Mahmood can you paste a picture of excel of your desired format?
@ David Erickson thank you. the desired format is exactly what Shubham Sharma posted above.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.