How to merge multiple columns with same content in the excel output file using pandas

Question

I have a pandas DataFrame like the table below. For each SITEID in the first column, the Priority, Region and Vendor columns hold the same value, but the HISTORY column differs from row to row.

SITEID  Priority    Region  Vendor                          HISTORY
======  =========   ======  ======= =================================================================
E1149       P3        R10     NSN       09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31
=====================================================================================================
E1149       P3        R10     NSN       09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault: 00: 16
=====================================================================================================
E1149       P3        R10     NSN       09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02
=====================================================================================================
E1150       P3        R10     E//       09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13
=====================================================================================================
E1150       P3        R10     E//       09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
=====================================================================================================

First, I want to merge the first four columns (SITEID, Priority, Region and Vendor) for each SITEID, and then put all the related records in the HISTORY column against it, like below:

SITEID  Priority    Region  Vendor                          HISTORY
======  =========   ======  ======= =================================================================
E1149       P3        R10     NSN       09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31
                                        09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault:00: 16
                                        09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02
=====================================================================================================
E1150       P3        R10     E//       09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13
                                        09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
=====================================================================================================

What is the most efficient way to do this in the Excel output file, using XlsxWriter etc.? I tried several approaches such as swaplevel, without success.

Shubham Sharma · Accepted Answer · 2020-09-14 05:19:02Z

1

You can try a simple groupby and agg using .join with delimiter \n:

cols = ['SITEID', 'Priority', 'Region', 'Vendor']
df_merged = df.groupby(cols, as_index=False).agg('\n'.join)
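To try this out end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the same groupby. Two details are my own additions rather than part of the answer: `sort=False`, which keeps the sites in their original order, and the dict form of `agg`, which names HISTORY explicitly so that no other non-key column gets joined by accident.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "SITEID": ["E1149", "E1149", "E1149", "E1150", "E1150"],
    "Priority": ["P3"] * 5,
    "Region": ["R10"] * 5,
    "Vendor": ["NSN", "NSN", "NSN", "E//", "E//"],
    "HISTORY": [
        "09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31",
        "09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault: 00: 16",
        "09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02",
        "09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13",
        "09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18",
    ],
})

cols = ["SITEID", "Priority", "Region", "Vendor"]

# One output row per site; the HISTORY strings of each group are
# concatenated with a newline between them.
df_merged = df.groupby(cols, as_index=False, sort=False).agg({"HISTORY": "\n".join})

print(df_merged.shape)
```

Each site now occupies a single row, with its history entries stacked inside one cell.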

Then save this merged dataframe to excel as:

df_merged.to_excel('file.xlsx')
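Excel typically shows the embedded newlines as separate lines only when the cell has wrap text enabled. The sketch below is one way to set that while writing, assuming the XlsxWriter package is installed. The helper name `write_wrapped`, the column width of 70 and the sheet name are illustrative choices, not part of the answer.

```python
import pandas as pd


def write_wrapped(df_merged, path="file.xlsx", history_col="HISTORY"):
    """Write df_merged to Excel with text wrapping on the history column.

    Hypothetical helper: assumes XlsxWriter is available as the engine.
    """
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        # index=False so that column positions in the sheet match df_merged.columns.
        df_merged.to_excel(writer, sheet_name="Sheet1", index=False)

        workbook = writer.book
        worksheet = writer.sheets["Sheet1"]

        # Wrap text so "\n" is rendered as a line break; align to the top
        # so the key columns line up with the first history entry.
        wrap = workbook.add_format({"text_wrap": True, "valign": "top"})
        col_idx = df_merged.columns.get_loc(history_col)
        worksheet.set_column(col_idx, col_idx, 70, wrap)
    return path
```

Calling `write_wrapped(df_merged)` would then replace the plain `to_excel` call.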


1 Comment

Mahmood Over a year ago

Great idea and solution @Shubham Sharma, this is exactly what I needed in the Excel output file. Thank you! :)

David Erickson · Accepted Answer · 2020-09-13 23:15:09Z

1

You can simply look for duplicate rows in the relevant columns and blank them out with .loc. There are two parts to .loc: 1. rows and 2. columns:

For rows, use .duplicated() and specify the columns to check for duplicates by passing subset=[...]. This returns True for every row that repeats an earlier one.
For columns, just pass the columns whose values you want to blank: ['SITEID','Priority','Region','Vendor'].
Finally, set these rows and columns to blank by assigning = ''.

df.loc[df.duplicated(subset=(['SITEID','Priority','Region','Vendor'])),['SITEID','Priority','Region','Vendor']] = ''
df
Out[1]: 
  SITEID Priority Region Vendor  \
0  E1149       P3    R10    NSN   
1                                 
2                                 
3  E1150       P3    R10    E//   
4                                 

                                             HISTORY  
0  09-09 : ZRBSCN8, LUE1149 : Connector Faulty : ...  
1  09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabl...  
2  09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault :...  
3  09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear :...  
4  09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18

df.to_csv('test.csv', index=False)
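To see what each part of the .loc call contributes, the same steps can be split up. This is a minimal sketch on the sample rows from the question; the intermediate name `mask` is my own.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "SITEID": ["E1149", "E1149", "E1149", "E1150", "E1150"],
    "Priority": ["P3"] * 5,
    "Region": ["R10"] * 5,
    "Vendor": ["NSN", "NSN", "NSN", "E//", "E//"],
    "HISTORY": [
        "09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31",
        "09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault: 00: 16",
        "09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02",
        "09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13",
        "09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18",
    ],
})

cols = ["SITEID", "Priority", "Region", "Vendor"]

# Rows: True for every row whose key columns repeat an earlier row.
mask = df.duplicated(subset=cols)
print(mask.tolist())  # [False, True, True, False, True]

# Columns + assignment: blank the key columns on the flagged rows only.
df.loc[mask, cols] = ""
```

Note that `duplicated` flags any repeat of an earlier row, not only consecutive ones, so this relies on the rows for each site being adjacent.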


3 Comments

Mahmood Over a year ago

Thanks David, great idea! The only issue is that when I draw a border in the Excel output file using an XlsxWriter format, a line is still drawn between the rows of the History column, since they are logically still separate rows. Any idea how to fix this Excel formatting?

David Erickson Over a year ago

@Mahmood can you paste a picture from Excel of your desired format?

Mahmood Over a year ago

@David Erickson thank you. The desired format is exactly what Shubham Sharma posted above.
