Pandas: merging multiple rows from an Excel file into a single row per ID

Question

I've been given a few sets of MS Excel worksheets with a lot of nested data, and I have spent a few hours looking for a way to reduce each 'ID' to a single row. Specifically, I want to merge 'Step ID', 'Install Step', and 'Expected step' into single lines with some formatting.

Here is a shortened sample of the data within the Excel sheets I need to convert.

Name	ID	Host	Step ID	Install Step	Expected step	Extra
Test1	4	Cat	1	Move x to y	x is with y	x will protest
			2	move x away from y	x and y are not together	y will protest
Test2	5	Dog	1	remove x from tank	y is alone
			2	Drop duplicate of y, y2 in tank	y1 is not alone	y1 will protest
			3	Drop more duplicates of y into tank, y3 and y4		y1 and y2 will protest
Test3	6	Dog	1	empty tank	nothing is in tank

And I am looking to transform this Excel sheet into the following:

Name	ID	Host	Install Step	Expected step	Extra
Test1	4	Cat	1 - Move x to y 2 - move x away from y	1 - x is with y 2 - x and y are not together	1 - x will protest 2 - y will protest
Test2	5	Dog	1 - remove x from tank 2 - Drop duplicate of y, y2 in tank 3 - Drop more duplicates of y into tank, y3 and y4	1 - y is alone 2 - y1 is not alone	2 - y1 will protest 3 - y1 and y2 will protest
Test3	6	Dog	1 - empty tank	1 - nothing is in tank

I have tested a few of the other Stack Overflow questions and responses for pandas, but the few that closely match my need just fill in the empty areas with duplicate data.

tdy · Accepted Answer · 2021-04-03 04:01:24Z

2

If you melt() the dataframe:

melt = df.melt(['Name', 'ID', 'Host', 'Step ID']).ffill()

#      Name   ID  Host  Step ID      variable               value
# 0   Test1  4.0   Cat        1  Install Step         Move x to y
# 1   Test1  4.0   Cat        2  Install Step  move x away from y
# ...
# 16  Test2  5.0   Dog        3         Extra     y1 will protest
# 17  Test3  6.0   Dog        1         Extra     y1 will protest
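One caveat worth checking on your own data: calling ffill() on the melted frame also fills blank cells in the value column, which is why Test3 in the output above ends up with an Extra entry the sheet never had. Below is a minimal sketch of a variant that forward-fills only the key columns before melting. It assumes the sample from the question re-typed by hand (not read from the real workbook), with blank cells as missing values and column names spelled as in the answers; `keys`, `blanket`, `filled` and `melted` are illustrative names.

```python
import pandas as pd

# Sample re-typed from the question; None marks a blank Excel cell (assumption).
df = pd.DataFrame({
    "Name": ["Test1", None, "Test2", None, None, "Test3"],
    "ID": [4, None, 5, None, None, 6],
    "Host": ["Cat", None, "Dog", None, None, "Dog"],
    "Step ID": [1, 2, 1, 2, 3, 1],
    "Install Step": [
        "Move x to y",
        "move x away from y",
        "remove x from tank",
        "Drop duplicate of y, y2 in tank",
        "Drop more duplicates of y into tank, y3 and y4",
        "empty tank",
    ],
    "Expected Step": [
        "x is with y",
        "x and y are not together",
        "y is alone",
        "y1 is not alone",
        None,
        "nothing is in tank",
    ],
    "Extra": [
        "x will protest",
        "y will protest",
        None,
        "y1 will protest",
        "y1 and y2 will protest",
        None,
    ],
})

keys = ["Name", "ID", "Host"]

# Blanket ffill after melting: blank cells in `value` inherit the row above.
blanket = df.melt(keys + ["Step ID"]).ffill()

# Variant: fill only the key columns, so genuinely blank cells stay missing.
filled = df.copy()
filled[keys] = filled[keys].ffill()
filled["ID"] = filled["ID"].astype(int)  # ID became float because of the blanks
melted = filled.melt(keys + ["Step ID"])
```

With this variant the last row (Test3, Extra) stays missing instead of borrowing the previous step's text, and the later steps can drop or skip those rows explicitly.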

You can combine the Step ID and value columns in one shot:

melt.value = melt['Step ID'].astype(str) + ' - ' + melt.value
melt = melt.drop('Step ID', axis=1)

#      Name   ID  Host      variable                   value
# 0   Test1  4.0   Cat  Install Step         1 - Move x to y
# 1   Test1  4.0   Cat  Install Step  2 - move x away from y
# ...
# 16  Test2  5.0   Dog         Extra     3 - y1 will protest
# 17  Test3  6.0   Dog         Extra     1 - y1 will protest

Then join each group's value list together with \n and unstack() to pivot back to the wide table:

melt.groupby(['Name', 'ID', 'Host', 'variable']).agg('\n'.join).unstack()

	Name	ID	Host	Expected Step	Extra	Install Step
0	Test1	4.0	Cat	1 - x is with y\n2 - x and y are not together	1 - x will protest\n2 - y will protest	1 - Move x to y\n2 - move x away from y
1	Test2	5.0	Dog	1 - y is alone\n2 - y1 is not alone\n3 - y1 an...	1 - y will protest\n2 - y1 will protest\n3 - y...	1 - remove x from tank\n2 - Drop duplicate of ...
2	Test3	6.0	Dog	1 - nothing is in tank	1 - y1 will protest	1 - empty tank
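For reference, here is the whole pipeline as one self-contained sketch. It is not the exact code from this answer: it assumes the question's sample re-typed by hand, forward-fills only the key columns, drops blank cells before prefixing, and flattens the result back to ordinary columns. Names such as `keys`, `long_df` and `wide` are illustrative.

```python
import pandas as pd

# Sample re-typed from the question; None marks a blank Excel cell (assumption).
df = pd.DataFrame({
    "Name": ["Test1", None, "Test2", None, None, "Test3"],
    "ID": [4, None, 5, None, None, 6],
    "Host": ["Cat", None, "Dog", None, None, "Dog"],
    "Step ID": [1, 2, 1, 2, 3, 1],
    "Install Step": [
        "Move x to y",
        "move x away from y",
        "remove x from tank",
        "Drop duplicate of y, y2 in tank",
        "Drop more duplicates of y into tank, y3 and y4",
        "empty tank",
    ],
    "Expected Step": [
        "x is with y",
        "x and y are not together",
        "y is alone",
        "y1 is not alone",
        None,
        "nothing is in tank",
    ],
    "Extra": [
        "x will protest",
        "y will protest",
        None,
        "y1 will protest",
        "y1 and y2 will protest",
        None,
    ],
})

keys = ["Name", "ID", "Host"]
df[keys] = df[keys].ffill()          # fill only the merged-cell columns
df["ID"] = df["ID"].astype(int)

# Long format, without the blank cells.
long_df = df.melt(keys + ["Step ID"]).dropna(subset=["value"])

# Prefix every value with its step number, e.g. "2 - move x away from y".
long_df = long_df.assign(
    value=long_df["Step ID"].astype(str) + " - " + long_df["value"]
)

# Join each group's values with newlines, then pivot back to wide format.
wide = (
    long_df.groupby(keys + ["variable"])["value"]
    .agg("\n".join)
    .unstack("variable")
    .reset_index()
)
wide.columns.name = None

# Restore the column order of the original sheet.
wide = wide[keys + ["Install Step", "Expected Step", "Extra"]]
```

Cells with no text at all (Extra for Test3 here) come back as missing values after unstack(); wide.fillna('') turns them into empty strings if that suits the export better.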

edited Apr 3, 2021 at 4:01

answered Apr 3, 2021 at 3:55

tdy


2 Comments

JustBroken Over a year ago

Thank you for all the help. Looks like melt will resolve it nicely.

tdy Over a year ago

Great, no problem.

Sam Szotkowski · Answer · 2021-04-03 01:00:16Z

1

For the formatting bit:

df = df.ffill()  # forward fill the empty areas
step_id_str = df['Step ID'].astype(str)
for col in ['Install Step','Expected Step','Extra']:
    df[col] = step_id_str + ' - ' + df[col]

For merging the rows:

group = df.groupby('ID')
for col in ['Install Step','Expected Step','Extra']:
    df[col] = group[col].transform(lambda s: '\n'.join(s))
df = df.drop('Step ID', axis=1)
df = df.drop_duplicates()
df
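The same idea can also be written with a single groupby().agg() instead of transform() plus drop_duplicates(). The sketch below is a variation on the code above, not a replacement for it: it assumes the question's sample re-typed by hand, forward-fills only the key columns so blank Expected Step or Extra cells are not copied from the row above, and skips those blanks when joining. `keys`, `text_cols`, `join_steps` and `merged` are illustrative names.

```python
import pandas as pd

# Sample re-typed from the question; None marks a blank Excel cell (assumption).
df = pd.DataFrame({
    "Name": ["Test1", None, "Test2", None, None, "Test3"],
    "ID": [4, None, 5, None, None, 6],
    "Host": ["Cat", None, "Dog", None, None, "Dog"],
    "Step ID": [1, 2, 1, 2, 3, 1],
    "Install Step": [
        "Move x to y",
        "move x away from y",
        "remove x from tank",
        "Drop duplicate of y, y2 in tank",
        "Drop more duplicates of y into tank, y3 and y4",
        "empty tank",
    ],
    "Expected Step": [
        "x is with y",
        "x and y are not together",
        "y is alone",
        "y1 is not alone",
        None,
        "nothing is in tank",
    ],
    "Extra": [
        "x will protest",
        "y will protest",
        None,
        "y1 will protest",
        "y1 and y2 will protest",
        None,
    ],
})

keys = ["Name", "ID", "Host"]
text_cols = ["Install Step", "Expected Step", "Extra"]

df[keys] = df[keys].ffill()          # fill only the merged-cell columns
df["ID"] = df["ID"].astype(int)

# "1 - ", "2 - ", ... one prefix per row.
prefix = df["Step ID"].astype(str) + " - "
for col in text_cols:
    # str.cat leaves the result missing wherever the cell itself is missing.
    df[col] = prefix.str.cat(df[col])


def join_steps(values):
    """Join the non-missing entries of one group with newlines."""
    return "\n".join(values.dropna())


merged = df.groupby(keys, as_index=False, sort=False).agg(
    {col: join_steps for col in text_cols}
)
```

A group with no text in a column (Extra for Test3 here) ends up as an empty string, since joining an empty sequence gives ''.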

edited Apr 3, 2021 at 1:00

answered Apr 3, 2021 at 0:46

Sam Szotkowski


1 Comment

JustBroken Over a year ago

I was able to use part of your answer also. Thank you for the help.
