Compare two excel files using python

Question

I have two xlsx files as follows:

value1        value2   
3900162750    10    
3900163003    19
2311009200    22 

value1        value2   
3900163003    5    
3900162750    9
2311009200    88

How do I match value1 from xlsx1 with xlsx2 and compare related value2?

for instance, match 3900163003 from xlsx1 with xlsx2 and find out value2 is decreasing.

Easy. Use pandas. Find out how to find rows with a specific value. Compare the other value. Good luck. — moooeeeep
– moooeeeep, Commented Jan 5, 2022 at 7:44

user7864386 · Accepted Answer · 2022-01-05 07:54:52Z

1

You can use pandas. First read the excel files as pandas.DataFrame objects. Then using pd.DataFrame.merge method, merge the two dataframes on 'value1' column. Then create another column 'Decreasing' by comparing the two 'value2' columns. By default, the column from the left (in this case, df1) gets suffix _x and the one from the right (in this case, df2) gets suffix _y:

import pandas as pd
df1 = pd.read_excel('first_excel_file.xlsx')
df2 = pd.read_excel('second_excel_file.xlsx')
df = df1.merge(df2, on='value1')
df['Decreasing'] = df['value2_x'] > df['value2_y']

Output:

       value1  value2_x  value2_y  Decreasing
0  3900162750        10         9        True
1  3900163003        19         5        True
2  2311009200        22        88       False

answered Jan 5, 2022 at 7:54

user7864386

Sign up to request clarification or add additional context in comments.

Comments

hpchavaz · Accepted Answer · 2022-01-05 08:11:25Z

Use pandas and pandas merge :

import pandas as pd

df = df1.merge(df2, on='value1', how= 'outer', suffixes=('_df1', '_df2'))
df['decrease'] = df['value2_df2'] < df['value2_df1']

giving:

       value1  value2_df1  value2_df2  decrease
0  3900162750          10           9      True
1  3900163003          19           5      True
2  2311009200          22          88     False

Notes:

how parameter used to provide some output if df1 and df2 are not perfectly matching
suffixes parameter used to provide the origin of the data
it could be suitable to give a other name to the 'decrease' column to remember the way it is computed

Data from :

data1 = """
value1        value2   
3900162750    10    
3900163003    19
2311009200    22 
"""
data2 = """
value1        value2   
3900163003    5    
3900162750    9
2311009200    88
"""
df1 = pd.read_csv(io.StringIO(data1), sep=r' +')
df2 = pd.read_csv(io.StringIO(data2), sep=r' +')

In real live read_csv should be replaced by read_excel

Collectives™ on Stack Overflow

Compare two excel files using python

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related