I have two xlsx files as follows:

value1        value2   
3900162750    10    
3900163003    19
2311009200    22 

value1        value2   
3900163003    5    
3900162750    9
2311009200    88

How do I match value1 from xlsx1 with value1 in xlsx2 and compare the corresponding value2 entries?

For instance, match 3900163003 from xlsx1 with xlsx2 and find out that its value2 has decreased (from 19 to 5).

  • What have you tried so far? Commented Jan 5, 2022 at 7:37
  • Easy. Use pandas. Find out how to find rows with a specific value. Compare the other value. Good luck. Commented Jan 5, 2022 at 7:44

2 Answers

You can use pandas. First, read the Excel files into pandas.DataFrame objects. Then merge the two DataFrames on the 'value1' column with the DataFrame.merge method, and create another column, 'Decreasing', by comparing the two 'value2' columns. Because both frames have a 'value2' column, merge renames them by default: the column from the left frame (here, df1) gets the suffix _x and the one from the right frame (here, df2) gets the suffix _y:

import pandas as pd
df1 = pd.read_excel('first_excel_file.xlsx')
df2 = pd.read_excel('second_excel_file.xlsx')
df = df1.merge(df2, on='value1')
df['Decreasing'] = df['value2_x'] > df['value2_y']

Output:

       value1  value2_x  value2_y  Decreasing
0  3900162750        10         9        True
1  3900163003        19         5        True
2  2311009200        22        88       False
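
For readers without the spreadsheets at hand, the same steps can be sketched with in-memory frames. This is only an illustration: the DataFrames stand in for the result of read_excel, and the extra 'change' column is an addition that is not part of the answer above. It records by how much value2 moved, not just the direction.

```python
import pandas as pd

# Inline stand-ins for the two spreadsheets (same numbers as in the question).
df1 = pd.DataFrame({"value1": [3900162750, 3900163003, 2311009200],
                    "value2": [10, 19, 22]})
df2 = pd.DataFrame({"value1": [3900163003, 3900162750, 2311009200],
                    "value2": [5, 9, 88]})

# Merge on value1; the row order in the two files does not need to agree.
df = df1.merge(df2, on="value1")

# Signed change from the first file to the second (negative means a decrease).
df["change"] = df["value2_y"] - df["value2_x"]
df["Decreasing"] = df["change"] < 0

print(df)
```

A signed difference makes it easy to sort or filter by the size of the change afterwards, for example with df.sort_values('change').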

Use pandas and its merge method:

import pandas as pd

# df1 and df2 are built in the "Data used" section at the end of this answer
df = df1.merge(df2, on='value1', how='outer', suffixes=('_df1', '_df2'))
df['decrease'] = df['value2_df2'] < df['value2_df1']

giving:

       value1  value2_df1  value2_df2  decrease
0  3900162750          10           9      True
1  3900163003          19           5      True
2  2311009200          22          88     False

Notes:

  • The how='outer' parameter keeps rows whose value1 appears in only one of df1 and df2, so unmatched keys still show up in the output.
  • The suffixes parameter labels each value2 column with the frame it came from.
  • It may be worth giving the 'decrease' column a more descriptive name that records how it is computed.
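
The first note can be illustrated with a small sketch. The keys 111, 222 and 333 are made-up values chosen so that the two frames do not match perfectly, and the use of indicator=True is an addition, not something the answer above relies on. It adds a '_merge' column that says whether each row came from the left frame, the right frame, or both.

```python
import pandas as pd

# Hypothetical data: key 111 exists only in df1, key 333 only in df2.
df1 = pd.DataFrame({"value1": [111, 222], "value2": [10, 19]})
df2 = pd.DataFrame({"value1": [222, 333], "value2": [5, 88]})

merged = df1.merge(df2, on="value1", how="outer",
                   suffixes=("_df1", "_df2"), indicator=True)

# Rows present in only one frame have NaN on the other side, and any
# comparison with NaN is False, so restrict the comparison to matched rows.
matched = merged[merged["_merge"] == "both"].copy()
matched["decrease"] = matched["value2_df2"] < matched["value2_df1"]

# Keys that could not be compared, with their origin in the _merge column.
unmatched = merged[merged["_merge"] != "both"]

print(matched)
print(unmatched)
```

Separating matched from unmatched rows avoids reading a False in 'decrease' as "did not decrease" when the value was simply missing from one file.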

Data used:

data1 = """
value1        value2   
3900162750    10    
3900163003    19
2311009200    22 
"""
data2 = """
value1        value2   
3900163003    5    
3900162750    9
2311009200    88
"""
import io

# r'\s+' splits on any run of whitespace and tolerates trailing spaces
df1 = pd.read_csv(io.StringIO(data1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(data2), sep=r'\s+')

In real life, read_csv should be replaced by read_excel.
