I am comparing two CSV files. I need to know which rows match by comparing specific columns. The output needs to be the rows that match.
Data:
CSV 1:
name, age, occupation
Alice,51,accountant
John,23,driver
Peter,32,plumber
Jose,50,doctor
CSV 2:
name, age, occupation
Alice,51,dentist
Ted,43,carpenter
Peter,32,plumber
Jose,50,doctor
Desired Ouput:
Rather than returning a boolean, I would like to output the matching rows when only comparing the name and age columns:
Alice,51,accountant
Alice,51,dentist
Peter,32,plumber
Jose,50,doctor
Code:
I am comparing the two CSVs to see if the columns in the 'columns_to_compare_test' list match. I return a boolean, true or false.
# read two csv files into dataframes
df1 = pd.read_csv('test.csv', sep=',', encoding='UTF-8')
df2 = pd.read_csv('test2.csv', sep=',', encoding='UTF-8')
# compare only the name and age columns
columns_to_compare_test = ['name', 'age']
# print true or false
print(df1[columns_to_compare_test].equals(df2[columns_to_compare_test]))
# output: false
Thank you.