
Suppose file1.csv has the following columns:

Customer id    Name 

Q1             Alen
W2             Ricky
E3             Katrina
R4             Anya
T5             Leonardo

and file2.csv has the following columns:

Customer id    Name

Q1             Alen
W2             Harry
E3             Katrina
R4             Anya
T5             Leonardo

As you can see, the name for Customer id W2 does not match between the two files, so output.csv should look like this:

Customer id  Status

Q1           Matching
W2           Not matching
E3           Matching
R4           Matching
T5           Matching

How can I get the above output using Python?

P.S. What would the code look like for comparing multiple columns, not just the Name column?

My code

import csv

# Map each Name in file1 to its row number; only its presence is checked later.
with open('file1.csv', 'rt', encoding='utf-8') as csvfile1:
    csvfile1_indices = {r[1]: i for i, r in enumerate(csv.reader(csvfile1))}

with open('file2.csv', 'rt', encoding='utf-8') as csvfile2, \
        open('output.csv', 'w', encoding='utf-8', newline='') as results:
    reader = csv.reader(csvfile2)
    writer = csv.writer(results)

    # Copy the header row and append the new column.
    writer.writerow(next(reader, []) + ['status'])

    for row in reader:
        # A row is 'matching' if its Name appears anywhere in file1.
        if row[1] in csvfile1_indices:
            message = 'matching'
        else:
            message = 'not matching'
        writer.writerow(row + [message])

This works fine, but is there an easier way to get the same output? And what changes do I need to make to compare multiple columns?

5
  • What have you tried so far? How about just using a string comparison tool like WinMerge? Commented Nov 5, 2018 at 9:50
  • similar question here and here and here Commented Nov 5, 2018 at 10:05
  • @蕭為元 You can see the code I tried. I've edited the question. Commented Nov 5, 2018 at 10:08
  • Can you use Pandas? Commented Nov 5, 2018 at 10:32
  • @Sreeram Yes, of course. Commented Nov 5, 2018 at 10:47

3 Answers

2

If you don't mind using Pandas, you can do it in five lines of code:

import pandas as pd 

# assuming id columns are identical and contain the same values
df1 = pd.read_csv('file1.csv', index_col='Customer_id')
df2 = pd.read_csv('file2.csv', index_col='Customer_id')

df3 = pd.DataFrame(columns=['status'], index=df1.index)
df3['status'] = (df1['Name'] == df2['Name']).map({True: 'Matching', False: 'Not matching'})

df3.to_csv('output.csv')

Edit: removed sep='\t' so that the default comma separator is used.
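
To cover the multi-column part of the question, the same idea can be extended by comparing a list of columns row by row. The following is only a sketch under stated assumptions: the `City` column and the inline sample data are invented for illustration, `compare_columns` is a hypothetical helper name, and the id column is assumed to be called `Customer_id` as in the code above.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file1.csv and file2.csv.
# The "City" column is invented to illustrate a second compared column.
FILE1 = """Customer_id,Name,City
Q1,Alen,Paris
W2,Ricky,Rome
E3,Katrina,Oslo
"""
FILE2 = """Customer_id,Name,City
Q1,Alen,Paris
W2,Harry,Rome
E3,Katrina,Bergen
"""


def compare_columns(df1, df2, columns):
    """Return a one-column frame marking rows where all `columns` agree."""
    # Reorder df2 to follow df1's index so rows are paired by customer id;
    # ids missing from df2 become NaN rows and end up as 'Not matching'.
    df2 = df2.reindex(df1.index)
    same = (df1[columns] == df2[columns]).all(axis=1)
    status = same.map({True: 'Matching', False: 'Not matching'})
    return status.to_frame('Status')


df1 = pd.read_csv(io.StringIO(FILE1), index_col='Customer_id')
df2 = pd.read_csv(io.StringIO(FILE2), index_col='Customer_id')

result = compare_columns(df1, df2, ['Name', 'City'])
print(result)
```

With real files, the `io.StringIO(...)` arguments would be replaced by the file names, and `result.to_csv('output.csv')` would write the status column. Note that two empty (NaN) cells compare as unequal in pandas, so rows with missing values in a compared column are reported as not matching.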


2 Comments

I got ValueError: Index Customer_id invalid, but changing sep='\t' to sep=',' solved the error.
Sorry, my bad! You can actually omit the separator argument altogether if you're using comma-separated values.
0

Read both CSV files into two separate dictionaries, iterate over either one, and check for the same key in the other. If you need to preserve order, use OrderedDict.
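
A minimal sketch of this dictionary approach, using only the standard library, might look like the following. It assumes the id column header is `Customer id` as shown in the question; `load_rows` and `compare` are hypothetical helper names, and the inline strings stand in for the real files. Plain dicts keep insertion order on Python 3.7 and later, so OrderedDict is only needed on older versions.

```python
import csv
import io


def load_rows(handle, key='Customer id'):
    """Map each customer id to its full row (a dict of column -> value)."""
    return {row[key]: row for row in csv.DictReader(handle)}


def compare(rows1, rows2, columns):
    """Yield (customer id, status) for every id in rows1, in file order."""
    for cust_id, row1 in rows1.items():
        row2 = rows2.get(cust_id)
        # An id missing from the second file counts as not matching.
        same = row2 is not None and all(row1[c] == row2[c] for c in columns)
        yield cust_id, 'Matching' if same else 'Not matching'


# Inline sample data standing in for file1.csv and file2.csv (assumption).
file1 = io.StringIO("Customer id,Name\nQ1,Alen\nW2,Ricky\nE3,Katrina\n")
file2 = io.StringIO("Customer id,Name\nQ1,Alen\nW2,Harry\nE3,Katrina\n")

# Pass more column names in the list to compare several columns at once.
statuses = list(compare(load_rows(file1), load_rows(file2), ['Name']))

# Write the result in CSV form; an in-memory buffer is used here
# instead of open('output.csv', 'w', newline='').
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['Customer id', 'Status'])
writer.writerows(statuses)
print(out.getvalue())
```

Because the lookup is keyed on the customer id, each name is compared with the name recorded for the same customer rather than with any name that happens to appear in the other file.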

1 Comment

Could you share a Python script? @Sanjay Idpuganti
0

You can merge on multiple columns:

f1
  Customer_id      Name
0          Q1      Alen
1          W2     Ricky
2          E3   Katrina
3          R4      Anya
4          T5  Leonardo

f2
  Customer_id      Name
0          Q1      Alen
1          W2     Harry
2          E3   Katrina
3          R4      Anya
4          T5  Leonardo

m = f1.merge(f2, on=['Customer_id', 'Name'], indicator='Status', how='outer')
  Customer_id      Name      Status
0          Q1      Alen        both
1          W2     Ricky   left_only
2          E3   Katrina        both
3          R4      Anya        both
4          T5  Leonardo        both
5          W2     Harry  right_only

m['Status'] = m['Status'].map({'both': 'Matching', 
                               'left_only': 'Not matching', 
                               'right_only': 'Not matching'})

# Both methods return new frames, so assign the results back to m
m = m.drop_duplicates(subset=['Customer_id', 'Status'])
m = m.drop(['Name'], axis=1)
  Customer_id        Status
0          Q1      Matching
1          W2  Not matching
2          E3      Matching
3          R4      Matching
4          T5      Matching

2 Comments

@Alex, this will not produce the desired output; otherwise it's easy :)
@pytorch I have updated the code a little bit to make it shorter/easier to maintain.
