0

We have two CSV files: a.csv and b.csv.

a.csv has tree columns: label, item1, item2. b.csv has two columns: item1, item2. If item1 and item2 in a.csv also occurr in b.csv, that's a.csv and b.csv have same item1 and item2, the value of label in a.csv should be 1 instead.

For example:

a.csv:

label    item1     item2
0         123       35
0         342       721
0         876       243

b.csv:

item1     item2
 12        35
 32        721
 876       243

result.csv:

label    item1     item2
0         123       35
0         342       721
1         876       243
2
  • The result can be write in a new csv file "result.csv" Commented Apr 2, 2015 at 12:48
  • 1
    What did you try yet? Commented Apr 2, 2015 at 12:50

1 Answer 1

2

Read your a.csv into a dictionary; use a tuple of (item1, item2) as the key. Then when reading b.csv you can update the label for each entry in the dictionary as you process the file.

After this process, write out result.csv from the information in the dictionary.

import csv

rows = {}
with open('a.csv', 'r', newline='') as acsv:
    areader = csv.DictReader(acsv)
    for row in reader:
        # store the row based on the item1 and item2 columns
        key = (row['item1'], row['item2'])
        rows[key] = row

with open('b.csv', 'r', newline='') as bcsv:
    breader = csv.DictReader(bcsv)
    for row in reader:
        # set the label of matching rows to 1 when present
        key = (row['item1'], row['item2'])
        if key in rows:
            rows[key]['label'] = 1

with open('result.csv', 'w', newline='') as result:
    writer = csv.DictReader(result, fieldnames=areader.fieldnames)
    writer.writerows(rows.values())

I used csv.DictReader() objects to ease column name handling. Each row is presented as a dictionary, with the keys taken from the first row in the CSV file.

I also assumed you are using Python 3; if you are using Python 2, you'll have to adjust the open() calls to remove the newline='' argument, and you need to use binary mode ('rb' and 'wb'). I did not specify a codec for the files; currently the default system codec will be used to read and write. If that is incorrect, add encoding='...' arguments.

Sign up to request clarification or add additional context in comments.

2 Comments

I'm not expecting any dict. the label is header from the csv file, and I have the same case. But not working.
@Syafiqur__: then my guess would be that you wanted to use rows['label'], not rows[key]['label']. Again, no code, no minimal reproducible example, so all I can do is guess. And I'm sorry, but you are not helping me improve this answer, so the comments here are not appropriate.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.