Python: how to update a csv file from another csv file

Question

We have two CSV files: a.csv and b.csv.

a.csv has three columns: label, item1, item2. b.csv has two columns: item1, item2. If the item1 and item2 values of a row in a.csv also occur together in a row of b.csv (that is, both files contain the same item1, item2 pair), the label of that row in a.csv should be set to 1.

For example:

a.csv:

label    item1     item2
0         123       35
0         342       721
0         876       243

b.csv:

item1     item2
 12        35
 32        721
 876       243

result.csv:

label    item1     item2
0         123       35
0         342       721
1         876       243

The result can be written to a new CSV file, "result.csv".

– Huan Ren, Commented Apr 2, 2015 at 12:48
What have you tried so far?

– TobiasR., Commented Apr 2, 2015 at 12:50

Martijn Pieters · Accepted Answer · 2015-04-02 13:19:14Z

Read your a.csv into a dictionary; use a tuple of (item1, item2) as the key. Then when reading b.csv you can update the label for each entry in the dictionary as you process the file.

After this process, write out result.csv from the information in the dictionary.

import csv

rows = {}
with open('a.csv', 'r', newline='') as acsv:
    areader = csv.DictReader(acsv)
    for row in areader:
        # store the row based on the item1 and item2 columns
        key = (row['item1'], row['item2'])
        rows[key] = row

with open('b.csv', 'r', newline='') as bcsv:
    breader = csv.DictReader(bcsv)
    for row in breader:
        # set the label of matching rows to 1 when present
        key = (row['item1'], row['item2'])
        if key in rows:
            rows[key]['label'] = 1

with open('result.csv', 'w', newline='') as result:
    # reuse the column names read from a.csv for the output header
    writer = csv.DictWriter(result, fieldnames=areader.fieldnames)
    writer.writeheader()
    writer.writerows(rows.values())

I used csv.DictReader() objects to ease column name handling. Each row is presented as a dictionary, with the keys taken from the first row in the CSV file.
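To check the approach against the sample data from the question without touching the disk, here is a minimal sketch that runs the same dictionary logic on in-memory text. The function name update_labels and the comma-delimited versions of the sample data are assumptions made for illustration; the question shows the columns separated by whitespace. Note that rows with a duplicate (item1, item2) pair in a.csv collapse into one dictionary entry, and the output order relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
import csv
import io


def update_labels(a_file, b_file, out_file):
    """Copy a_file to out_file, setting label to 1 for every row whose
    (item1, item2) pair also appears in b_file.

    All three arguments are open text-mode file objects (hypothetical helper).
    """
    areader = csv.DictReader(a_file)
    # Index the rows of a.csv by their (item1, item2) pair.
    rows = {(row['item1'], row['item2']): row for row in areader}

    # Flag every pair that also occurs in b.csv.
    for row in csv.DictReader(b_file):
        key = (row['item1'], row['item2'])
        if key in rows:
            rows[key]['label'] = '1'

    writer = csv.DictWriter(out_file, fieldnames=areader.fieldnames)
    writer.writeheader()
    writer.writerows(rows.values())


# Sample data from the question, assumed here to be comma-delimited.
a_text = "label,item1,item2\n0,123,35\n0,342,721\n0,876,243\n"
b_text = "item1,item2\n12,35\n32,721\n876,243\n"

out = io.StringIO()
update_labels(io.StringIO(a_text), io.StringIO(b_text), out)
result_text = out.getvalue()
print(result_text)
```

Only the last row (876, 243) appears in both inputs, so it is the only one whose label changes.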

I also assumed you are using Python 3; if you are using Python 2, you'll have to adjust the open() calls to remove the newline='' argument, and you need to use binary mode ('rb' and 'wb'). I did not specify a codec for the files; currently the default system codec will be used to read and write. If that is incorrect, add encoding='...' arguments.
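As a small illustration of the encoding point, the sketch below writes and re-reads a CSV file with an explicit codec in Python 3. The choice of UTF-8, the temporary file location, and the non-ASCII sample value are all assumptions for the example; use whatever codec your files actually have.

```python
import csv
import os
import tempfile

# Hypothetical sample file in a temporary directory, for illustration only.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'a.csv')

# Write with an explicit codec instead of relying on the system default.
with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['label', 'item1', 'item2'])
    writer.writeheader()
    writer.writerow({'label': '0', 'item1': 'café', 'item2': '35'})

# Read the file back with the same codec.
with open(path, 'r', newline='', encoding='utf-8') as f:
    rows_read = list(csv.DictReader(f))
```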

edited Apr 2, 2015 at 13:19

answered Apr 2, 2015 at 12:51


2 Comments

Syafiqur_ Over a year ago

I'm not expecting any dict. The label is a header from the CSV file, and I have the same case, but it is not working.

Martijn Pieters Over a year ago

@Syafiqur__: then my guess would be that you wanted to use rows['label'], not rows[key]['label']. Again, no code, no minimal reproducible example, so all I can do is guess. And I'm sorry, but you are not helping me improve this answer, so the comments here are not appropriate.
