Python merge csv files with matching Index

Question

I want to merge two CSV files based on a field. The first one looks like this:

ID, field1, field2
1,a,green
2,b,white
2,b,red
2,b,blue
3,c,black

The second one looks like:

ID, field3
1,value1
2,value2

What I want to have is:

ID, field1, field2,field3
1,a,green,value1
2,b,white,value2
2,b,red,value2
2,b,blue,value2
3,c,black,''

I'm using PyDev on Eclipse.

import csv

endings0=[]
endings1=[]
with open("salaries.csv") as book0:
    for line in book0:
        endings0.append(line.split(',')[-1])
        endings1.append(line.split(',')[0])

linecounter=0


res = open("result.csv","w")

with open('total.csv') as book2:
    for line in book2:
        # if not header line:

        l=line.split(',')[0]
        for linecounter in range(0,endings1.__len__()):            
            if( l == endings1[linecounter]):
                res.writelines(line.replace("\n","") +','+str(endings0[linecounter]))


print("done")

I updated the question by adding the code, but I'm missing the last line (3,c,black,'') and I'm not sure if this is the best way to do it.
– Eliz Rose, Commented Apr 21, 2015 at 19:16

Eric · Accepted Answer · 2015-04-21 19:34:57Z

There are a bunch of things wrong with what you're doing:

You should really, really be using the classes in the csv module to read and write CSV files. Importing the module isn't enough; you actually need to call its functions.
You should never find yourself typing endings1.__len__(). Use len(endings1) instead.
You should never find yourself typing for linecounter in range(0, len(endings1)). Use either for linecounter, _ in enumerate(endings1), or better yet for end0, end1 in zip(endings0, endings1).
A dictionary is a much better data structure for lookup than a pair of parallel lists. To quote Rob Pike:

If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident.
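As a small illustration of those points, here is a sketch with made-up sample lists (ids and values are hypothetical names, not taken from the question's code). It only shows the idioms; it is not the full merge.

```python
# Hypothetical sample data standing in for the two parallel lists
ids = ['1', '2']
values = ['value1', 'value2']

# Iterate over pairs directly instead of indexing with range(len(...))
pairs = [(i, v) for i, v in zip(ids, values)]

# A dict replaces the inner search loop with a single keyed lookup
id_to_value = dict(zip(ids, values))

print(len(ids))                   # the built-in len(), not ids.__len__()
print(id_to_value.get('2', ''))   # value2
print(id_to_value.get('3', ''))   # empty string: ID 3 has no match
```

The .get call with a default is what lets an unmatched ID such as 3 still produce a row, which is the line the question's nested loop was dropping.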

Here's how I'd do it:

import csv

with open('second.csv') as f:
    # look, a builtin to read csv file lines as dictionaries!
    reader = csv.DictReader(f)

    # build a mapping of id to field3
    id_to_field3 = {row['ID']: row['field3'] for row in reader}

# you can put more than one open inside a with statement
with open('first.csv') as f, open('result.csv', 'w') as fo:
    # csv even has a class to write files!
    reader = csv.DictReader(f)
    res = csv.DictWriter(fo, fieldnames=reader.fieldnames + ['field3'])

    res.writeheader()
    for row in reader:
        # .get returns its second argument if there was no match
        row['field3'] = id_to_field3.get(row['ID'], '')
        res.writerow(row)
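To try the same approach without touching the disk, here is a self-contained sketch that uses io.StringIO objects in place of first.csv and second.csv. The merge function name and the in-memory files are assumptions made for illustration. One detail worth noting: the sample headers in the question have a space after each comma ("ID, field1, field2"), so this sketch passes skipinitialspace=True; without it the column would be read as ' field3' and the row['field3'] lookup would fail.

```python
import csv
import io

# Sample data copied from the question; StringIO stands in for the real files
first_csv = "ID, field1, field2\n1,a,green\n2,b,white\n2,b,red\n2,b,blue\n3,c,black\n"
second_csv = "ID, field3\n1,value1\n2,value2\n"


def merge(first, second, out):
    """Write every row of `first` to `out`, adding field3 looked up by ID."""
    # skipinitialspace=True drops the blank that follows each comma in the header
    lookup = {row['ID']: row['field3']
              for row in csv.DictReader(second, skipinitialspace=True)}

    reader = csv.DictReader(first, skipinitialspace=True)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ['field3'],
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        # IDs missing from the second file get an empty field3
        row['field3'] = lookup.get(row['ID'], '')
        writer.writerow(row)


out = io.StringIO()
merge(io.StringIO(first_csv), io.StringIO(second_csv), out)
result = out.getvalue()
print(result)
```

The last row comes out as 3,c,black, with an empty final field rather than a literal ''. When writing to a real file on Python 3, the csv documentation recommends opening it with newline=''.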

11th Hour Worker · Accepted Answer · 2015-04-21 19:14:35Z

I have a high-level solution for you. Deserialize your first CSV into dict1, mapping ID to a list containing field1 and field2. Deserialize your second CSV into dict2, mapping ID to field3.

For each (id, list) in dict1, do list.append(dict2.setdefault(id, '')). Now serialize it back into CSV using whatever serializer you were using before.

I used dictionary's setdefault because I noticed that ID 3 is in the first CSV file but not the second.
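A minimal sketch of this outline, under two assumptions that are not in the answer: the csv module does the parsing and writing, and dict1 maps each ID to a list of [field1, field2] rows so that a repeated ID such as 2 keeps all of its rows. The sample data is shortened from the question and uses in-memory files.

```python
import csv
import io

first_csv = "ID,field1,field2\n1,a,green\n2,b,white\n2,b,red\n3,c,black\n"
second_csv = "ID,field3\n1,value1\n2,value2\n"

# dict1: ID -> list of [field1, field2] rows (assumption: keep every duplicate)
dict1 = {}
rows = csv.reader(io.StringIO(first_csv))
next(rows)  # skip the header row
for id_, field1, field2 in rows:
    dict1.setdefault(id_, []).append([field1, field2])

# dict2: ID -> field3
rows = csv.reader(io.StringIO(second_csv))
next(rows)  # skip the header row
dict2 = {id_: field3 for id_, field3 in rows}

# Append field3 to every row; setdefault supplies '' for IDs absent from dict2
for id_, fields_list in dict1.items():
    for fields in fields_list:
        fields.append(dict2.setdefault(id_, ''))

# Serialize back to CSV
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(['ID', 'field1', 'field2', 'field3'])
for id_, fields_list in dict1.items():
    for fields in fields_list:
        writer.writerow([id_] + fields)
merged = out.getvalue()
print(merged)
```

Two side effects are worth knowing about. setdefault inserts the missing key into dict2 (here '3' ends up mapped to ''), whereas dict2.get(id_, '') would leave dict2 unchanged. And because rows are grouped by ID, the output order can differ from the input if the same ID appears in non-adjacent rows.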

"whatever serializer you were using before" - that'll be that well known robust csv interface, the raw text stream then...
– Eric, Commented over a year ago
