I want to merge two CSV files based on a field. The first one looks like this:

ID, field1, field2
1,a,green
2,b,white
2,b,red
2,b,blue
3,c,black

The second one looks like:

ID, field3
1,value1
2,value2

What I want to have is:

ID, field1, field2,field3
1,a,green,value1
2,b,white,value2
2,b,red,value2
2,b,blue,value2
3,c,black,''

I'm using PyDev on Eclipse:

import csv

endings0=[]
endings1=[]
with open("salaries.csv") as book0:
    for line in book0:
        endings0.append(line.split(',')[-1])
        endings1.append(line.split(',')[0])

linecounter=0


res = open("result.csv","w")

with open('total.csv') as book2:
    for line in book2:
        # if not header line:

        l=line.split(',')[0]
        for linecounter in range(0,endings1.__len__()):            
            if( l == endings1[linecounter]):
                res.writelines(line.replace("\n","") +','+str(endings0[linecounter]))


print("done") 
  • Good question, but what have you tried so far? Commented Apr 21, 2015 at 18:45
  • Have you considered using a database? Commented Apr 21, 2015 at 19:15
  • I updated the question by adding the code, but I'm missing the last line (3,c,black,'') and I'm not sure if this is the best way to do it. Commented Apr 21, 2015 at 19:16
  • Add the piece of code you tried to the question. Commented Apr 21, 2015 at 19:17
  • Why import csv when you don't even use it? Commented Apr 21, 2015 at 19:20

2 Answers

There are a bunch of things wrong with what you're doing:

  1. You should really, really be using the classes in the csv module to read and write CSV files. Importing the module isn't enough; you actually need to call its functions.

  2. You should never find yourself typing endings1.__len__(). Use len(endings1) instead.

  3. You should never find yourself typing for linecounter in range(0, len(endings1)).
    Use either for linecounter, _ in enumerate(endings1),
    or better yet for end0, end1 in zip(endings0, endings1).

  4. A dictionary is a much better data structure for lookup than a pair of parallel lists. To quote Pike:

    If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident.
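
Points 3 and 4 can be sketched roughly as follows. The names ids, values and id_to_value are made up for illustration, and the sample values stand in for the lists built in the question:

```python
# Hypothetical sample data standing in for the question's parallel lists.
ids = ["1", "2"]
values = ["value1", "value2"]

# zip pairs the two lists up without any index arithmetic.
for id_, value in zip(ids, values):
    print(id_, value)

# A dict built from the same pairs gives direct lookup by ID,
# and .get supplies a default for IDs that are missing.
id_to_value = dict(zip(ids, values))
print(id_to_value.get("2", ""))
print(repr(id_to_value.get("3", "")))
```

The dictionary lookup replaces the inner loop over endings1 entirely, which is also what makes the missing-ID case (ID 3) easy to handle.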

Here's how I'd do it:

import csv

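with open('second.csv', newline='') as f:
    # look, a builtin to read csv file lines as dictionaries!
    # skipinitialspace handles the space after each comma in the sample header
    reader = csv.DictReader(f, skipinitialspace=True)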

    # build a mapping of id to field3
    id_to_field3 = {row['ID']: row['field3'] for row in reader}

# you can put more than one open inside a with statement
with open('first.csv', newline='') as f, open('result.csv', 'w', newline='') as fo:
    # skipinitialspace handles the space after each comma in the sample header
    reader = csv.DictReader(f, skipinitialspace=True)
    # csv even has a class to write files!
    res = csv.DictWriter(fo, fieldnames=reader.fieldnames + ['field3'])

    res.writeheader()
    for row in reader:
        # .get returns its second argument if there was no match
        row['field3'] = id_to_field3.get(row['ID'], '')
        res.writerow(row)

I have a high-level solution for you. Deserialize your first CSV into dict1, mapping each ID to a list of rows, where each row is a list containing field1 and field2. Deserialize your second CSV into dict2, mapping ID to field3.

For each (id, rows) in dict1, append dict2.setdefault(id, '') to every row in rows. Now serialize it back into CSV using whatever serializer you were using before.

I used the dictionary's setdefault because I noticed that ID 3 is in the first CSV file but not the second.
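
One way this might look in Python is sketched below. It is only an illustration under some assumptions: merge is a hypothetical helper name, io.StringIO stands in for the real files, the sample data is copied from the question, and the csv module is used as the serializer.

```python
import csv
import io

# Sample data from the question; io.StringIO stands in for real files.
first_csv = io.StringIO(
    "ID, field1, field2\n"
    "1,a,green\n2,b,white\n2,b,red\n2,b,blue\n3,c,black\n"
)
second_csv = io.StringIO("ID, field3\n1,value1\n2,value2\n")


def merge(first, second, out):
    """Hypothetical helper: join `second` onto `first` by the ID column."""
    rows1 = csv.reader(first, skipinitialspace=True)
    rows2 = csv.reader(second, skipinitialspace=True)
    header1, header2 = next(rows1), next(rows2)

    # dict1: ID -> list of [field1, field2] rows (an ID can repeat).
    dict1 = {}
    for id_, *rest in rows1:
        dict1.setdefault(id_, []).append(rest)

    # dict2: ID -> field3
    dict2 = {id_: field3 for id_, field3 in rows2}

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header1 + header2[1:])
    for id_, rows in dict1.items():
        # setdefault yields '' for IDs missing from the second file.
        extra = dict2.setdefault(id_, "")
        for row in rows:
            writer.writerow([id_] + row + [extra])


result = io.StringIO()
merge(first_csv, second_csv, result)
print(result.getvalue())
```

Note that grouping by ID writes all rows for one ID together, in order of first appearance, so the output order only matches the input when rows with the same ID are already adjacent, as they are in the sample.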

1 Comment

"whatever serializer you were using before" - that'll be that well known robust csv interface, the raw text stream then...
