I am trying to merge two CSV files on a common id column and write the result to a new file. I tried the following, but it raises an error:

import csv
from collections import OrderedDict

filenames = "stops.csv", "stops2.csv"
data = OrderedDict()
fieldnames = []
for filename in filenames:
    with open(filename, "rb") as fp:  # python 2
        reader = csv.DictReader(fp)
        fieldnames.extend(reader.fieldnames)
        for row in reader:
            data.setdefault(row["stop_id"], {}).update(row)

fieldnames = list(OrderedDict.fromkeys(fieldnames))
with open("merged.csv", "wb") as fp:
    writer = csv.writer(fp)
    writer.writerow(fieldnames)
    for row in data.itervalues():
        writer.writerow([row.get(field, '') for field in fieldnames])

Both files have the "stop_id" column, but I get this error back: KeyError: 'stop_id'
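
A KeyError on row["stop_id"] means the header that csv.DictReader parsed does not contain the exact string 'stop_id'. The post does not say why, so the following is only a diagnostic sketch, written for Python 3 with made-up sample data: it shows the header names as parsed, so that invisible characters such as a UTF-8 byte order mark or stray spaces become visible. The helper names header_names and clean_name are hypothetical.

```python
import csv
import io


def header_names(fp):
    """Return the header names exactly as csv.DictReader parsed them."""
    return csv.DictReader(fp).fieldnames


def clean_name(name):
    """Strip a leading byte order mark and surrounding whitespace from a name."""
    return name.lstrip("\ufeff").strip()


# Made-up sample: the header starts with a BOM and has a space after the comma.
sample = io.StringIO("\ufeffstop_id, stop_name\n1,Main St\n")

raw = header_names(sample)
print([repr(name) for name in raw])   # repr makes hidden characters visible

cleaned = [clean_name(name) for name in raw]
print(cleaned)
```

If the raw names differ from what is expected, that would explain the error. When reading real files in Python 3, opening them with encoding="utf-8-sig" removes a leading BOM before the csv module sees it.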

Any help would be much appreciated.

Thanks

  • data.setdefault(row["stop_id"], {}).update(row) - why so complex? Commented Jul 26, 2016 at 19:30
  • also, merging two tables by column is done with pandas.merge, see here pandas.pydata.org/pandas-docs/stable/… Commented Jul 26, 2016 at 19:32
  • I used another stack overflow example as input to this. Can you suggest an alternative? Thanks Commented Jul 26, 2016 at 19:32
  • great, thanks for that Alleo Commented Jul 26, 2016 at 19:33
  • @sgpbyrne - Please try using the pandas module for this. You can achieve the above in just 4-5 lines. Commented Jul 29, 2016 at 17:50
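
The pandas.merge suggestion in the comments could look roughly like this for the question's case. This is a minimal sketch for Python 3, assuming both files share a "stop_id" column as the question states; the other column names and all values are made up, and in-memory text stands in for stops.csv and stops2.csv.

```python
import io

import pandas as pd

# Made-up stand-ins for stops.csv and stops2.csv.
stops = pd.read_csv(io.StringIO("stop_id,stop_name\n1,Main St\n2,Oak Ave\n"))
stops2 = pd.read_csv(io.StringIO("stop_id,zone\n2,B\n3,C\n"))

# how="outer" keeps every stop_id that appears in either table;
# cells with no counterpart in the other table become NaN.
merged = pd.merge(stops, stops2, on="stop_id", how="outer")

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = merged.to_csv(index=False)
print(csv_text)
```

With real files, pd.read_csv takes the file path directly and merged.to_csv("merged.csv", index=False) writes the result to disk.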

2 Answers

Here is an example using pandas:

from StringIO import StringIO  # Python 2 module
import pandas as pd

# First table: semicolon-separated sample data
TESTDATA = StringIO("""DOB;First;Last
2016-07-26;John;smith
2016-07-27;Mathew;George
2016-07-28;Aryan;Singh
2016-07-29;Ella;Gayau
""")

list1 = pd.read_csv(TESTDATA, sep=";")

# Second table: similar records under different column names
TESTDATA = StringIO("""Date of Birth;Patient First Name;Patient Last Name
2016-07-26;John;smith
2016-07-27;Mathew;XXX
2016-07-28;Aryan;Singh
2016-07-20;Ella;Gayau
""")

list2 = pd.read_csv(TESTDATA, sep=";")

print list2
print list1

# Left-join on the three matching column pairs, then drop rows with no match in list2
common = pd.merge(list1, list2, how='left',
                  left_on=['Last', 'First', 'DOB'],
                  right_on=['Patient Last Name', 'Patient First Name', 'Date of Birth']).dropna()
print common

Thanks Shijo.

This is what worked for me in the end, merging on the first column of each CSV:

import csv
from collections import OrderedDict

with open('stops.csv', 'rb') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}

with open('stops2.csv', 'rb') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('ab_combined.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)
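
The code above is Python 2 (binary file modes, iteritems). A rough Python 3 sketch of the same idea follows, keyed by column name rather than by position. It assumes the shared column is called "stop_id" as in the question; the function names merge_on_key and write_rows and the sample data are made up for illustration.

```python
import csv
import io


def merge_on_key(sources, key):
    """Merge rows from several CSV file objects that share the column `key`."""
    merged = {}      # key value -> combined row (dicts keep insertion order)
    fieldnames = []  # union of column names, in first-seen order
    for fp in sources:
        reader = csv.DictReader(fp)
        for name in reader.fieldnames:
            if name not in fieldnames:
                fieldnames.append(name)
        for row in reader:
            # Rows with the same key are combined into one dict.
            merged.setdefault(row[key], {}).update(row)
    return fieldnames, list(merged.values())


def write_rows(fp, fieldnames, rows):
    """Write merged rows, leaving missing columns empty."""
    writer = csv.DictWriter(fp, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Made-up stand-ins for stops.csv and stops2.csv.
first = io.StringIO("stop_id,stop_name\n1,Main St\n2,Oak Ave\n")
second = io.StringIO("stop_id,zone\n2,B\n3,C\n")

fieldnames, rows = merge_on_key([first, second], "stop_id")
out = io.StringIO()
write_rows(out, fieldnames, rows)
print(out.getvalue())
```

With real files in Python 3, open them in text mode with newline="" (for example open("stops.csv", newline="")), as the csv module documentation recommends.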
