
I have two different files that I need to merge into one. There is a common value in both; the two files have the format shown below. The matches will not be in sequence: line 1 of dataset 1 may not match line 1 of dataset 2 — it is more likely to match line 16 or line 45 of dataset 2.

The 4-letter codes (BEEC, BPLZ, BFLP) are the matching values. Any directional help will be appreciated.

BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06
BPLZ,BE-LZ,,390.6,62.62,,49.0,145.0,27.3,61.52
BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31


MRM1234-BEEC-1635753E001     25.6    70.29
MRM1234-BPLZ-1814737E003     8.12    18.13
MRM1234-BFLP-2470883E001     12.92   18.8

I know how to use line.split to get a list of each line's elements.

I know how to slice into the first column of the second dataset (e.g. L[8:12]) to get the matching 4-letter value.

I've tried several suggested approaches but have not succeeded.

How do I merge all the columns into a single row joined on the unique 4-letter identifier? Matching on the unique value and then writing out one combined line eludes me.

  • Can you please give an example of a merged line? Commented Jun 20, 2014 at 20:08
  • BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31,12.92,18.8 — I realize I need a dictionary but can't make it work. As you can see, the end result is the two numerical values from the second dataset matched to the first set and appended as CSV. Commented Jun 20, 2014 at 20:10
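The merged line in the comment above can be produced with exactly the dictionary approach mentioned. A minimal sketch (the sample rows are inlined here so it is self-contained; in practice they would be read from the two files):

```python
# Sketch of the dictionary approach: key each dataset-2 line by its
# 4-letter code, then append its two numeric columns to the matching
# dataset-1 row.
dataset1 = [
    "BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06",
    "BPLZ,BE-LZ,,390.6,62.62,,49.0,145.0,27.3,61.52",
    "BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31",
]
dataset2 = [
    "MRM1234-BEEC-1635753E001     25.6    70.29",
    "MRM1234-BPLZ-1814737E003     8.12    18.13",
    "MRM1234-BFLP-2470883E001     12.92   18.8",
]

# Map each 4-letter code to the two trailing values of its dataset-2 line.
extras = {}
for line in dataset2:
    fields = line.split()              # split() collapses the run of spaces
    code = fields[0].split('-')[1]     # e.g. 'BEEC'
    extras[code] = fields[1:]          # e.g. ['25.6', '70.29']

# Append the matching values to each dataset-1 row.
merged = []
for line in dataset1:
    code = line.split(',')[0]
    if code in extras:                 # ignore rows with no match
        merged.append(line + ',' + ','.join(extras[code]))

for row in merged:
    print(row)
```

The dictionary turns the "find the matching line" step into a single key lookup instead of a scan of the second file for every row of the first.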

2 Answers


Contents of file dat1:

BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06
BPLZ,BE-LZ,,390.6,62.62,,49.0,145.0,27.3,61.52
BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31

Contents of file dat2:

MRM1234-BEEC-1635753E001     25.6    70.29
MRM1234-BPLZ-1814737E003     8.12    18.13
MRM1234-BFLP-2470883E001     12.92   18.8

Use this quick & dirty script to concatenate the lines of both files as described.

dat1 = {}
with open('dat1') as f:
    for line in f:
        fields = line.strip().split(',')
        dat1[fields[0]] = fields[1:]      # key each row by its 4-letter code

dat2 = {}
with open('dat2') as f:
    for line in f:
        fields = line.strip().split()
        key = fields[0].split('-')[1]     # the 4-letter code, e.g. 'BEEC'
        dat2[key] = fields[1:]            # the two trailing values

for key in dat1:
    print("%s,%s,%s" % (key, ','.join(dat1[key]), ','.join(dat2[key])))

This will produce the following output.

BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06,25.6,70.29
BPLZ,BE-LZ,,390.6,62.62,,49.0,145.0,27.3,61.52,8.12,18.13
BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31,12.92,18.8

3 Comments

I get a KeyError for ''. To try to simplify, I have redone my work so the SiteCode is the first field in both files. I modified your script to eliminate the .split('-') step and reused the dat1 section for dat2 (with the name changes). That still gives me the error. I printed out dat1 and dat2 and there is a lot of data there. I don't fully understand it yet but am working on it. I verified the files have no empty values.
Your solution works. My data files (emailed to me by vendors) use a few duplicate IDs. Thank you — this helped my programming and showed me a vendor error all at once.
Hmmm, one detail: if dat1 has a site code that doesn't exist in dat2, it fails. That's rare but possible. I will need to check for that.
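A sketch of that check, reusing the dict names from the script above (the toy dicts here are illustrative, not real data): guard the lookup so an unmatched code is collected instead of raising a KeyError.

```python
# Toy dicts standing in for the parsed files; 'XXXX' has no dat2 match.
dat1 = {'BEEC': ['BE-EC', '154.7'], 'XXXX': ['no', 'match']}
dat2 = {'BEEC': ['25.6', '70.29']}

merged, unmatched = [], []
for key in dat1:
    if key in dat2:                  # only merge when both sides exist
        merged.append(','.join([key] + dat1[key] + dat2[key]))
    else:
        unmatched.append(key)        # report it rather than crash

print(merged)      # the joined rows
print(unmatched)   # site codes missing from dat2
```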

Sorry for the mess ...

def parse(d1, d2):
    # Rows of the first file: split each non-empty line on commas.
    data1 = [x.split(',') for x in d1.split('\n') if x]
    # Rows of the second file: split on whitespace, then break the
    # first field on '-' so the 4-letter code becomes its own column.
    target = []
    for x in d2.split('\n'):
        d = x.split()
        if not d:
            continue
        dd = d[0].split('-')
        dd.extend(d[1:])
        target.append(dd)
    # Match each dataset-1 row against the code in column 1 of target
    # and append the two trailing values.
    ret = []
    for x in data1:
        for y in target:
            if x[0] == y[1]:
                x.extend(y[-2:])
                ret.append(x)
    for x in ret:
        print(x)


parse(data1, data2)

where data1 and data2 are the contents of the two files.
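For completeness, the two strings could be obtained like this (a sketch — the sample rows and temporary directory are only there to keep the example self-contained; real runs would open the existing vendor files directly):

```python
import os
import tempfile

# Write one sample row per file to a temp directory so the sketch
# runs anywhere; in practice the two files already exist on disk.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, 'dat1'), 'w') as f:
    f.write("BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06\n")
with open(os.path.join(tmp, 'dat2'), 'w') as f:
    f.write("MRM1234-BEEC-1635753E001     25.6    70.29\n")

# Read each file's full text into a single string.
with open(os.path.join(tmp, 'dat1')) as f:
    data1 = f.read()
with open(os.path.join(tmp, 'dat2')) as f:
    data2 = f.read()

# parse(data1, data2)  # then call the function defined above
```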

