
I have two different files that I need to merge into one. There is a common value in both; the two files have the format shown below. The matches will not be in sequence: line 1 of dataset 1 may not match line 1 of dataset 2 — it is more likely to match line 16 or line 45 of dataset 2.

The 4-letter codes (BEEC, BPLZ, BFLP) are the matching values. Any directional help will be appreciated.

BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06
BPLZ,BE-LZ,,390.6,62.62,,49.0,145.0,27.3,61.52
BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31


MRM1234-BEEC-1635753E001     25.6    70.29
MRM1234-BPLZ-1814737E003     8.12    18.13
MRM1234-BFLP-2470883E001     12.92   18.8

I know how to use line.split to get a list of each line's elements.

I know how to slice into the first column of the second dataset (e.g. L[8:12]) to get the matching 4-letter value.

I've tried several suggested approaches but have not succeeded.

How do I merge all the columns into a single row joined on the unique 4-letter identifier? Matching on the unique value and then writing out one combined line eludes me.

  • Can you please give an example of a merged line? Commented Jun 20, 2014 at 20:08
  • BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31,12.92,18.8 — I realize I need a dictionary but can't make it work. As you can see, the end result is the two numerical values from the second dataset matched to the first set and appended as CSV. Commented Jun 20, 2014 at 20:10
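The merged line in the comment above can be produced with exactly the dictionary approach mentioned. A minimal sketch (the sample rows are inlined here so it is self-contained; in practice they would be read from the two files):

```python
# Sketch of the dictionary approach: key each dataset-2 line by its
# 4-letter code, then append its two numeric columns to the matching
# dataset-1 row.
dataset1 = [
    "BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06",
    "BPLZ,BE-LZ,,390.6,62.62,,49.0,145.0,27.3,61.52",
    "BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31",
]
dataset2 = [
    "MRM1234-BEEC-1635753E001     25.6    70.29",
    "MRM1234-BPLZ-1814737E003     8.12    18.13",
    "MRM1234-BFLP-2470883E001     12.92   18.8",
]

# Map each 4-letter code to the two trailing values of its dataset-2 line.
extras = {}
for line in dataset2:
    fields = line.split()              # split() collapses the run of spaces
    code = fields[0].split('-')[1]     # e.g. 'BEEC'
    extras[code] = fields[1:]          # e.g. ['25.6', '70.29']

# Append the matching values to each dataset-1 row.
merged = []
for line in dataset1:
    code = line.split(',')[0]
    if code in extras:                 # ignore rows with no match
        merged.append(line + ',' + ','.join(extras[code]))

for row in merged:
    print(row)
```

The dictionary turns the "find the matching line" step into a single key lookup instead of a scan of the second file for every row of the first.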

2 Answers


Contents of file dat1:

BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06
BPLZ,BE-LZ,,390.6,62.62,,49.0,145.0,27.3,61.52
BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31

Contents of file dat2:

MRM1234-BEEC-1635753E001     25.6    70.29
MRM1234-BPLZ-1814737E003     8.12    18.13
MRM1234-BFLP-2470883E001     12.92   18.8

Use this quick & dirty script to concatenate the lines of both files as described.

dat1 = {}
with open('dat1') as f:
    for line in f:
        fields = line.strip().split(',')
        dat1[fields[0]] = fields[1:]      # key each row by its 4-letter code

dat2 = {}
with open('dat2') as f:
    for line in f:
        fields = line.strip().split()
        key = fields[0].split('-')[1]     # the 4-letter code, e.g. 'BEEC'
        dat2[key] = fields[1:]            # the two trailing values

for key in dat1:
    print("%s,%s,%s" % (key, ','.join(dat1[key]), ','.join(dat2[key])))

This will produce the following output.

BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06,25.6,70.29
BPLZ,BE-LZ,,390.6,62.62,,49.0,145.0,27.3,61.52,8.12,18.13
BFLP,BF-OP,,180.1,34.89,,40.0,58.26,8533.8,7.31,12.92,18.8

3 Comments

I get a KeyError for ''. To try to simplify, I have redone my work so the SiteCode is the first field in both files. I modified your script to eliminate the .split('-') step and reused the dat1 section for dat2 (with the name changes). That still gives me the error. I printed out dat1 and dat2 and there is a lot of data there. I don't fully understand it yet but am working on it. I verified the files have no empty values.
Your solution works. My data files (emailed to me by vendors) use a few duplicate IDs. Thank you — this helped my programming and showed me a vendor error all at once.
Hmmm, one detail: if dat1 has a site code that doesn't exist in dat2, it fails. That's rare but possible. I will need to check for that.
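A sketch of that check, reusing the dict names from the script above (the toy dicts here are illustrative, not real data): guard the lookup so an unmatched code is collected instead of raising a KeyError.

```python
# Toy dicts standing in for the parsed files; 'XXXX' has no dat2 match.
dat1 = {'BEEC': ['BE-EC', '154.7'], 'XXXX': ['no', 'match']}
dat2 = {'BEEC': ['25.6', '70.29']}

merged, unmatched = [], []
for key in dat1:
    if key in dat2:                  # only merge when both sides exist
        merged.append(','.join([key] + dat1[key] + dat2[key]))
    else:
        unmatched.append(key)        # report it rather than crash

print(merged)      # the joined rows
print(unmatched)   # site codes missing from dat2
```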

Sorry for the mess ...

def parse(d1, d2):
    # Rows of the first file: split each non-empty line on commas.
    data1 = [x.split(',') for x in d1.split('\n') if x]
    # Rows of the second file: split on whitespace, then break the
    # first field on '-' so the 4-letter code becomes its own column.
    target = []
    for x in d2.split('\n'):
        d = x.split()
        if not d:
            continue
        dd = d[0].split('-')
        dd.extend(d[1:])
        target.append(dd)
    # Match each dataset-1 row against the code in column 1 of target
    # and append the two trailing values.
    ret = []
    for x in data1:
        for y in target:
            if x[0] == y[1]:
                x.extend(y[-2:])
                ret.append(x)
    for x in ret:
        print(x)


parse(data1, data2)

where data1 and data2 are the contents of the two files.
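For completeness, the two strings could be obtained like this (a sketch — the sample rows and temporary directory are only there to keep the example self-contained; real runs would open the existing vendor files directly):

```python
import os
import tempfile

# Write one sample row per file to a temp directory so the sketch
# runs anywhere; in practice the two files already exist on disk.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, 'dat1'), 'w') as f:
    f.write("BEEC,BE-EC,,154.7,46.07,,31.63,54.6,4833.6,5.06\n")
with open(os.path.join(tmp, 'dat2'), 'w') as f:
    f.write("MRM1234-BEEC-1635753E001     25.6    70.29\n")

# Read each file's full text into a single string.
with open(os.path.join(tmp, 'dat1')) as f:
    data1 = f.read()
with open(os.path.join(tmp, 'dat2')) as f:
    data2 = f.read()

# parse(data1, data2)  # then call the function defined above
```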

