Using matching column values in two CSV files to create a new file with combined data

Question

I am trying to compare two CSV files using Python 3.6. For each IP address in the first column of file1.csv, I want to check whether it appears in the first column of any row in file2.csv. If it does, the second-column value of that file2 row should be appended to the matching row in a new file that otherwise duplicates file1. The two files look like this:

file 1:

XX.XXX.XXX.1,Test1
XX.XXX.XXX.2,Test2
XX.XXX.XXX.3,Test3
XX.XXX.XXX.4,Test4
XX.XXX.XXX.5,Test5
XX.XXX.XXX.6,Test6
XX.XXX.XXX.7,Test7
XX.XXX.XXX.8,Test8

and so on

file 2:

XX.XXX.XXX.6, Name6
XX.XXX.XXX.7, Name7
XX.XXX.XXX.8, Name8

I need the result.csv file to look like this:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6,Name6
XX.XXX.XXX.7,Test7,Name7
XX.XXX.XXX.8,Test8,Name8

The code I have so far is as follows:

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

for file1_row in c1:
    row = 1
    found = False
    for file2_row in file2:
        results_row = file1_row
        x = file2_row[3]
        if file1_row[1] == file2_row[1]:

        results_row.append('Found. Name: ' + x)
        found = True
        break
    row += 1
if not found:
    results_row.append('Not found in File1')
c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

Right now this code checks for identical rows rather than matching values, so it never matches anything: it requires both the IP column and the adjacent column to be equal in both files. It also compares the files row by row (row 1 with row 1, row 2 with row 2, and so on), whereas I need it to search all of file2 for each IP in file1 instead of comparing rows by index.

Mike Müller · Accepted Answer · 2016-12-28 21:52:15Z

1

A pandas solution:

import pandas as pd

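# names= supplies column labels because the files have no header row
df1 = pd.read_csv('file1.csv', names=['a', 'b'])
df2 = pd.read_csv('file2.csv', names=['a', 'b'])
# how='left' keeps every row of file1 and adds the file2 name where the IP matches
merged = pd.merge(df1, df2, on='a', how='left')
merged.to_csv('results.csv', header=False, index=False, na_rep='Not found')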

Content of results.csv:

XX.XXX.XXX.1,Test1,Not found
XX.XXX.XXX.2,Test2,Not found
XX.XXX.XXX.3,Test3,Not found
XX.XXX.XXX.4,Test4,Not found
XX.XXX.XXX.5,Test5,Not found
XX.XXX.XXX.6,Test6, Name6
XX.XXX.XXX.7,Test7, Name7
XX.XXX.XXX.8,Test8, Name8
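
The leading space before each name in that output comes straight from file 2, and the choice of `how` only matters once file 2 contains an IP that file 1 lacks. Here is a minimal sketch of both points, assuming made-up 10.0.0.x addresses and in-memory strings in place of the real files (the helper name `merge_names` and the column labels are hypothetical):

```python
import io
import pandas as pd

# Hypothetical stand-ins for file1.csv and file2.csv.
FILE1 = "10.0.0.1,Test1\n10.0.0.2,Test2\n10.0.0.3,Test3\n"
FILE2 = "10.0.0.2, Name2\n10.0.0.3, Name3\n10.0.0.9, Name9\n"

def merge_names(file1, file2, how="left"):
    """Join the two tables on the IP column and fill unmatched cells."""
    # skipinitialspace=True drops the blank that follows each comma in file 2.
    df1 = pd.read_csv(file1, names=["ip", "test"], skipinitialspace=True)
    df2 = pd.read_csv(file2, names=["ip", "name"], skipinitialspace=True)
    merged = pd.merge(df1, df2, on="ip", how=how)
    return merged.fillna("Not found")

left = merge_names(io.StringIO(FILE1), io.StringIO(FILE2))
outer = merge_names(io.StringIO(FILE1), io.StringIO(FILE2), how="outer")

print(left.to_csv(header=False, index=False))
print(len(left), len(outer))
```

With `how="left"` the result has exactly one row per file 1 row. With `how="outer"` the extra file 2 address (10.0.0.9 here) shows up as an additional row, which is probably not wanted if the result is supposed to mirror file 1.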

edited Dec 28, 2016 at 21:52

answered Dec 27, 2016 at 20:12


fean · Accepted Answer · 2016-12-27 20:20:45Z

0

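I moved the results_row assignment out of the nested loop, changed the comparison to use the first column (index 0) of both rows, and fixed the indentation after row += 1: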

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

for file1_row in c1:
    row = 1
    found = False
    results_row = file1_row  #Moved out from nested loop
    for file2_row in file2:        
        x = file2_row[1]
        if file1_row[0] == file2_row[0]:
            results_row.append(x)
            found = True
            break
    row += 1
    if not found:
        results_row.append('Not found')     
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

answered Dec 27, 2016 at 20:20


ettanany · Accepted Answer · 2016-12-27 20:22:41Z

0

A solution close to what you have tried:

with open('result.csv', 'w') as out:
    with open('file1.csv', 'r') as f1, open('file2.csv', 'r') as f2:
        f2_lines = [line for line in f2.readlines() if len(line) > 1]
        f1_lines = [line for line in f1.readlines() if len(line) > 1]
        for line in f1_lines:
            val = 'Not found'
            b = [line.split(',')[0].strip() in item for item in f2_lines]
            if any(b):
                val = f2_lines[b.index(True)].split(',')[1].strip()
            out.write('{}, {}\n'.format(line.strip(), val))

Output:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6, Name6
XX.XXX.XXX.7,Test7, Name7
XX.XXX.XXX.8,Test8, Name8

answered Dec 27, 2016 at 20:22


2 Comments

K. Allaei Over a year ago

I tried this code and it does provide the correct matches, but it also produces incorrect ones: some names were matched more than once. The file contents I posted are only a small sample of my list of IPs, so the problem may not show up at that scale, but the name matched for the IP ending in .103 also showed up for the IP ending in .1, which had no matching name.

K. Allaei Over a year ago

I think it is a loop issue; perhaps it holds the value and then assigns it to the next open space?
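
The duplicate matches described in these comments look consistent with the `in` test in the answer rather than with the loop itself: that test asks whether the IP is a substring of a whole file 2 line, so an address ending in .1 is also "found" inside a line that starts with an address ending in .103. A small sketch of the difference, assuming made-up 10.0.0.x addresses (the function names `substring_match` and `exact_match` are hypothetical):

```python
import csv
import io

# Hypothetical sample data: 10.0.0.1 is a prefix of 10.0.0.103.
FILE1 = "10.0.0.1,Test1\n10.0.0.103,Test103\n"
FILE2 = "10.0.0.103, Name103\n"

def substring_match(file1_text, file2_text):
    """Mimics the substring test against whole file 2 lines."""
    f2_lines = [line for line in file2_text.splitlines() if line]
    out = []
    for line in file1_text.splitlines():
        ip = line.split(',')[0].strip()
        hits = [ip in item for item in f2_lines]  # substring, not equality
        val = 'Not found'
        if any(hits):
            val = f2_lines[hits.index(True)].split(',')[1].strip()
        out.append((ip, val))
    return out

def exact_match(file1_text, file2_text):
    """Compares only the IP column, and only for equality."""
    reader2 = csv.reader(io.StringIO(file2_text), skipinitialspace=True)
    names = {row[0]: row[1] for row in reader2 if row}
    reader1 = csv.reader(io.StringIO(file1_text))
    return [(row[0], names.get(row[0], 'Not found')) for row in reader1 if row]

print(substring_match(FILE1, FILE2))
print(exact_match(FILE1, FILE2))
```

In the substring version 10.0.0.1 picks up Name103; in the exact version it gets 'Not found'. Comparing `line.split(',')[0].strip()` on both sides with `==` would be the smallest change to the answer's code that avoids this.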

martineau · Accepted Answer · 2016-12-29 18:21:15Z

0

Here's a non-pandas solution (assuming you're using Python 3.x):

import csv

present = {}
with open('file2.csv', 'r', newline='') as file2:
    reader = csv.reader(file2, skipinitialspace=True)
    for ip, name in reader:
        present[ip] = name

with open('file1.csv', 'r', newline='') as file1, \
     open('results.csv', 'w', newline='') as results:
    reader = csv.reader(file1, skipinitialspace=True)
    writer = csv.writer(results)
    for ip, name in reader:
        writer.writerow([ip, name, present.get(ip, ' Not found')])

Contents of results.csv:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6,Name6
XX.XXX.XXX.7,Test7,Name7
XX.XXX.XXX.8,Test8,Name8
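
One thing to watch with the `for ip, name in reader` unpacking: `csv.reader` yields an empty list for a blank line, and a row with more than two fields cannot be unpacked into two names, so either case raises a ValueError. If the real files might contain such rows (an assumption, since the sample does not show any), a more tolerant variant could index into each row instead. The helper names `load_names` and `annotate` below are hypothetical:

```python
import csv
import io

def load_names(lines):
    """Build an IP -> name lookup, skipping rows with fewer than two fields."""
    names = {}
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 2:
            continue  # blank or incomplete line
        names[row[0]] = row[1]
    return names

def annotate(lines, names, missing='Not found'):
    """Yield each non-empty row with the looked-up name appended."""
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # blank line
        yield row + [names.get(row[0], missing)]

# Hypothetical sample data with a blank line and an extra column.
names = load_names(io.StringIO("10.0.0.2, Name2\n\n10.0.0.3, Name3\n"))
rows = list(annotate(io.StringIO("10.0.0.1,Test1\n\n10.0.0.2,Test2,extra\n"), names))
for row in rows:
    print(row)
```

The same two functions accept open file objects (opened with `newline=''`, as in the answer), and the yielded rows can be passed to `csv.writer(...).writerows(...)`.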

edited Dec 29, 2016 at 18:21

answered Dec 27, 2016 at 20:33


1 Comment

K. Allaei Over a year ago

I am using Python 3.6, my apologies, I should have specified in the question.
