0

I am currently trying to compare two CSV files to check if IP addresses in the first column of file1.csv are in a row in file2.csv using Python 3.6. If the address is in file2, I need the second column value of that row copied to a new file that is identical to file 1. The two file setups look like this:

file 1:

XX.XXX.XXX.1,Test1
XX.XXX.XXX.2,Test2
XX.XXX.XXX.3,Test3
XX.XXX.XXX.4,Test4
XX.XXX.XXX.5,Test5
XX.XXX.XXX.6,Test6
XX.XXX.XXX.7,Test7
XX.XXX.XXX.8,Test8

and so on

file 2:

XX.XXX.XXX.6, Name6
XX.XXX.XXX.7, Name7
XX.XXX.XXX.8, Name8

I need the result.csv file to look like this:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6,Name6
XX.XXX.XXX.7,Test7,Name7
XX.XXX.XXX.8,Test8,Name8

The code I have so far is as follows:

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

for file1_row in c1:
    row = 1
    found = False
    for file2_row in file2:
        results_row = file1_row
        x = file2_row[3]
        if file1_row[1] == file2_row[1]:

        results_row.append('Found. Name: ' + x)
        found = True
        break
    row += 1
if not found:
    results_row.append('Not found in File1')
c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

Right now this code is checking for identical rows not values. Meaning it will not match anything as it requires both the IP column and adjacent column to be equal on both files, in addition it matches both files' row 1, row 2, row3, and so on, but I need it to search through one to find the matches in the other, not compare rows by index.

4 Answers 4

1

A pandas solution:

import pandas as pd

df1 = pd.read_csv('file_1.csv', names=['a', 'b'])
df2 = pd.read_csv('file_2.csv', names=['a', 'b'])
merged = pd.merge(df1, df2, on='a', how='outer')
merged.to_csv('results.csv', header=False, index=False, na_rep='Not found')

​Content of results.csv:

XX.XXX.XXX.1,Test1,Not found
XX.XXX.XXX.2,Test2,Not found
XX.XXX.XXX.3,Test3,Not found
XX.XXX.XXX.4,Test4,Not found
XX.XXX.XXX.5,Test5,Not found
XX.XXX.XXX.6,Test6, Name6
XX.XXX.XXX.7,Test7, Name7
XX.XXX.XXX.8,Test8, Name8
Sign up to request clarification or add additional context in comments.

Comments

0

I moved the position of results_row and changed indent after row+=1

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

for file1_row in c1:
    row = 1
    found = False
    results_row = file1_row  #Moved out from nested loop
    for file2_row in file2:        
        x = file2_row[1]
        if file1_row[0] == file2_row[0]:
            results_row.append(x)
            found = True
            break
    row += 1
    if not found:
        results_row.append('Not found')     
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

Comments

0

A close solution to what you have tried would be as follows:

with open('result.csv', 'w') as out:
    with open('file1.csv', 'r') as f1, open('file2.csv', 'r') as f2:
        f2_lines = [line for line in f2.readlines() if len(line) > 1]
        f1_lines = [line for line in f1.readlines() if len(line) > 1]
        for line in f1_lines:
            val = 'Not found'
            b = [line.split(',')[0].strip() in item for item in f2_lines]
            if any(b):
                val = f2_lines[b.index(True)].split(',')[1].strip()
            out.write('{}, {}\n'.format(line.strip(), val))

Output:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6, Name6
XX.XXX.XXX.7,Test7, Name7
XX.XXX.XXX.8,Test8, Name8

2 Comments

I tried using this code and it does indeed provide the correct matches, but it also provides incorrect ones. Some names were matched more than once. The file contents I posted are a small sample of the list of IPs I have, so it may not have shown up on the small scale, but the name that matched for the IP ending in .103 also showed up for the IP ending in.1, which had no matching name.
I think it is a loop issue, perhaps it holds the value and then assigns it to the next open space?
0

Here's a non-pandas solution (assuming you're using Python 3.x):

import csv

present = {}
with open('file2.csv', 'r', newline='') as file2:
    reader = csv.reader(file2, skipinitialspace=True)
    for ip, name in reader:
        present[ip] = name

with open('file1.csv', 'r', newline='') as file1, \
     open('results.csv', 'w', newline='') as results:
    reader = csv.reader(file1, skipinitialspace=True)
    writer = csv.writer(results)
    for ip, name in reader:
        writer.writerow([ip, name, present.get(ip, ' Not found')])

File Results.csv:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6,Name6
XX.XXX.XXX.7,Test7,Name7
XX.XXX.XXX.8,Test8,Name8

1 Comment

I am using Python 3.6, my apologies, I should have specified in the question.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.