
I have two CSV files. The first looks like this:

(screenshot of the first CSV omitted)

The second contains a list of IPs:

139.15.250.196
139.15.5.176

I'd like to check whether any given IP from the first file is in the second file. This seems to work (please correct me or provide hints if my code is broken), but the issue is that the first file contains many duplicate values, e.g. 10.0.0.1 may appear x times, and I was not able to find a way to remove the duplicates. Could you please assist or guide me?

import csv

# collect the first column of ip2.csv into a list
filename = 'ip2.csv'
with open(filename) as f:
    reader = csv.reader(f)
    ip = []
    for row in reader:
        ip.append(row[0])


# collect the first column of bonk_https.csv; after each row,
# print every IP from ip2.csv found in what has been read so far
filename = 'bonk_https.csv'
with open(filename) as f:
    reader = csv.reader(f)
    ip_ext = []
    for row in reader:
        ip_ext.append(row[0])
        for a in ip:
            if a in ip_ext:
                print(a)
  • Have you looked at the Pandas library? You could import the CSVs into Pandas using the read_csv command, deduplicate the list in Pandas, then execute an inner join in Pandas with the merge command to get the list of matching items (see the sketch after these comments). Commented Dec 10, 2018 at 20:45
  • delete duplicates in Pandas: chrisalbon.com/python/data_wrangling/pandas_delete_duplicates Commented Dec 10, 2018 at 20:47
  • merge/join in Pandas: shanelynn.ie/merge-join-dataframes-python-pandas-index-1 Commented Dec 10, 2018 at 20:49
  • Why don't you create a set of IPs instead of a list? Commented Dec 10, 2018 at 20:50
  • Your code clearly isn't what you're running; it'll die immediately with a NameError (because reader isn't defined). Can you post a minimal reproducible example that can actually run? Commented Dec 10, 2018 at 21:15

2 Answers


You can cast any list into a set with set(list). A set only holds one of each item and supports the same member in set membership test as a list. So just cast your ip list to a set.

with open(filename) as f:
    reader = csv.reader(f)
    ip_ext = []
    for row in reader:
        ip_ext.append(row[0])
        for a in set(ip):
            if a in set(ip_ext):  # well, you don't need a set here unless you also have duplicates in ip_ext
                print(a)

Alternatively, just break/continue once you have found your entry; a quick sketch of that idea follows.
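
A quick sketch of the break idea, assuming the ip and ip_ext lists from the question have already been filled:

# deduplicate the first list up front, then stop scanning as soon as a match is found
for a in set(ip):
    for b in ip_ext:
        if a == b:
            print(a)
            break  # each IP is printed at most once, even if ip_ext contains it several times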


4 Comments

Thank you but with your code I'm still getting duplicates :(
Please give us some example data and your code. I currently can't see how you can get duplicates if you compare each member of a set (which no longer has duplicates) exactly once with the ip_ext list that you made, unless ip_ext itself also has duplicates (see the sketch after these comments).
To be sure, I updated my code. Please try it again. And please tell us more about your data.
Thank you. In fact it works :) The second file contains duplicates but that's OK. Please see the EDIT section of my question. I hope you can help me with it!
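
Following up on these comments, a small sketch that deduplicates both lists, assuming the ip and ip_ext lists from the question:

# intersecting the two sets removes duplicates on both sides before comparing
for a in set(ip) & set(ip_ext):
    print(a)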

I suggest that you normalize all the IPs:

with open(...) as f:
    # a set comprehension of _normalized_ IPs; '%d' % int(n) strips excess leading zeros from each octet
    my_ips = {'.'.join('%d' % int(n) for n in t)
              for t in [x.split(',')[0].split('.') for x in f]}

Next, you check each normalized IP from the second file against the IPs contained in the normalized set (note that, unlike the other answers, here you have a single loop, and that checking whether an item is a member of a set, ip in my_ips, is a highly optimized operation):

with open(...) as f:
    for line in f:
        # normalize this line's IP the same way before the membership test
        ip = '.'.join('%d' % int(n) for n in line.split('.'))
        if ip in my_ips:
            ...
        else:
            ...
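
For illustration, the same normalization applied to a single, made-up address:

line = '139.015.250.196'  # example input with a zero-padded octet
print('.'.join('%d' % int(n) for n in line.split('.')))  # prints 139.15.250.196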

