
My problem is removing both copies of duplicated lines. I have a text file:

192.168.1.18 --- B8:27:EB:48:C3:B6
192.168.1.12 --- 00:A0:57:2E:A6:12
192.168.1.11 --- 00:1D:A2:80:3C:CC
192.168.1.7 --- F0:9F:C2:0A:48:E7
192.168.1.6 --- 80:2A:A8:C9:85:1C
192.168.1.1 --- F0:9F:C2:05:B7:A6
192.168.1.9 --- DC:4A:3E:DF:22:06
192.168.1.8 --- 80:2A:A8:C9:8E:F6
192.168.1.1 --- F0:9F:C2:05:B7:A6

192.168.1.7 --- F0:9F:C2:0A:48:E7

192.168.1.12 --- 00:A0:57:2E:A6:12

192.168.1.11 --- 00:1D:A2:80:3C:CC

192.168.1.6 --- 80:2A:A8:C9:85:1C

192.168.1.8 --- 80:2A:A8:C9:8E:F6

The text file is exactly as shown. I want to remove both copies of every duplicated line, so that only these lines remain:

192.168.1.18 --- B8:27:EB:48:C3:B6

192.168.1.9 --- DC:4A:3E:DF:22:06

Thanks for your help guys.

  • As mentioned above, you can use pandas. NumPy also has a unique function for dropping duplicates. Commented Sep 18, 2017 at 11:03

3 Answers


Another short alternative with collections.Counter object:

import collections

with open('lines.txt', 'r') as f:
    for k,c in collections.Counter(f.read().splitlines()).items():
        if c == 1:
            print(k)

The output:

192.168.1.18 --- B8:27:EB:48:C3:B6
192.168.1.9 --- DC:4A:3E:DF:22:06
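A comment below points out that the file object can't be passed straight to Counter here because of the blank lines; a minimal variant of the same idea (demo data written to lines.txt) that strips and skips blanks while counting lazily:

```python
import collections

# Demo data: a subset of the question's file, blank lines included.
sample = (
    "192.168.1.18 --- B8:27:EB:48:C3:B6\n"
    "192.168.1.1 --- F0:9F:C2:05:B7:A6\n"
    "\n"
    "192.168.1.1 --- F0:9F:C2:05:B7:A6\n"
    "\n"
    "192.168.1.9 --- DC:4A:3E:DF:22:06\n"
)
with open('lines.txt', 'w') as f:
    f.write(sample)

# Strip each line and skip blanks, so the file is consumed lazily
# without a full read().splitlines().
with open('lines.txt') as f:
    counts = collections.Counter(line.strip() for line in f if line.strip())

unique = [line for line, c in counts.items() if c == 1]
print(unique)
```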

6 Comments

You can just pass the file object f to Counter, rather than doing read().splitlines() on it first.
@Blckknght, no, that won't work because of intermediate blank lines
@RomanPerekhrest Thank you! Your answer is what I have been searching for :)
@A.Babik, you're welcome
@RomanPerekhrest I wonder now how I would detect changes in the text file, e.g. whether new lines were added since the last time it was read. Any idea?

There's not a lot of detail in the question. You've tagged numpy; is that a requirement or just an interest?

If you have no specific requirement, do it using the standard library:

d = {}
with open('/file/path', 'r') as f:
    for line in f:
        if line not in d:
            d[line] = 1
        else:
            d[line] += 1

no_dup = [line for line in d if d[line] < 2]
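The loop above builds no_dup but never prints it; a self-contained run of the same tally, with a demo list standing in for the file's lines (trailing newlines included, as they would be when iterating an open file):

```python
# Demo lines as they'd come from iterating the open file.
lines = [
    '192.168.1.18 --- B8:27:EB:48:C3:B6\n',
    '192.168.1.1 --- F0:9F:C2:05:B7:A6\n',
    '192.168.1.1 --- F0:9F:C2:05:B7:A6\n',
    '192.168.1.9 --- DC:4A:3E:DF:22:06\n',
]

d = {}
for line in lines:
    d[line] = d.get(line, 0) + 1  # same count as the if/else in the answer

# Keep only lines seen exactly once; strip the trailing newlines for output.
no_dup = [line.strip() for line in d if d[line] < 2]
print(no_dup)
```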

6 Comments

This is not numpy either...
@cᴏʟᴅsᴘᴇᴇᴅ Yes, I realize that. I unfortunately don't have a lot of numpy experience. While it's tagged in the post, I didn't see a specific request for a solution specific to that library. If you can do it simply with the standard library, why not?
@cᴏʟᴅsᴘᴇᴇᴅ, it's not mandatory to do it with numpy specifically. The main tags are python, python3, ...
@shash678 In your case the answer was incorrect anyway. But in general, you shouldn't make assumptions about things like that. Ask the OP for clarification.
@cᴏʟᴅsᴘᴇᴇᴅ Fair point. As such I made an edit asking the OP for clarification but still retained my solution for now.

Option 1
Using numpy

First, load your file with np.loadtxt.

import numpy as np

# bogus delimiter so that each whole line is loaded as one string (1D array)
x = np.loadtxt('file.txt', dtype=str, delimiter=',')

Next, use np.unique with return_counts=True, and find all unique entries that were not repeated.

unique, counts = np.unique(x, return_counts=True)
out = unique[counts == 1]

out
array(['192.168.1.18 --- B8:27:EB:48:C3:B6',
       '192.168.1.9 --- DC:4A:3E:DF:22:06'],
      dtype='<U34')
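If the filtered array needs to go back to disk, np.savetxt with a plain-string format writes one entry per line (the output filename here is an assumption):

```python
import numpy as np

# Result array as produced by the unique/counts step above.
out = np.array(['192.168.1.18 --- B8:27:EB:48:C3:B6',
                '192.168.1.9 --- DC:4A:3E:DF:22:06'])
np.savetxt('unique.txt', out, fmt='%s')  # '%s' avoids numeric formatting
```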

Option 2
Using pandas

Load your data using pd.read_csv and then call drop_duplicates.

import pandas as pd

df = pd.read_csv('file.txt', delimiter=',', header=None)

df
                                     0
0   192.168.1.18 --- B8:27:EB:48:C3:B6
1   192.168.1.12 --- 00:A0:57:2E:A6:12
2   192.168.1.11 --- 00:1D:A2:80:3C:CC
3    192.168.1.7 --- F0:9F:C2:0A:48:E7
4    192.168.1.6 --- 80:2A:A8:C9:85:1C
5    192.168.1.1 --- F0:9F:C2:05:B7:A6
6    192.168.1.9 --- DC:4A:3E:DF:22:06
7    192.168.1.8 --- 80:2A:A8:C9:8E:F6
8    192.168.1.1 --- F0:9F:C2:05:B7:A6
9    192.168.1.7 --- F0:9F:C2:0A:48:E7
10  192.168.1.12 --- 00:A0:57:2E:A6:12
11  192.168.1.11 --- 00:1D:A2:80:3C:CC
12   192.168.1.6 --- 80:2A:A8:C9:85:1C
13   192.168.1.8 --- 80:2A:A8:C9:8E:F6

df.drop_duplicates(keep=False)
                                    0
0  192.168.1.18 --- B8:27:EB:48:C3:B6
6   192.168.1.9 --- DC:4A:3E:DF:22:06

To save back to a text file, you can use DataFrame.to_csv:

df.drop_duplicates(keep=False).to_csv('file.txt', index=False, header=False)

5 Comments

Thank you for your answer :)
@COLDSPEED made it :) Thanks for the answer again.
@A.Babik you can only accept one. I hope you made sure you accepted the one you meant to (I got a notification too, that's why I ask)
@COLDSPEED I wish I could accept both, because both answers are correct. I have accepted the one with the cleanest output.
@COLDSPEED I wonder now how I would detect changes in the text file, e.g. whether new lines were added since the last time it was read. Any idea?
