
My problem is removing both copies of duplicated lines. I have a text file:

192.168.1.18 --- B8:27:EB:48:C3:B6
192.168.1.12 --- 00:A0:57:2E:A6:12
192.168.1.11 --- 00:1D:A2:80:3C:CC
192.168.1.7 --- F0:9F:C2:0A:48:E7
192.168.1.6 --- 80:2A:A8:C9:85:1C
192.168.1.1 --- F0:9F:C2:05:B7:A6
192.168.1.9 --- DC:4A:3E:DF:22:06
192.168.1.8 --- 80:2A:A8:C9:8E:F6
192.168.1.1 --- F0:9F:C2:05:B7:A6

192.168.1.7 --- F0:9F:C2:0A:48:E7

192.168.1.12 --- 00:A0:57:2E:A6:12

192.168.1.11 --- 00:1D:A2:80:3C:CC

192.168.1.6 --- 80:2A:A8:C9:85:1C

192.168.1.8 --- 80:2A:A8:C9:8E:F6

The text file is exactly as shown. I want to remove both copies of every duplicated line, so that only these lines remain:

192.168.1.18 --- B8:27:EB:48:C3:B6

192.168.1.9 --- DC:4A:3E:DF:22:06

Thanks for your help guys.

  • As mentioned above, you can use pandas. NumPy also has a unique function for dropping duplicates. Commented Sep 18, 2017 at 11:03

3 Answers


Another short alternative with collections.Counter object:

import collections

with open('lines.txt', 'r') as f:
    for k,c in collections.Counter(f.read().splitlines()).items():
        if c == 1:
            print(k)

The output:

192.168.1.18 --- B8:27:EB:48:C3:B6
192.168.1.9 --- DC:4A:3E:DF:22:06
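A comment below points out that the file object can't be passed straight to Counter here because of the blank lines; a minimal variant of the same idea (demo data written to lines.txt) that strips and skips blanks while counting lazily:

```python
import collections

# Demo data: a subset of the question's file, blank lines included.
sample = (
    "192.168.1.18 --- B8:27:EB:48:C3:B6\n"
    "192.168.1.1 --- F0:9F:C2:05:B7:A6\n"
    "\n"
    "192.168.1.1 --- F0:9F:C2:05:B7:A6\n"
    "\n"
    "192.168.1.9 --- DC:4A:3E:DF:22:06\n"
)
with open('lines.txt', 'w') as f:
    f.write(sample)

# Strip each line and skip blanks, so the file is consumed lazily
# without a full read().splitlines().
with open('lines.txt') as f:
    counts = collections.Counter(line.strip() for line in f if line.strip())

unique = [line for line, c in counts.items() if c == 1]
print(unique)
```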

6 Comments

You can just pass the file object f to Counter, rather than doing read().splitlines() on it first.
@Blckknght, no, that won't work because of intermediate blank lines
@RomanPerekhrest Thank you! Your answer is what I have been searching for :)
@A.Babik, you're welcome
@RomanPerekhrest I wonder now how I would detect changes in the text file, e.g. whether new lines were added since the last time it was read. Any idea?

There's not a lot of detail in the question. You've tagged numpy; is that a requirement or just an interest?

If you have no specific requirement, do it using the standard library:

d = {}
with open('/file/path', 'r') as f:
    for line in f:
        if line not in d:
            d[line] = 1
        else:
            d[line] += 1

no_dup = [line for line in d if d[line] < 2]
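The loop above builds no_dup but never prints it; a self-contained run of the same tally, with a demo list standing in for the file's lines (trailing newlines included, as they would be when iterating an open file):

```python
# Demo lines as they'd come from iterating the open file.
lines = [
    '192.168.1.18 --- B8:27:EB:48:C3:B6\n',
    '192.168.1.1 --- F0:9F:C2:05:B7:A6\n',
    '192.168.1.1 --- F0:9F:C2:05:B7:A6\n',
    '192.168.1.9 --- DC:4A:3E:DF:22:06\n',
]

d = {}
for line in lines:
    d[line] = d.get(line, 0) + 1  # same count as the if/else in the answer

# Keep only lines seen exactly once; strip the trailing newlines for output.
no_dup = [line.strip() for line in d if d[line] < 2]
print(no_dup)
```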

6 Comments

This is not numpy either...
@cᴏʟᴅsᴘᴇᴇᴅ Yes, I realize that. I unfortunately don't have a lot of numpy experience. While it's tagged in the post, I didn't see a specific request for a solution specific to that library. If you can do it simply with the standard library, why not?
@cᴏʟᴅsᴘᴇᴇᴅ, it's not mandatory to do it with numpy specifically. The main tags are python, python3, ...
@shash678 In your case the answer was incorrect anyway. But in general, you shouldn't make assumptions about things like that. Ask the OP for clarification.
@cᴏʟᴅsᴘᴇᴇᴅ Fair point. As such I made an edit asking the OP for clarification but still retained my solution for now.

Option 1
Using numpy

First, load your file with np.loadtxt.

import numpy as np

# bogus delimiter so that each whole line is loaded as one string (1D array)
x = np.loadtxt('file.txt', dtype=str, delimiter=',')

Next, use np.unique with return_counts=True, and find all unique entries that were not repeated.

unique, counts = np.unique(x, return_counts=True)
out = unique[counts == 1]

out
array(['192.168.1.18 --- B8:27:EB:48:C3:B6',
       '192.168.1.9 --- DC:4A:3E:DF:22:06'],
      dtype='<U34')
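If the filtered array needs to go back to disk, np.savetxt with a plain-string format writes one entry per line (the output filename here is an assumption):

```python
import numpy as np

# Result array as produced by the unique/counts step above.
out = np.array(['192.168.1.18 --- B8:27:EB:48:C3:B6',
                '192.168.1.9 --- DC:4A:3E:DF:22:06'])
np.savetxt('unique.txt', out, fmt='%s')  # '%s' avoids numeric formatting
```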

Option 2
Using pandas

Load your data using pd.read_csv and then call drop_duplicates.

import pandas as pd

df = pd.read_csv('file.txt', delimiter=',', header=None)

df
                                     0
0   192.168.1.18 --- B8:27:EB:48:C3:B6
1   192.168.1.12 --- 00:A0:57:2E:A6:12
2   192.168.1.11 --- 00:1D:A2:80:3C:CC
3    192.168.1.7 --- F0:9F:C2:0A:48:E7
4    192.168.1.6 --- 80:2A:A8:C9:85:1C
5    192.168.1.1 --- F0:9F:C2:05:B7:A6
6    192.168.1.9 --- DC:4A:3E:DF:22:06
7    192.168.1.8 --- 80:2A:A8:C9:8E:F6
8    192.168.1.1 --- F0:9F:C2:05:B7:A6
9    192.168.1.7 --- F0:9F:C2:0A:48:E7
10  192.168.1.12 --- 00:A0:57:2E:A6:12
11  192.168.1.11 --- 00:1D:A2:80:3C:CC
12   192.168.1.6 --- 80:2A:A8:C9:85:1C
13   192.168.1.8 --- 80:2A:A8:C9:8E:F6

df.drop_duplicates(keep=False)
                                    0
0  192.168.1.18 --- B8:27:EB:48:C3:B6
6   192.168.1.9 --- DC:4A:3E:DF:22:06

To save back to a text file, you can use DataFrame.to_csv:

df.drop_duplicates(keep=False).to_csv('file.txt', index=False, header=False)

5 Comments

Thank you for your answer :)
@COLDSPEED made it :) Thanks for the answer again.
@A.Babik you can only accept one. I hope you made sure you accepted the one you meant to (I got a notification too, that's why I ask)
@COLDSPEED I wish I could accept both, because both answers are correct. I have accepted the one with the cleanest output.
@COLDSPEED I wonder now how I would detect changes in the text file, e.g. whether new lines were added since the last time it was read. Any idea?
