How do I remove duplicate lines using Sed without sorting?

Question

I've been trying to delete duplicate lines using only sed, and I'm having trouble figuring out how to do it.

So far I've tried this and it hasn't worked.

sed '$!N; /^\(.*\)\n\1$/!P; D'

file:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I got:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I want:

APPLE

ORANGES

BANANA

COOKIES

FRUITS

I'm trying to do this so that I don't have to go through each line of a file and delete the duplicates by hand.

My goal is for this to eventually delete the second instance of BANANA.

Can anyone point me in the right direction?

Thanks

Any specific reason to use sed instead of, say, awk or tools like huniq? – Sundeep, Commented Apr 5, 2022 at 5:26
Do the duplicates have to be adjacent? Or do you only want to keep the first appearance of a string? – Andy Lester, Commented Apr 5, 2022 at 21:09
It's just that I have to practice using sed. I'm going to be working on other stuff with awk. – grunchyelliptical, Commented Apr 7, 2022 at 5:12

sseLtaH · Accepted Answer · 2022-04-05 03:57:12Z

4

Using sed

$ sed -n '/^$/d;G;/^\(.*\n\).*\n\1$/d;H;P;a\ ' input_file
APPLE

ORANGES

BANANA

COOKIES

FRUITS

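Remove blank lines. Append the hold space (the lines kept so far) to the pattern space with G. If the current line already appears there, delete it; otherwise append the pattern space to the hold space with H, print the current line with P, and append a blank separator line with a\.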
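The hold space in the command above grows quickly: after G the pattern space already contains the whole hold space, and H appends all of it back, so the hold space roughly doubles with every new distinct line. That is harmless for a short list like this one but may matter for larger files. A different technique stores each kept line in the hold space only once and looks for "\n<line>\n" in it. The following is a minimal sketch assuming GNU sed (where . matches a newline and \n is recognized inside a bracket expression); dedupe_keep_first is a hypothetical helper name. It keeps the first occurrence of every line, adjacent or not, and drops the blank separator lines.

```shell
#!/bin/sh
# dedupe_keep_first is a hypothetical helper name; assumes GNU sed.
# Reads the files given as arguments, or standard input if there are none.
dedupe_keep_first() {
  sed -n '
    # Skip the blank separator lines.
    /^$/d
    # Pattern space becomes: current line, newline, lines kept so far.
    G
    # Double the first newline so every kept line is preceded by a newline.
    s/\n/&&/
    # If the current line is already in the kept list, discard it.
    /^\([^\n]*\n\).*\n\1/d
    # Undo the doubling and store the updated list of kept lines.
    s/\n//
    h
    # Print only the current line (everything before the first newline).
    P
  ' "$@"
}

printf 'APPLE\n\nORANGES\n\nBANANA\n\nBANANA\n\nCOOKIES\n\nFRUITS\n' | dedupe_keep_first
```

If you want the blank separator lines back, piping the result through sed '$!G' adds one after every line except the last.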

answered Apr 5, 2022 at 3:57



grunchyelliptical Over a year ago

I know it says I'm not supposed to say thanks, but thanks. This worked; I spent a long time on just this. Hold space and pattern space are new to me, so I'm going to read up on them. I understand everything up to G; and most of the rest.

clogwog · 2022-04-05 00:10:19Z

2

Hmm, that is odd; it seems to work for me. Is it because you have an empty line between each line of text?

~$ cat test.txt
APPLES
ORANGES
BANANA
BANANA
COOKIES
FRUITS

~$ cat test.txt | sed '$!N; /^\(.*\)\n\1$/!P; D'
APPLES
ORANGES
BANANA
COOKIES
FRUITS
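If the empty lines are the problem, a minimal sketch (assuming the input really has blank separator lines, as in the question) is to delete them first and then run the command from the question unchanged. squeeze_dups is a hypothetical helper name. Note that this still only removes duplicates that end up next to each other.

```shell
#!/bin/sh
# squeeze_dups is a hypothetical helper name.
# Reads the files given as arguments, or standard input if there are none.
squeeze_dups() {
  # Step 1: delete the empty separator lines so repeated lines become adjacent.
  # Step 2: the command from the question; it prints the first line of each
  #         pair only when the next line differs, so runs collapse to one line.
  sed '/^$/d' "$@" | sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf 'APPLE\n\nORANGES\n\nBANANA\n\nBANANA\n\nCOOKIES\n\nFRUITS\n' | squeeze_dups
```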

answered Apr 5, 2022 at 0:10



potong · 2022-04-05 15:17:45Z

1

This might work for you (GNU sed):

   sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//' file

On the first line, insert a newline for regexp purposes.

Gather the lines in the pattern space, removing each duplicate (together with the empty line before it) as it is added.

At end of the file, remove the introduced newline and print the result.
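The same steps, written one command per line with comments, might look like the following. This is a sketch assuming GNU sed (for -E, \S and \n in the replacement); dedupe_gather is a hypothetical helper name. As far as I can tell, because (\n\S+)* cannot match across an empty line, with blank-separated input like the question's it removes a repeat only when it directly follows its earlier copy, and lines containing spaces are never removed.

```shell
#!/bin/sh
# dedupe_gather is a hypothetical helper name; assumes GNU sed.
# Reads the files given as arguments, or standard input if there are none.
dedupe_gather() {
  sed -E '
    # First line: prepend a newline so every line in the buffer starts with one.
    1s/^/\n/
    :a
    # Append the next input line to the pattern space.
    N
    # If the buffer ends with an empty line followed by a copy of an earlier
    # line (with only non-blank lines between them), delete both.
    s/((\n\S+)(\n\S+)*)\n\2$/\1/
    # Loop until the last input line has been appended.
    $!ba
    # Remove the newline added at the start; the buffer is then printed.
    s/.//
  ' "$@"
}

printf 'APPLE\n\nORANGES\n\nBANANA\n\nBANANA\n\nCOOKIES\n\nFRUITS\n' | dedupe_gather
```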

answered Apr 5, 2022 at 15:17



Martin Vahi · 2024-01-13 00:33:04Z

Assuming the reason you wanted to use sed is that it is fast and available on Linux as a standard tool, you may want to consider another standard Linux command-line tool, "uniq", sometimes combined with yet another one, "sort". Note that "uniq" only removes adjacent duplicate lines, and "sort" changes the original order of the lines.

ts3b@terminal01:~/demo$ ls
repeated_lines.txt
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt 
AAA 
BBB
BBB
CCC
AAA 
AAA 
BBB
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq
AAA 
BBB
CCC
AAA 
BBB
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq | sort
AAA 
AAA 
BBB
BBB
CCC
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq | sort | uniq
AAA 
BBB
CCC
ts3b@terminal01:~/demo$
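The uniq | sort | uniq pipeline above returns the lines in sorted order rather than the original order, while the question asks to avoid sorting. If sort is acceptable as a helper, one common workaround is to number the lines, deduplicate on the text, and then sort back by the number. This is a sketch assuming GNU coreutils; dedupe_keep_order is a hypothetical helper name, and LC_ALL=C is set so lines are compared byte by byte.

```shell
#!/bin/sh
# dedupe_keep_order is a hypothetical helper name; assumes GNU coreutils.
# Reads the files given as arguments, or standard input if there are none.
dedupe_keep_order() {
  # 1. Number every line (cat -n writes the number, a tab, then the line).
  # 2. Sort on field 2 to end of line and keep one line per distinct text;
  #    -s keeps equal lines in input order, so the first occurrence wins.
  # 3. Sort numerically on the line number to restore the original order.
  # 4. Cut away the number and the tab.
  cat -n "$@" | LC_ALL=C sort -s -u -k2 | sort -n -k1,1 | cut -f2-
}

printf 'AAA\nBBB\nBBB\nCCC\nAAA\nAAA\nBBB\nCCC\n' | dedupe_keep_order
```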

On most Linux distributions "sed" is GNU sed, which behaves differently from the "sed" command on FreeBSD. GNU sed may be available on FreeBSD as "gsed". For some regular expressions the two behave the same way, but if you want to save time by testing your regular expressions against only one of them, for example GNU sed, here is a candidate Bash snippet that can help make a Bash script work on both FreeBSD and Linux:

# Check only the kernel name (uname -s), not the full uname -a output,
# so a host name containing "bsd" cannot trigger the switch.
S_CMD_GNU_SED="sed"
if uname -s | grep -qi 'bsd'; then
    S_CMD_GNU_SED="gsed"
fi
#
# There's a similar case with GNU Make versus BSD Make:
#
S_CMD_GNU_MAKE="make"
if uname -s | grep -qi 'bsd'; then
    S_CMD_GNU_MAKE="gmake"
fi

Thank You for reading my comment :-)
