I'm trying to delete duplicate lines using only sed, and I can't figure out how to do it.

So far I've tried this, but it hasn't worked:

sed '$!N; /^\(.*\)\n\1$/!P; D'

file:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I got:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I want:

APPLE

ORANGES

BANANA

COOKIES

FRUITS

I want to do this so I don't have to go through each line of a file and delete the duplicates manually.

My goal is for this to eventually delete the second instance of BANANA.

Can anyone point me in the right direction?

Thanks

  • Any specific reason to use sed instead of, say, awk or tools like huniq? Commented Apr 5, 2022 at 5:26
  • Do the duplicates have to be adjacent? Or is it that you only want to keep the first appearance of a string? Commented Apr 5, 2022 at 21:09
  • It was just that I have to practice using Sed. I'm going to be working on other stuff for Awk. Commented Apr 7, 2022 at 5:12

4 Answers

4

Using sed

$ sed -n '/^$/d;G;/^\(.*\n\).*\n\1$/d;H;P;a\ ' input_file
APPLE

ORANGES

BANANA

COOKIES

FRUITS

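How it works: delete blank lines (/^$/d). Append the hold space to the pattern space (G). If the regex finds that the line has been seen before, delete it (d); otherwise append the pattern space to the hold space (H), print the line (P), and add a blank line after it (a\).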
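
For readers new to hold space, here is a minimal sketch of the same idea written as a commented multi-line script. It is a variant of a commonly cited sed one-liner, not the exact command above, and the function name dedupe_any is only an illustrative assumption. Unlike the command above, it drops the blank lines and does not put them back.

```shell
# dedupe_any is a hypothetical helper name used only for this sketch.
# It reads stdin and prints each distinct non-empty line once, in first-seen order.
dedupe_any() {
  sed -n '
    # Drop empty lines so they never reach the hold space.
    /^$/d
    # Pattern space becomes: current line, newline, lines seen so far.
    G
    # Double the first newline so the current line is clearly separated.
    s/\n/&&/
    # If the current line also occurs later as a whole line, it is a duplicate.
    /^\(.*\n\).*\n\1/d
    # Undo the doubling.
    s/\n//
    # Remember the new list (current line first) in the hold space.
    h
    # Print only the current line.
    P
  '
}

printf '%s\n' APPLE '' ORANGES '' BANANA '' BANANA '' COOKIES '' FRUITS | dedupe_any
```

Because the whole history is kept in the hold space, this also removes duplicates that are not adjacent, at the cost of slowing down on very large files.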

1 Comment

I know it says I'm not supposed to say thanks, but thanks. This worked; I spent a long time on just this. Hold space and pattern space are new to me, so I'm going to read up on them. I understand everything up to G; and most of the rest.
2

Hmm, that's odd; your command works for me. Is it because you have an empty line between each line of text?

~$ cat test.txt
APPLES
ORANAGES
BANANA
BANANA
COOKIES
FRUITS

~$ sed '$!N; /^\(.*\)\n\1$/!P; D' test.txt
APPLES
ORANAGES
BANANA
COOKIES
FRUITS
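
To check the blank-line theory, here is a small sketch. The variable names and the inline sample data are assumptions made up for the illustration; the data mirrors the file in the question. The command only compares each line with the one directly after it, so the empty lines keep the two BANANA lines from ever being compared. Deleting the empty lines first makes them adjacent.

```shell
# Hypothetical sample data mirroring the question's file (blank line between entries).
input='APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS'

# The command from the question on the blank-separated data: nothing is removed,
# because every BANANA is followed by an empty line, not by another BANANA.
unchanged=$(printf '%s\n' "$input" | sed '$!N; /^\(.*\)\n\1$/!P; D')

# Dropping the empty lines first makes the duplicates adjacent, so it works.
deduped=$(printf '%s\n' "$input" | sed '/^$/d' | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$deduped"
```

Note that the second pipeline does not put the blank lines back.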

Comments

1

This might work for you (GNU sed):

   sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//' file

On the first line, insert a newline for regexp purposes.

Gather the lines into the pattern space, removing each duplicate as it is appended (together with the empty line before it).

At the end of the file, remove the introduced newline and print the result.
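
The same script can be spread over several lines with comments, which may make the three steps easier to follow. This is only a sketch: the function name dedupe_gnu is made up for the illustration, and it assumes GNU sed (for -E with backreferences, \S, and \n in the replacement).

```shell
# dedupe_gnu is a hypothetical wrapper name; the sed commands are the ones above.
# Requires GNU sed.
dedupe_gnu() {
  sed -E '
    # Step 1: on the first line, insert a newline so every line starts with one.
    1s/^/\n/
    :a
    # Step 2: append the next input line to the pattern space.
    N
    # If the pattern space now ends with an empty line plus a repeat of an
    # earlier line, remove that empty line and the repeat.
    s/((\n\S+)(\n\S+)*)\n\2$/\1/
    # Loop until the last line has been read.
    $!ba
    # Step 3: remove the newline introduced in step 1; the result is then printed.
    s/.//
  '
}

printf '%s\n' APPLE '' ORANGES '' BANANA '' BANANA '' COOKIES '' FRUITS | dedupe_gnu
```

On the question's input this keeps the blank lines between the remaining entries.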

Comments

0

Assuming you wanted sed because it is fast and available as a standard tool on Linux, you may want to consider another standard command-line tool, "uniq", sometimes combined with yet another one, "sort". Note that "uniq" only collapses adjacent duplicates, which is why the input has to be sorted before it can remove all of them.

ts3b@terminal01:~/demo$ ls
repeated_lines.txt
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt 
AAA 
BBB
BBB
CCC
AAA 
AAA 
BBB
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq
AAA 
BBB
CCC
AAA 
BBB
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq | sort
AAA 
AAA 
BBB
BBB
CCC
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq | sort | uniq
AAA 
BBB
CCC
ts3b@terminal01:~/demo$
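
As a shorter alternative to the uniq | sort | uniq chain, sort has a -u option that sorts and drops duplicates in one step. A minimal sketch follows; the variable names and the inline sample data (the same lines as in the session above, without trailing spaces) are assumptions for the illustration.

```shell
# Hypothetical sample data: the lines from repeated_lines.txt, without trailing spaces.
lines=$(printf '%s\n' AAA BBB BBB CCC AAA AAA BBB CCC)

# uniq alone only collapses runs of identical adjacent lines.
adjacent_only=$(printf '%s\n' "$lines" | uniq)

# sort -u sorts the input and removes every duplicate in one step.
all_unique=$(printf '%s\n' "$lines" | sort -u)

printf '%s\n' "$all_unique"
```

Keep in mind that sorting changes the order of the lines, so this only fits when the original order does not matter.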

On Linux, "sed" is "GNU sed", which behaves differently from the "sed" command on FreeBSD. "GNU sed" may be available on FreeBSD as "gsed". For some regular expressions the two behave the same way, but if you want to save time by testing your regular expressions on only one of them, for example "GNU sed", here is a Bash snippet that might come in handy for making a script work on both FreeBSD and Linux:

S_CMD_GNU_SED="sed"
# uname -s prints only the kernel name, so a hostname containing "bsd" cannot match.
if uname -s | grep -qi 'BSD'; then
    S_CMD_GNU_SED="gsed"
fi
#
# There's a similar case with GNU Make versus BSD Make:
#
S_CMD_GNU_MAKE="make"
if uname -s | grep -qi 'BSD'; then
    S_CMD_GNU_MAKE="gmake"
fi
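
Another possible approach, sketched here under the assumption that probing the tool is acceptable in your script, is to check what is actually installed instead of relying on the OS name. The variable names SED_CMD and SED_FLAVOR are made up for this illustration.

```shell
# SED_CMD and SED_FLAVOR are hypothetical names used only in this sketch.
# Prefer gsed when it is installed; otherwise fall back to plain sed.
if command -v gsed >/dev/null 2>&1; then
    SED_CMD="gsed"
else
    SED_CMD="sed"
fi

# GNU sed identifies itself as "(GNU sed)" in its --version output;
# a sed that rejects --version or prints something else is treated as "other".
if "$SED_CMD" --version 2>/dev/null | grep -q '(GNU sed)'; then
    SED_FLAVOR="gnu"
else
    SED_FLAVOR="other"
fi

printf 'using %s (%s)\n' "$SED_CMD" "$SED_FLAVOR"
```

A script can then refuse to run, or switch to a more portable expression, when SED_FLAVOR is not "gnu".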

Thank You for reading my comment :-)

Comments
