Using any awk and sort:
$ cat tst.sh
#!/usr/bin/env bash
sort -r "${@:--}" |
awk '
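    # Always print 2-field lines; print any other line unless it is a
    # whole-word prefix (or an exact duplicate) of the previous line.
    (NF == 2) || (index(prev" ",$0" ") != 1)
    { prev = $0 }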
' |
sort
$ ./tst.sh file
a b
a b c d e
a b c d x
a b c d z
The " " at the end of each string in index() is necessary for two reasons. Appending it to $0 ensures that a b d does not falsely match as a leading substring of a b dog, assuming we only want whole-word comparisons. Appending it to prev as well ensures that a b e still matches itself, assuming we want to delete exact duplicate lines as well as substring lines. For example, given this more comprehensive sample input:
$ cat file2
a b
a b c
a b c d
a b c d e
a b c d x
a b c d z
a b d
a b dog
a b e
a b e
we get the expected output:
$ ./tst.sh file2
a b
a b c d e
a b c d x
a b c d z
a b d
a b dog
a b e
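To see the effect of that trailing " " in isolation, here is a minimal sketch that runs the same index() test with and without the padding. The helper name is_prefix and its third (padding) argument are made up for this illustration and are not part of tst.sh:

```shell
#!/usr/bin/env bash
# is_prefix PREV CUR PAD (hypothetical helper, for illustration only):
# prints 1 if CUR starts PREV once PAD has been appended to both, else 0.
is_prefix() {
    awk -v prev="$1" -v cur="$2" -v pad="$3" \
        'BEGIN { print (index(prev pad, cur pad) == 1) }'
}

is_prefix 'a b dog' 'a b d' ''     # 1: without padding, a b d matches a b dog
is_prefix 'a b dog' 'a b d' ' '    # 0: the padding forces a whole-word match
is_prefix 'a b e'   'a b e' ' '    # 1: exact duplicates still match
```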
tst.sh first sorts the input in reverse (sort -r) so that longer strings appear before shorter strings that start with the same characters. That makes it easy for awk to test whether the current string is a leading substring of the previous one. We then sort again for the final output.
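To make that concrete, this sketch prints the intermediate order that awk receives for the file2 contents after the first sort -r. The function name reverse_sorted is made up for this illustration, and LC_ALL=C is an assumption added here to force byte-wise collation; other locales may order the lines differently:

```shell
#!/usr/bin/env bash
# Recreate the contents of file2, then reverse-sort them the way tst.sh does.
# LC_ALL=C is an assumption of this sketch, not something tst.sh sets.
reverse_sorted() {
    printf '%s\n' 'a b' 'a b c' 'a b c d' 'a b c d e' 'a b c d x' \
        'a b c d z' 'a b d' 'a b dog' 'a b e' 'a b e' |
    LC_ALL=C sort -r
}

reverse_sorted
# a b e
# a b e
# a b dog
# a b d
# a b c d z
# a b c d x
# a b c d e
# a b c d
# a b c
# a b
```

Each line that gets dropped (the second a b e, a b c d, and a b c) sits directly below a line it is a whole-word prefix of, so awk only needs to remember one previous line.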
That approach of sorting first means it'll work no matter what order the input is in, e.g.:
$ shuf file2 > file3
$ cat file3
a b
a b c d
a b c d z
a b dog
a b c d e
a b c d x
a b c
a b e
a b d
a b e
$ ./tst.sh file3
a b
a b c d e
a b c d x
a b c d z
a b d
a b dog
a b e
If we also wanted the output order to match the input order given unsorted input like the above, we could apply the Decorate-Sort-Undecorate idiom: prepend the original line numbers first, then sort by those numbers and remove them at the end:
$ cat tst2.sh
#!/usr/bin/env bash
awk -v OFS='\t' '{print NR, $0}' "${@:--}" |
sort -r -k2 |
awk -v OFS='\t' '
    # Save the line number, then strip it (and its tab) from the record.
    { nr=$1; sub(/[^\t]+\t/,"") }
    (NF == 2) || (index(prev" ",$0" ") != 1) {
        print nr, $0
    }
    { prev = $0 }
' |
sort -nk1 |
cut -f2-
$ ./tst2.sh file3
a b
a b c d z
a b dog
a b c d e
a b c d x
a b e
a b d
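To see what the first two stages of that pipeline hand to awk, the sketch below stops after the decorate and sort steps. The function name decorate_and_sort, the three sample lines, the explicit tab separator passed to sort -t, and LC_ALL=C are all choices made for this illustration and are not in tst2.sh:

```shell
#!/usr/bin/env bash
# Decorate each input line with its line number, then reverse-sort on the
# original text only (field 2 onward); the numbers just travel along.
decorate_and_sort() {
    tab=$(printf '\t')
    awk -v OFS='\t' '{print NR, $0}' "$@" |
    LC_ALL=C sort -t "$tab" -r -k2
}

printf '%s\n' 'a b dog' 'a b c' 'a b c d' | decorate_and_sort
# 1<TAB>a b dog
# 3<TAB>a b c d
# 2<TAB>a b c
```

Because each surviving line still carries its original line number, the final sort -nk1 can restore the input order before cut -f2- removes the numbers.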