I can delete duplicate lines in files using the `sort -u` and `uniq` commands. Is the same possible using sed or awk?
Skriptotajs – Feb 27, 2014
If you have sort and uniq, why do you want to use sed or awk?
Rubens – Feb 27, 2014
Well, it is possible, since both are Turing-complete languages, as far as I recall. The question is what you'd use them for, as pointed out by @Skriptotajs.
tripleee – Oct 24, 2018
Possible duplicate of "How can I delete duplicate lines in a file in Unix?"
3 Answers
There's a "famous" awk idiom:
awk '!seen[$0]++' file
It has to keep the unique lines in memory, but it preserves the file order.
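A quick illustration of the idiom (the file name `fruits.txt` is hypothetical): `seen[$0]` is 0 the first time a line appears, so `!seen[$0]++` is true and awk prints the line; the post-increment then makes the expression false for every repeat.

```shell
# Hypothetical input with out-of-order duplicates
printf 'apple\nbanana\napple\ncherry\nbanana\n' > fruits.txt

# Print each line only the first time it is seen, preserving order
awk '!seen[$0]++' fruits.txt
# apple
# banana
# cherry
```

Note that, unlike the sort-based approaches below, this needs no sorted input.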
3 Comments
mherzl
This looks awesome but somehow it's not working for me on macOS Sierra.
Alex Muravyov
This only works for files that fit in memory; if the file is bigger than RAM plus swap, it fails.
glenn jackman
For some definition of "small": it can be measured in gigabytes.
sort and uniq are enough on their own to remove duplicates: cat filename | sort | uniq >> filename2

If the file consists of numbers, use sort -n.
2 Comments
tripleee
Though the cat is useless.
dave58
The uniq is also useless. Just use sort -u filename; the -u flag invokes sort's unique mode. [None of which answered the OP's question...]

After sorting, we can use this sed command:
sed -E '$!N; /^(.*)\n\1$/!P; D' filename
If the file is unsorted, you can combine it with sort:
sort filename | sed -E '$!N; /^(.*)\n\1$/!P; D'
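A short demonstration of the combined pipeline (the sample file is hypothetical; tested with GNU sed — BSD sed's handling of `\n` in regexes may differ, and older GNU sed spells `-E` as `-r`). The sed program reads lines in pairs: `N` appends the next line, the regex checks whether the two halves are identical, `P` prints the first line only when they differ, and `D` deletes it and restarts.

```shell
# Hypothetical unsorted input with a duplicate line
printf 'cherry\napple\nbanana\napple\n' > fruits.txt

# Sorting makes duplicates adjacent, then sed collapses them
sort fruits.txt | sed -E '$!N; /^(.*)\n\1$/!P; D'
# apple
# banana
# cherry
```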
1 Comment
tripleee
Obviously these alternatives are unacceptable if you can't sort the file.