
I have a file, file.txt, with multiple lines. I would like to remove duplicate lines without sorting the file. What command can I use in Unix bash?

Sample of file.txt:

orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple
orangejuice;orange;juice_apple

Sample of output:

orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple

2 Answers

38

One way using awk:

awk '!a[$0]++' file.txt
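To see why this works: a[$0] is an awk array entry keyed by the whole line. It is empty (false) the first time a line appears, so !a[$0]++ is true and awk applies its default action of printing the line, while the post-increment marks the line as seen. A minimal sketch on the sample from the question, with the one-liner next to a spelled-out equivalent (the temporary file and the array name seen are just for illustration):

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'orangejuice;orange;juice_apple' \
  'pineapplejuice;pineapple;juice_pineapple' \
  'orangejuice;orange;juice_apple' > "$tmp"

# The one-liner from the answer.
one_liner=$(awk '!a[$0]++' "$tmp")

# The same logic written out: print a line only if it has not been counted yet,
# then count it.
expanded=$(awk '{ if (!seen[$0]) print; seen[$0]++ }' "$tmp")

printf '%s\n' "$one_liner"
rm -f "$tmp"
```

Both forms keep the first occurrence of each line and preserve the original order. Note that the array holds every distinct line in memory.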

5 Comments

You can't write this to a file via an alias sourced from .bashrc? With > output.txt, the output file has only one line.
root@server:/tmp# alias RDL="awk '!a[\$0]++' cleanList.txt > cleanList2.txt"
bash: !a[\$0]++': event not found
root@server:/tmp# alias RDL="awk '\!a[$0]++' cleanList.txt > cleanList2.txt"
root@mdserver:/tmp# RDL
awk: cmd. line:1: \!a[bash]++
awk: cmd. line:1: ^ backslash not last character on line
root@server:/tmp# alias RDL="awk '\\!a[$0]++' cleanList.txt > cleanList2.txt"
???
Found this cat -n file_name | sort -uk2 | sort -nk1 | cut -f2- at stackoverflow.com/questions/11532157/…
Better yet, the uniq command even works in an alias: man7.org/linux/man-pages/man1/uniq.1.html
@MasterJames: You'll need to single quote that expression, then escape the single quotes like: alias RDL='awk '\''!a[$0]++'\'' cleanList.txt > cleanList2.txt'. See: stackoverflow.com/a/9899594/751863. Alternatively, just use a function.
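A minimal sketch of the function alternative mentioned above. The name RDL is borrowed from the comment thread; taking the input and output paths as arguments is an assumption made for illustration. Because the awk program sits in plain single quotes inside the function body, bash neither expands $0 nor attempts history expansion on the !, so no extra escaping is needed:

```shell
# Remove duplicate lines from the file "$1", keeping the first occurrence of
# each line in its original position, and write the result to "$2".
# This can be placed in ~/.bashrc in the same way an alias would be.
RDL() {
  awk '!a[$0]++' "$1" > "$2"
}

# Usage with the file names from the comments:
#   RDL cleanList.txt cleanList2.txt
```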
14

You can use Perl for this:

perl -ne 'print unless $seen{$_}++' file.txt

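The -n switch makes Perl process the file line by line. Each line ($_) is used as a key in a hash named %seen. Because ++ increments the stored count only after returning its previous value, the test is false the first time a line is met, so the line is printed; on every later occurrence the test is true and the line is skipped.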

5 Comments

When this is used in an alias with the output redirected to a file (> output.txt), it creates an empty file?
alias RDL="perl -ne 'print unless $seen{$_}++' cleanList.txt > cleanList2.txt"
root@server:/tmp# RDL
Can't modify anonymous hash ({}) in postincrement (++) at -e line 1, near "}++"
Execution of -e aborted due to compilation errors.
root@server:/tmp#
Found this cat -n file_name | sort -uk2 | sort -nk1 | cut -f2- at stackoverflow.com/questions/11532157/…
The uniq command even works in an alias: man7.org/linux/man-pages/man1/uniq.1.html
@MasterJames: The OP wanted to process the file "without sorting", which uniq can't do.
I see now that uniq only removes repeated lines, not duplicates, from the input. It only skips a line when it is really the same as the adjacent line (aka repeats, not dups). This is fine for my situation, so I didn't notice that sort turns dups into repeats, which uniq then skips. Without sort, dups that are non-repeating are not removed. Thanks for the clarity.
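The difference described in this comment can be checked with a short sketch. The three-line inputs are made up for illustration:

```shell
# Non-adjacent duplicates: uniq leaves the second "a" in place.
non_adjacent=$(printf '%s\n' a b a | uniq)

# Adjacent duplicates (the situation sort creates): uniq collapses them.
adjacent=$(printf '%s\n' a a b | uniq)

# The awk approach drops duplicates wherever they occur, keeping input order.
awk_result=$(printf '%s\n' a b a | awk '!a[$0]++')

printf 'uniq, non-adjacent input:\n%s\n' "$non_adjacent"
printf 'uniq, adjacent input:\n%s\n' "$adjacent"
printf 'awk, non-adjacent input:\n%s\n' "$awk_result"
```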
