How to delete duplicate lines in a text file in Unix bash? [duplicate]

Question

I have a file, file.txt, with multiple lines, and I would like to remove the duplicate lines without sorting the file. What command can I use in Unix bash?

sample of file.txt

orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple
orangejuice;orange;juice_apple

sample of output:

orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple

I'd like to see this closed as duplicate, too, but I hope there is a better question to link to.
– tripleee, Commented Aug 11, 2013 at 10:00
Linux Bash commands to remove duplicates from a CSV file. Change the delimiter.
– jww, Commented Jul 13, 2018 at 9:39

Steve · Accepted Answer · 2013-08-11 12:27:28Z

38

One way using awk:

awk '!a[$0]++' file.txt
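
As a minimal sketch of why this works: awk evaluates `!a[$0]++` as a pattern for every input line. `a[$0]` holds how many times the current line has been seen so far (0 on first sight), `!` turns that 0 into true so the line is printed, and `++` bumps the count afterwards so later copies are skipped. The temporary file and variable names below are only for illustration; the data mirrors the sample from the question.

```shell
# Build a throwaway copy of the sample input (the mktemp path is illustrative only).
input=$(mktemp)
printf '%s\n' \
  'orangejuice;orange;juice_apple' \
  'pineapplejuice;pineapple;juice_pineapple' \
  'orangejuice;orange;juice_apple' > "$input"

# Print a line only the first time it is seen; input order is preserved.
deduped=$(awk '!a[$0]++' "$input")
printf '%s\n' "$deduped"

# The same logic spelled out step by step, for readability.
deduped_verbose=$(awk '{ if (seen[$0] == 0) print; seen[$0]++ }' "$input")

rm -f "$input"
```

Note that awk keeps every distinct line in memory, so this is best suited to files whose unique lines fit comfortably in RAM.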

5 Comments

Master James Over a year ago

Can't this be written to a file via an alias sourced from .bashrc? With > output.txt, the file ends up with only one line.

Master James Over a year ago

root@server:/tmp# alias RDL="awk '!a[\$0]++' cleanList.txt > cleanList2.txt"
bash: !a[\$0]++': event not found
root@server:/tmp# alias RDL="awk '\!a[$0]++' cleanList.txt > cleanList2.txt"
root@mdserver:/tmp# RDL
awk: cmd. line:1: \!a[bash]++
awk: cmd. line:1: ^ backslash not last character on line
root@server:/tmp# alias RDL="awk '\\!a[$0]++' cleanList.txt > cleanList2.txt"
???

Master James Over a year ago

Found this: cat -n file_name | sort -uk2 | sort -nk1 | cut -f2- (at stackoverflow.com/questions/11532157/…)
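
For reference, a sketch of how that pipeline fits together on the question's sample data: `cat -n` prefixes each line with its line number and a tab, `sort -uk2` keeps one line per distinct text, `sort -nk1` puts the survivors back in line-number order, and `cut -f2-` strips the numbers again. The `LC_ALL=C` prefix and the temporary file are my additions (assumptions, not part of the quoted command), and which copy of a duplicate survives `sort -u` may depend on the sort implementation.

```shell
# Throwaway copy of the question's sample input (path is illustrative only).
input=$(mktemp)
printf '%s\n' \
  'orangejuice;orange;juice_apple' \
  'pineapplejuice;pineapple;juice_pineapple' \
  'orangejuice;orange;juice_apple' > "$input"

# number lines -> unique by text -> restore order by number -> drop the numbers
deduped=$(cat -n "$input" | LC_ALL=C sort -uk2 | LC_ALL=C sort -nk1 | cut -f2-)
printf '%s\n' "$deduped"

rm -f "$input"
```

This approach still sorts internally, so it trades awk's in-memory table for two sort passes.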

Master James Over a year ago

Better yet, the uniq command even works in an alias: man7.org/linux/man-pages/man1/uniq.1.html

Steve Over a year ago

@MasterJames: You'll need to single quote that expression, then escape the single quotes like: alias RDL='awk '\''!a[$0]++'\'' cleanList.txt > cleanList2.txt'. See: stackoverflow.com/a/9899594/751863. Alternatively, just use a function.
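
A sketch of the function approach suggested in that comment. The name `rdl` and its two parameters are hypothetical; the file names from the thread become arguments. Because the awk program sits in plain single quotes inside the function body, no extra escaping is needed.

```shell
# rdl INPUT OUTPUT: copy INPUT to OUTPUT, dropping later duplicate lines.
# Hypothetical helper; the name and argument order are illustrative.
# INPUT and OUTPUT must be different files, because the redirection
# truncates OUTPUT before awk starts reading.
rdl() {
  awk '!a[$0]++' "$1" > "$2"
}

# Example use with throwaway files.
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' 'a' 'b' 'a' 'c' 'b' > "$src"
rdl "$src" "$dst"
result=$(cat "$dst")
printf '%s\n' "$result"
rm -f "$src" "$dst"
```

Placed in ~/.bashrc, such a function would be called as `rdl cleanList.txt cleanList2.txt`.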

choroba · Accepted Answer · 2013-08-11 10:24:11Z

14

You can use Perl for this:

perl -ne 'print unless $seen{$_}++' file.txt

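The -n switch makes Perl process the input line by line. Each line ($_) is used as a key in a hash named %seen. Because the post-increment ++ returns the old value before incrementing it, $seen{$_} is false the first time a line is met, so that line is printed; on every later occurrence it is true and the line is skipped.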

edited Aug 11, 2013 at 10:24

answered Aug 11, 2013 at 9:48

5 Comments

Master James Over a year ago

Used in an alias with the output redirected to a file (> output.txt), this creates an empty file?
alias RDL="perl -ne 'print unless $seen{$_}++' cleanList.txt > cleanList2.txt"
root@server:/tmp# RDL
Can't modify anonymous hash ({}) in postincrement (++) at -e line 1, near "}++"
Execution of -e aborted due to compilation errors.
root@server:/tmp#

Master James Over a year ago

Found this: cat -n file_name | sort -uk2 | sort -nk1 | cut -f2- (at stackoverflow.com/questions/11532157/…)

Master James Over a year ago

The uniq command even works in an alias: man7.org/linux/man-pages/man1/uniq.1.html

choroba Over a year ago

@MasterJames: The OP wanted to process the file "without sorting", which uniq can't do.

Master James Over a year ago

I see now that uniq only removes repeated lines, not duplicates, from its input. It only skips a line when it is exactly the same as the previous line (i.e. repeats, not dups). This is fine for my situation, so I didn't notice that sort is what turns dups into repeats, which uniq then skips. Without sort, dups that are not adjacent are not removed. Thanks for the clarification.
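
The difference described in this comment can be sketched on a small input whose duplicates are not adjacent (the sample lines and variable names are made up for illustration): `uniq` alone leaves non-adjacent duplicates in place, `sort | uniq` removes them but reorders the input, and the awk filter from the other answer removes them while keeping the original order.

```shell
# Five lines; 'b' and 'a' each appear more than once, not always adjacently.
lines=$(printf '%s\n' 'b' 'a' 'b' 'b' 'a')

# uniq only collapses adjacent repeats, so b, a, b, a remain.
uniq_only=$(printf '%s\n' "$lines" | uniq)

# Sorting first makes every duplicate adjacent, at the cost of reordering: a, b.
sorted_uniq=$(printf '%s\n' "$lines" | LC_ALL=C sort | uniq)

# awk remembers every line seen so far, so first-seen order is kept: b, a.
order_kept=$(printf '%s\n' "$lines" | awk '!a[$0]++')

printf 'uniq only:\n%s\n' "$uniq_only"
printf 'sort | uniq:\n%s\n' "$sorted_uniq"
printf 'awk:\n%s\n' "$order_kept"
```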
