I don't want to sort my file, just filter out duplicate lines while maintaining the original ordering. Is there a way to use sort's unique function without its sort function (something like what cat -u would give if it existed)? Just using uniq without sort does nothing worthwhile, because uniq only looks at adjacent lines, so a file has to be sorted first.
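To make the adjacency problem concrete, here is a minimal sketch on a made-up four-line input (not the pastebin file below; the variable names are just for illustration). What I'm after would be c, a, b, but neither command gives that:

```shell
# Made-up input: "c" appears twice, but the two copies are not adjacent.
input='c
a
c
b'

# uniq alone only collapses adjacent repeats, so both copies of "c" survive.
plain=$(printf '%s\n' "$input" | uniq)
printf '%s\n' "$plain"

# Sorting first makes the repeats adjacent, but the original order is lost.
sorted=$(printf '%s\n' "$input" | sort | uniq)
printf '%s\n' "$sorted"
```

The first command prints the input unchanged, and the second prints the deduplicated lines in sorted order rather than in the order they first appeared.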
Also, incidentally, what in hell is the difference between uniq and uniq --unique? Here are commands on a random file from pastebin:
wget -qO - http://pastebin.com/0cSPs9LR | wc -l
350
wget -qO - http://pastebin.com/0cSPs9LR | sort -u | wc -l
287
wget -qO - http://pastebin.com/0cSPs9LR | sort | uniq | wc -l
287
wget -qO - http://pastebin.com/0cSPs9LR | sort | uniq -u | wc -l
258
In summary:
- How do I filter duplicates greedily without sorting?
- How is uniq not unique enough that there is also uniq --unique?
p.s. This question looks like a duplicate of the following q's, but it isn't:
sort or uniq at all. And "How is uniq not unique enough that there is also uniq --unique?" really should be a separate question.