2

I have a file on Unix that looks like this:

">hello"
"hello"
"newuser"
"<newuser"
"newone"

Now I want to keep only the unique entries in the file (ignoring the < or > only when comparing), so that the output is:

">hello"
"<newuser"
"newone"
3
  • Have you typed man uniq at your terminal? Commented Jun 11, 2013 at 6:47
  • 1
    uniq will do much, but not all of this. You could ignore the > and < by removing them with sed and piping through uniq, but then the > < won't appear in the output. Commented Jun 11, 2013 at 7:04
  • 1
    You can also use an associative array in a language like perl or python to keep a cache of the strings seen so far. This cache can be used to decide when new lines are unique. Commented Jun 11, 2013 at 7:11

3 Answers

3
#!/usr/bin/env python

import sys
seen = set()
for line in sys.stdin:
    word = line.strip().replace('>', '').replace('<', '')
    if word not in seen:
        seen.add(word)
        sys.stdout.write(line)

$ ./uniq.py < file1
">hello"
"newuser"
"newone"
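
Note that the script above keeps the first line seen for each word, so it prints "newuser" rather than the "<newuser" shown in the question's expected output. If the intent is to prefer the variant that carries a < or > (an assumption; the question does not state the rule explicitly), a minimal sketch could look like the following. The function name `unique_prefer_marked` is hypothetical, chosen only for illustration.

```python
import sys

MARKERS = "<>"


def has_marker(line):
    """True if the line contains a < or > character."""
    return any(m in line for m in MARKERS)


def unique_prefer_marked(lines):
    """Keep one line per word, preferring a variant that contains < or >.

    Assumption: "prefer the marked variant" is the desired rule.
    """
    chosen = {}  # word -> line to print; dicts keep insertion order (Python 3.7+)
    for line in lines:
        word = line.strip().replace(">", "").replace("<", "")
        if word not in chosen:
            chosen[word] = line
        elif has_marker(line) and not has_marker(chosen[word]):
            # Replace an unmarked line with the marked one, keeping its position.
            chosen[word] = line
    return list(chosen.values())


sample = ['">hello"\n', '"hello"\n', '"newuser"\n', '"<newuser"\n', '"newone"\n']
sys.stdout.writelines(unique_prefer_marked(sample))
```

To run it on a file instead of the sample list, passing `sys.stdin` to the function and redirecting the file into the script should work, since the function accepts any iterable of lines. Unlike the streaming version, this one holds all chosen lines in memory until the input ends.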

2
$ awk '{ w = $1; sub(/[<>]/, "", w) } word[w] == 0 { word[w]++; print $1 }' file1
">hello"
"newuser"
"newone"

0

Here's that associative array idea in Ruby.

2.0.0p195 :005 > entries= [">hello", "hello", "newuser", "<newuser", "newone"]
 => [">hello", "hello", "newuser", "<newuser", "newone"] 
2.0.0p195 :006 > entries.reduce({}) { |hash, entry| hash[entry.sub(/[<>]/,'')]=entry; hash}.values
 => ["hello", "<newuser", "newone"]
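
Because each assignment overwrites the previous value for the same key, this keeps the last variant seen, which is why the result contains "hello" rather than ">hello". A rough Python equivalent of the same last-one-wins idea is sketched below; the function name `last_variant_per_key` is hypothetical, and `replace` strips every < and >, whereas the Ruby `sub` removes only the first match.

```python
def last_variant_per_key(entries):
    """Map each entry (with < and > stripped) to the last variant seen."""
    last_seen = {}  # insertion-ordered: keys stay in first-seen order
    for entry in entries:
        key = entry.replace("<", "").replace(">", "")
        last_seen[key] = entry  # later entries overwrite earlier ones
    return list(last_seen.values())


entries = [">hello", "hello", "newuser", "<newuser", "newone"]
print(last_variant_per_key(entries))
```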
