I have a data set that looks something like this:

Input 

Cat   2 1 aa
Dog   1 0 aa 
Dog   1 2 aa
Cat   2 7 aa
Mouse 0 0 aa
Cat   1 5 
Dog   4 3
.     . .
.     . .
.     . .
Cat   1 5 
Dog   4 3
Cat   6 9 bb
Dog   3 1 bb 
Dog   3 6 bb
Cat   6 4 bb
Mouse 0 0 bb

With this dataset I want to do the following:

  • If column 4 is blank, print the line.
  • If column 4 is not blank, print only the first record for each combination of column 1 and column 4.

    Output
    Cat 2 1 aa
    Dog 1 0 aa
    Mouse 0 0 aa
    Cat 1 5
    Dog 4 3
    . . .
    . . .
    . . .
    Cat 1 5
    Dog 4 3
    Cat 6 9 bb
    Dog 3 1 bb
    Mouse 0 0 bb

Note that "Cat 2 1 aa" is the first record with column 1 = Cat and column 4 = aa, so it is printed. "Cat 2 7 aa" is not printed, because a record with column 1 = Cat and column 4 = aa has already been seen.

  • Try a combination of sort + uniq + awk. Commented Nov 19, 2015 at 5:35

1 Answer

Using awk:

awk '$4 == "" || !a[$1,$4]++' input

Results:

Cat   2 1 aa
Dog   1 0 aa 
Mouse 0 0 aa
Cat   1 5 
Dog   4 3
.     . .
.     . .
.     . .
Cat   1 5 
Dog   4 3
Cat   6 9 bb
Dog   3 1 bb 
Mouse 0 0 bb
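
How the one-liner works: `$4 == ""` is true for lines with no fourth field, so those always print. Otherwise `a[$1,$4]++` counts how often each (column 1, column 4) pair has been seen; the post-increment returns 0 the first time, so `!` makes the pattern true only on the first occurrence. The same logic can be spelled out as below. This is only a sketch: the file name `sample.txt`, the function name `dedup_by_cols`, and the shortened data are assumptions made for illustration, not part of the question.

```shell
# Shortened, hypothetical stand-in for the input in the question.
cat > sample.txt <<'EOF'
Cat   2 1 aa
Dog   1 0 aa
Dog   1 2 aa
Cat   2 7 aa
Mouse 0 0 aa
Cat   1 5
Dog   4 3
Cat   1 5
Cat   6 9 bb
Cat   6 4 bb
EOF

# dedup_by_cols is an illustrative wrapper name, not a standard tool.
dedup_by_cols() {
    awk '
        # Column 4 is blank: always print the line.
        $4 == "" { print; next }
        # First time this (column 1, column 4) pair appears: remember it and print.
        !(($1, $4) in seen) { seen[$1, $4] = 1; print }
    ' "$@"
}

dedup_by_cols sample.txt
```

The explicit `seen` array trades brevity for readability; behaviour should match the one-liner for whitespace-separated input.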

1 Comment

You made that very easy! Nice one! :)