
I have a CSV file separated with `;`. I need to remove every line whose combined content of the 2nd and 3rd columns is not unique, and write the result to standard output.

Example input:

irrelevant;data1;data2;irrelevant;irrelevant  
irrelevant;data3;data4;irrelevant;irrelevant  
irrelevant;data5;data6;irrelevant;irrelevant  
irrelevant;data7;data8;irrelevant;irrelevant  
irrelevant;data1;data2;irrelevant;irrelevant  
irrelevant;data9;data0;irrelevant;irrelevant  
irrelevant;data1;data2;irrelevant;irrelevant  
irrelevant;data3;data4;irrelevant;irrelevant  

Desired output:

irrelevant;data5;data6;irrelevant;irrelevant  
irrelevant;data7;data8;irrelevant;irrelevant  
irrelevant;data9;data0;irrelevant;irrelevant  

I have found solutions that keep only the first occurrence of each duplicate:

sort -u -t ';' -k2,3 file  

but this is not enough: lines with a duplicated key should be removed entirely, not collapsed into one.

I have tried to use uniq -u, but I can't find a way to make it check only specific columns.

  • In all the lines there isn't a unique value in the 2nd and 3rd columns. Commented Aug 22, 2014 at 15:28
  • I agree with @jaypal, that question is about finding unique records only. Commented Aug 22, 2014 at 15:32
  • @AvinashRaj: OP wants to list those records where col2, col3 appear only once in whole file. Commented Aug 22, 2014 at 15:38
  • 1
    Yes, @anubhava is right. Storing the material in some temporary template seems to be the only way. It seems both awk and perl solutions are very similar. Commented Aug 23, 2014 at 0:19

3 Answers


Using awk:

awk -F';' '!seen[$2,$3]++{data[$2,$3]=$0}
      END{for (i in seen) if (seen[i]==1) print data[i]}' file
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant

Explanation: If the $2,$3 combination doesn't already exist in the seen array, the whole record is stored in the data array under the key $2,$3. Every time a $2,$3 combination is found, its counter in seen is incremented. At the end, only the entries whose counter equals 1 are printed.
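One caveat: `for (i in seen)` iterates in no guaranteed order, so the output may not follow the input order. If order matters, the same counting idea can be applied in two passes over the file (a sketch, writing the question's sample input to a temporary file for demonstration):

```shell
# Build the question's sample input for demonstration.
cat > input.csv <<'EOF'
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data3;data4;irrelevant;irrelevant
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data3;data4;irrelevant;irrelevant
EOF

# Pass 1 (NR==FNR) counts each column-2/column-3 key;
# pass 2 prints only the lines whose key occurred exactly once.
awk -F';' 'NR==FNR{count[$2,$3]++; next} count[$2,$3]==1' input.csv input.csv
```

Reading the file twice keeps the surviving lines in their original order, at the cost of a second pass.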




If order is important and you can use perl:

perl -F";" -lane '
    $key = @F[1,2]; 
    $uniq{$key}++ or push @rec, [$key, $_] 
}{ 
    print $_->[1] for grep { $uniq{$_->[0]} == 1 } @rec' file
irrelevant;data5;data6;irrelevant;irrelevant  
irrelevant;data7;data8;irrelevant;irrelevant  
irrelevant;data9;data0;irrelevant;irrelevant  

We use columns 2 and 3 to create a composite key (joined with `;` so that distinct pairs cannot collide). For the first occurrence of each key, we push a `[key, line]` pair onto the array @rec.

In the (implicit) END block, we check whether that key occurred exactly once in the file. If so, we print the line.
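An aside on composite keys (an illustrative sketch, not part of either answer): concatenating the two fields with no separator can merge distinct pairs, e.g. `ab`+`c` and `a`+`bc` both become `abc`. awk's `seen[$2,$3]` form avoids this via the built-in SUBSEP; joining with `;` in the perl version plays the same role:

```shell
# Demonstrate the collision: keys built by plain concatenation clash,
# while keys built with awk's comma form (SUBSEP) stay distinct.
awk 'BEGIN {
    bad["ab" "c"]++;   bad["a" "bc"]++     # both become bad["abc"]
    good["ab", "c"]++; good["a", "bc"]++   # SUBSEP keeps two keys
    nb = 0; for (k in bad)  nb++
    ng = 0; for (k in good) ng++
    print nb, ng                           # prints: 1 2
}'
```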


awk '!a[$0]++' file_input > file_output

This worked for me, but note that it compares whole lines and keeps the first occurrence of each duplicate, rather than removing every duplicated line.
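A quick check on the question's sample makes the difference visible: because this keeps the first copy of each duplicate, the `data1;data2` and `data3;data4` records the question wants removed still appear (a demonstration sketch, writing the sample to a temporary file):

```shell
# The question's sample input.
cat > input.csv <<'EOF'
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data3;data4;irrelevant;irrelevant
irrelevant;data5;data6;irrelevant;irrelevant
irrelevant;data7;data8;irrelevant;irrelevant
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data9;data0;irrelevant;irrelevant
irrelevant;data1;data2;irrelevant;irrelevant
irrelevant;data3;data4;irrelevant;irrelevant
EOF

# De-duplicates whole lines, keeping the FIRST occurrence of each,
# so data1;data2 and data3;data4 survive, unlike the desired output.
awk '!a[$0]++' input.csv
```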

