I have a CSV file with the following records:

DATE,TAG,ID,METRIC_1,METRIC_2,METRIC_3,METRIC_4,METRIC_5,METRIC_6,METRIC_7,METRIC_8,METRIC_9,METRIC_A,METRIC_B,METRIC_C,METRIC_D,METRIC_E,METRIC_F,METRIC_G
2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI3,37683,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI4,37684,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXI7,37687,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI8,37688,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI9,37689,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXJ0,37690,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00

The goal is to get only the rows that have at least one metric value greater than zero, using an AWK command:

2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00

What I tried:

awk -v FS=, 'NR!=1 {for(i=4; i<NF; i++) if($i>0)print$0;next}' file.csv

The output:

2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00

I know it is failing because it iterates through each column, checks the condition, and prints the line once for every column that meets it, hence the duplicate records.

How can this be corrected so that a line matching the condition is printed once and processing then skips to the next line?


EDIT: here is the above code formatted legibly by gawk -o-:

NR != 1 {
        for (i = 4; i < NF; i++) {
                if ($i > 0) {
                        print $0
                }
        }
        next
}
4 Answers

4

Firstly observe that

NR!=1 {for(i=4; i<NF; i++) if($i>0)print$0;next}

means that next is outside the for loop body, so it runs only after the loop has finished, and since that is your only pattern-action pair, it effectively acts as a no-op. Add {...} to tell GNU AWK what you actually want, that is, replace the part above with

NR!=1 {for(i=4; i<NF; i++){if($i>0){print$0;next}}}

then for

DATE,TAG,ID,METRIC_1,METRIC_2,METRIC_3,METRIC_4,METRIC_5,METRIC_6,METRIC_7,METRIC_8,METRIC_9,METRIC_A,METRIC_B,METRIC_C,METRIC_D,METRIC_E,METRIC_F,METRIC_G
2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI3,37683,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI4,37684,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXI7,37687,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI8,37688,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI9,37689,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXJ0,37690,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00

you will get the output

2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00

Also be warned that your code ignores the last field. If that is intended, leave it as is; if it is a bug, use i<=NF as the loop condition.

(tested in gawk 4.2.1)
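For reference, the two fixes above (next inside the if, and i<=NF so the last field is checked) can be combined into one multi-line sketch. The function name positive_rows and the shortened sample rows are only illustrative assumptions, not part of the original question.

```shell
# Hypothetical wrapper name; it just runs the corrected awk program.
positive_rows() {
    awk -F, '
        NR > 1 {                          # skip the header line
            for (i = 4; i <= NF; i++) {   # metrics start at field 4; <= also checks the last field
                if ($i > 0) {
                    print                 # print the record once
                    next                  # stop scanning, go to the next record
                }
            }
        }
    ' "$@"
}

# Shortened, made-up sample with only two metric columns.
printf '%s\n' \
    'DATE,TAG,ID,METRIC_1,METRIC_2' \
    '2000-01-29,3PXI1,37681,1.00,0.00' \
    '2000-01-29,3PXI3,37683,0.00,0.00' \
    '2000-01-29,3PXI5,37685,22.37,23.91' \
    '2000-01-29,3PXI9,37689,0.00,5.00' | positive_rows
```

On this sample the 3PXI3 row is dropped, 3PXI5 is printed only once even though two of its fields are positive, and 3PXI9 is kept because its last field is checked as well.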

2
$ awk -F, 'NR>1{for(i=4;i<=NF;i++) if($i>0) {print; next}}' file.csv

1 Comment

Posted this as a canonical solution. Please see @Daweo's answer for the explanation.
1

Compared to checking fields one at a time, it's less hassle to simply save $0, scan the input line with a regex, and restore it only when positive values have been found:

{m,g}awk 'BEGIN { _^= FS = OFS = "," (__="") } substr(__, 

(___=$(_=__)) * ($++_=$++_=$++_=__), gsub(",(-[^,]+|[+-]?0([.]0*)?)",
               FS))^!_ == NR || /^[,]*$/ ? NF = __ : ($!NF = ___)^__'
2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
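The one-liner above is dense, so here is a plainer sketch of the same general idea. It does not try to reproduce that code exactly: it works on a copy of the record, drops the three leading fields, blanks out every zero or negative field, and prints the original record if anything is left. The name regex_positive_rows and the variable rest are assumptions made for illustration.

```shell
# Hypothetical name; a readable take on the "scan the line with a regex" idea.
regex_positive_rows() {
    awk -F, '
        NR > 1 {
            rest = $0                                    # work on a copy, $0 stays intact
            sub(/^[^,]*,[^,]*,[^,]*/, "", rest)          # drop DATE, TAG and ID
            gsub(/,(-[^,]+|[+-]?0+(\.0*)?)/, ",", rest)  # blank out negative and zero fields
            if (rest !~ /^,*$/) print                    # leftovers mean a positive value
        }
    ' "$@"
}

# Made-up rows: only the second and third data rows contain a positive metric.
printf '%s\n' \
    'DATE,TAG,ID,METRIC_1,METRIC_2' \
    '2000-01-29,3PXI3,37683,0.00,0' \
    '2000-01-29,3PXI5,37685,0.00,22.37' \
    '2000-01-29,3PXI6,37686,0,0.50' | regex_positive_rows
```

A value such as 0.50 is only partly consumed by the zero pattern, which leaves 50 behind, so the row still counts as positive.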

1

You already have the answer to what was wrong with your script but consider this alternative to looping through all of your fields:

$ awk '/^([^,]+,){3}.*[^0.,]/' file
DATE,TAG,ID,METRIC_1,METRIC_2,METRIC_3,METRIC_4,METRIC_5,METRIC_6,METRIC_7,METRIC_8,METRIC_9,METRIC_A,METRIC_B,METRIC_C,METRIC_D,METRIC_E,METRIC_F,METRIC_G
2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00

Just add NR>1 && to the start of the condition if you really don't want to print the header line:

$ awk 'NR>1 && /^([^,]+,){3}.*[^0.,]/' file
2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
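To spell out what that regex does: ^([^,]+,){3} consumes the first three fields, and .*[^0.,] then requires at least one character other than 0, . or , in the rest of the line. The sketch below writes the repetition out by hand, which may help with awk implementations that do not handle the {3} interval (an assumption about your environment). The name metric_rows is hypothetical. Keep in mind that this is a textual test, so unlike $i>0 it would also match a negative value such as -1.00.

```shell
# Hypothetical name; same test as the regex above, with {3} written out.
metric_rows() {
    awk 'NR > 1 && /^[^,]+,[^,]+,[^,]+,.*[^0.,]/' "$@"
}

# Made-up rows: only the second data row has a non-zero metric.
printf '%s\n' \
    'DATE,TAG,ID,METRIC_1,METRIC_2' \
    '2000-01-29,3PXI3,37683,0.00,0' \
    '2000-01-29,3PXI5,37685,0.00,22.37' | metric_rows
```

The non-zero digits in the ID field do not trigger a match, because the three leading fields are consumed before the [^0.,] check starts.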
