I have a CSV file with the following records:
DATE,TAG,ID,METRIC_1,METRIC_2,METRIC_3,METRIC_4,METRIC_5,METRIC_6,METRIC_7,METRIC_8,METRIC_9,METRIC_A,METRIC_B,METRIC_C,METRIC_D,METRIC_E,METRIC_F,METRIC_G
2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI3,37683,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI4,37684,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXI7,37687,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI8,37688,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI9,37689,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXJ0,37690,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
The goal is to use awk to get only the rows that have at least one metric value greater than zero:
2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
What I tried:
awk -v FS=, 'NR!=1 {for(i=4; i<NF; i++) if($i>0)print$0;next}' file.csv
The output:
2000-01-29,3PXI1,37681,1.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI2,37682,20.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,0,0.00,0.00,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,22.37,23.91,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXI6,37686,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,30.00,40.14,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
2000-01-29,3PXJ1,37691,0.00,0.00,0.00,0.00,0.00,0,0.00,0.00,0.00,1,25.00,51.13,0.00,0.00,0.00,0.00
I know it fails because the loop checks every column and prints the line once for each column that meets the condition, hence the duplicate records.
How can this be corrected to print a matching line only once and then skip to the next line?
EDIT: here is the above code formatted legibly by gawk -o-:
NR != 1 {
    for (i = 4; i < NF; i++) {
        if ($i > 0) {
            print $0
        }
    }
    next
}
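One possible way to address the duplicate printing described above, as a minimal sketch rather than a definitive answer: group `print` and `next` in the same braces so that awk leaves the loop at the first positive field. The file name `sample.csv` and its three cut-down rows are hypothetical, used only to keep the example self-contained. The loop bound is `i <= NF` instead of `i < NF`, on the assumption that the last column should be checked as well.

```shell
# Hypothetical sample: the header plus three rows shortened from the data above
cat > sample.csv <<'EOF'
DATE,TAG,ID,METRIC_1,METRIC_2,METRIC_3
2000-01-29,3PXI1,37681,1.00,0.00,0.00
2000-01-29,3PXI3,37683,0.00,0.00,0.00
2000-01-29,3PXI5,37685,0.00,22.37,23.91
EOF

# Skip the header (NR > 1), scan fields 4..NF, and at the first positive
# value print the line once and move on to the next record.
awk -F, 'NR > 1 { for (i = 4; i <= NF; i++) if ($i > 0) { print; next } }' sample.csv
```

With this sample, the 3PXI1 and 3PXI5 rows are each printed once, even though 3PXI5 has two positive fields, and the all-zero 3PXI3 row is dropped.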