I've got perfmon outputting to a csv and I need to delete any repeated columns, e.g.

COL1, Col2, Col3, COL1, Col4, Col5

When columns repeat it's almost always the same column but it doesn't happen every time. What I've got so far are a couple of manual steps:

When the column count is greater than it should be, I output all of the column headers, one per line:

head -n1 < output.csv|sed 's/,/\n/g'
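The same idea with the headers numbered, so the guilty column indices can be read straight off instead of counted by hand (a minimal sketch; the sample header line from the question is written to output.csv here for illustration):

```shell
# Create the sample header from the question (illustration only)
printf '%s\n' 'COL1,Col2,Col3,COL1,Col4,Col5' > output.csv

# Split the header onto one line per column and number each line,
# so duplicates like COL1 show up with their field numbers (1 and 4)
head -n1 output.csv | tr ',' '\n' | cat -n
```

Here the duplicate `COL1` appears at positions 1 and 4, which are exactly the numbers `cut -f` expects.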

Then, when I know which column numbers are guilty, I delete manually, e.g.:

cut -d"," --complement -f5,11 < output.csv > output2.csv
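The two manual steps can be glued together: derive the duplicate field numbers from the header itself with awk, then hand them to `cut --complement` so the `-f` list no longer has to be typed by hand. This is a sketch, not the accepted answer's method; it assumes GNU `cut` (for `--complement`) and no commas inside the quoted header names. The sample data from the question is written to output.csv for illustration:

```shell
# Sample data from the question (illustration only)
printf '%s\n' '"COLUMN1","Column2","Column3","COLUMN1","Column4"' \
              '"1","1","1","1","1"' > output.csv

# Build a comma-separated list of field numbers whose header text
# was already seen earlier on the same line (here: "4")
dupes=$(head -n1 output.csv |
  awk -F, '{for(i=1;i<=NF;i++) if(seen[$i]++) d = d (d ? "," : "") i; print d}')

if [ -n "$dupes" ]; then
  # Drop the duplicated columns in one pass (GNU cut only)
  cut -d"," --complement -f"$dupes" < output.csv > output2.csv
else
  cp output.csv output2.csv   # nothing repeated, keep the file as-is
fi
```

On the sample above this produces output2.csv with the fourth column removed, leaving `"COLUMN1","Column2","Column3","Column4"` as the header.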

If somebody can point me in the right direction I'd be grateful!

Updated to give a rough example of output.csv contents; it should be familiar to anyone who's used perfmon:

"COLUMN1","Column2","Column3","COLUMN1","Column4"    
"1","1","1","1","1"  
"a","b","c","a","d"  
"x","dd","ffd","x","ef"  

I need to delete the repeated COLUMN1 (the 4th column).

Just to be clear, I'm trying to think of a way of automatically going into output.csv and deleting repeated columns without having to tell it which columns to delete a la my manual method above. Thanks!

  • The input is just a standard perfmon csv log file; it's just that one of the columns gets repeated for some strange reason and I need to delete the duplicates but leave the original. I've updated with a rough example of the output... Commented Apr 6, 2013 at 19:06
  • What should happen to "1","1","1","1","1"? Leave just one value? Should the commas be kept or not? Your problem is quite underspecified. Commented Apr 6, 2013 at 20:04
  • Sorry, I think you might be misreading it, I'm looking to delete duplicate columns in a csv file. Commented Apr 6, 2013 at 20:24

1 Answer

Try this awk script (not really a one-liner). It handles more than one duplicated column, and it only checks the titles (the first row) to decide which columns are duplicates — which matches your example.

awk script (one-liner version):

awk -F, 'NR==1{for(i=1;i<=NF;i++)if(!($i in v)){v[$i];t[i]}}{s=""; for(i=1;i<=NF;i++)if(i in t)s=s sprintf("%s,",$i);if(s){sub(/,$/,"",s);print s}}' file

clear version (same script):

awk -F, 'NR==1{
        for(i=1;i<=NF;i++)
                if(!($i in v)){v[$i];t[i]}
}
{
        s=""
        for(i=1;i<=NF;i++)
                if(i in t)
                        s=s sprintf("%s,",$i)
        if(s){
                sub(/,$/,"",s)
                print s
        }
}' file

Example (note that I created two duplicated columns):

kent$  cat file
COL1,COL2,COL3,COL1,COL4,COL2
1,2,3,1,4,2
a1,a2,a3,a1,a4,a2
b1,b2,b3,b1,b4,b2
d1,d2,d3,d1,d4,d2


kent$  awk -F, 'NR==1{
        for(i=1;i<=NF;i++)
                if(!($i in v)){v[$i];t[i]}
}
{
        s=""
        for(i=1;i<=NF;i++)
                if(i in t)
                        s=s sprintf("%s,",$i)
        if(s){
                sub(/,$/,"",s)
                print s
        }
}' file
COL1,COL2,COL3,COL4
1,2,3,4
a1,a2,a3,a4
b1,b2,b3,b4
d1,d2,d3,d4
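The quotes that perfmon writes around every field do not break this script: duplicate headers are byte-identical including their quotes, so the first-row comparison still matches. A quick check against the quoted sample data from the question:

```shell
# Sample data taken verbatim from the question
cat > output.csv <<'EOF'
"COLUMN1","Column2","Column3","COLUMN1","Column4"
"1","1","1","1","1"
"a","b","c","a","d"
"x","dd","ffd","x","ef"
EOF

# Same one-liner as above, run against the quoted perfmon sample
awk -F, 'NR==1{for(i=1;i<=NF;i++)if(!($i in v)){v[$i];t[i]}}{s=""; for(i=1;i<=NF;i++)if(i in t)s=s sprintf("%s,",$i);if(s){sub(/,$/,"",s);print s}}' output.csv
```

This prints the file with the repeated `"COLUMN1"` (the 4th field) removed from every row, starting with the header `"COLUMN1","Column2","Column3","Column4"`.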
