I've got perfmon outputting to a csv and I need to delete any repeated columns, e.g.

COL1, Col2, Col3, COL1, Col4, Col5

When columns repeat it's almost always the same column but it doesn't happen every time. What I've got so far are a couple of manual steps:

When the column count is greater than it should be, I output all of the column headers, one per line:

head -n1 < output.csv|sed 's/,/\n/g'
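The same idea with the headers numbered, so the guilty column indices can be read straight off instead of counted by hand (a minimal sketch; the sample header line from the question is written to output.csv here for illustration):

```shell
# Create the sample header from the question (illustration only)
printf '%s\n' 'COL1,Col2,Col3,COL1,Col4,Col5' > output.csv

# Split the header onto one line per column and number each line,
# so duplicates like COL1 show up with their field numbers (1 and 4)
head -n1 output.csv | tr ',' '\n' | cat -n
```

Here the duplicate `COL1` appears at positions 1 and 4, which are exactly the numbers `cut -f` expects.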

Then, when I know which column numbers are guilty, I delete manually, e.g.:

cut -d"," --complement -f5,11 < output.csv > output2.csv
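The two manual steps can be glued together: derive the duplicate field numbers from the header itself with awk, then hand them to `cut --complement` so the `-f` list no longer has to be typed by hand. This is a sketch, not the accepted answer's method; it assumes GNU `cut` (for `--complement`) and no commas inside the quoted header names. The sample data from the question is written to output.csv for illustration:

```shell
# Sample data from the question (illustration only)
printf '%s\n' '"COLUMN1","Column2","Column3","COLUMN1","Column4"' \
              '"1","1","1","1","1"' > output.csv

# Build a comma-separated list of field numbers whose header text
# was already seen earlier on the same line (here: "4")
dupes=$(head -n1 output.csv |
  awk -F, '{for(i=1;i<=NF;i++) if(seen[$i]++) d = d (d ? "," : "") i; print d}')

if [ -n "$dupes" ]; then
  # Drop the duplicated columns in one pass (GNU cut only)
  cut -d"," --complement -f"$dupes" < output.csv > output2.csv
else
  cp output.csv output2.csv   # nothing repeated, keep the file as-is
fi
```

On the sample above this produces output2.csv with the fourth column removed, leaving `"COLUMN1","Column2","Column3","Column4"` as the header.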

If somebody can point me in the right direction I'd be grateful!

Updated to give a rough example of output.csv contents; it should be familiar to anyone who's used perfmon:

"COLUMN1","Column2","Column3","COLUMN1","Column4"    
"1","1","1","1","1"  
"a","b","c","a","d"  
"x","dd","ffd","x","ef"  

I need to delete the repeated COLUMN1 (the 4th column).

Just to be clear, I'm trying to think of a way of automatically going into output.csv and deleting repeated columns without having to tell it which columns to delete a la my manual method above. Thanks!

  • The input is just a standard perfmon csv log file; it's just that one of the columns gets repeated for some strange reason and I need to delete the duplicates but leave the original. I've updated with a rough example of the output... Commented Apr 6, 2013 at 19:06
  • What should happen to "1","1","1","1","1"? Leave just one value? Should the commas be kept or not? Your problem is quite underspecified. Commented Apr 6, 2013 at 20:04
  • Sorry, I think you might be misreading it, I'm looking to delete duplicate columns in a csv file. Commented Apr 6, 2013 at 20:24

1 Answer

Try this awk script (not really a one-liner). It handles more than one duplicated column, and it only checks the titles (the first row) to decide which columns are duplicates — which matches your example.

awk script (one-liner version):

awk -F, 'NR==1{for(i=1;i<=NF;i++)if(!($i in v)){v[$i];t[i]}}{s=""; for(i=1;i<=NF;i++)if(i in t)s=s sprintf("%s,",$i);if(s){sub(/,$/,"",s);print s}}' file

clear version (same script):

awk -F, 'NR==1{
        for(i=1;i<=NF;i++)
                if(!($i in v)){v[$i];t[i]}
}
{
        s=""
        for(i=1;i<=NF;i++)
                if(i in t)
                        s=s sprintf("%s,",$i)
        if(s){
                sub(/,$/,"",s)
                print s
        }
}' file

Example (note that I created two duplicated columns):

kent$  cat file
COL1,COL2,COL3,COL1,COL4,COL2
1,2,3,1,4,2
a1,a2,a3,a1,a4,a2
b1,b2,b3,b1,b4,b2
d1,d2,d3,d1,d4,d2


kent$  awk -F, 'NR==1{
        for(i=1;i<=NF;i++)
                if(!($i in v)){v[$i];t[i]}
}
{
        s=""
        for(i=1;i<=NF;i++)
                if(i in t)
                        s=s sprintf("%s,",$i)
        if(s){
                sub(/,$/,"",s)
                print s
        }
}' file
COL1,COL2,COL3,COL4
1,2,3,4
a1,a2,a3,a4
b1,b2,b3,b4
d1,d2,d3,d4
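The quotes that perfmon writes around every field do not break this script: duplicate headers are byte-identical including their quotes, so the first-row comparison still matches. A quick check against the quoted sample data from the question:

```shell
# Sample data taken verbatim from the question
cat > output.csv <<'EOF'
"COLUMN1","Column2","Column3","COLUMN1","Column4"
"1","1","1","1","1"
"a","b","c","a","d"
"x","dd","ffd","x","ef"
EOF

# Same one-liner as above, run against the quoted perfmon sample
awk -F, 'NR==1{for(i=1;i<=NF;i++)if(!($i in v)){v[$i];t[i]}}{s=""; for(i=1;i<=NF;i++)if(i in t)s=s sprintf("%s,",$i);if(s){sub(/,$/,"",s);print s}}' output.csv
```

This prints the file with the repeated `"COLUMN1"` (the 4th field) removed from every row, starting with the header `"COLUMN1","Column2","Column3","Column4"`.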
