CSV data format

1st Format

name,email,mobile,email
a,[email protected],1234567890,[email protected]

2nd Format
name,email,"mobile,number",email
a,[email protected],1234567890,[email protected]

3rd Format
name,email,"mobile number",email
a,[email protected],1234567890,[email protected]

In all of the formats above, the email header appears twice. I want to keep only the first email column and delete the second one, both its header and its data, from the file.

I have tried this, but it's not working properly:

awk  -F'","' 'NR==1{for(i=1;i<=NF;i++)if(!($i in v)){ v[$i];t[i]}}{s=""; for(i=1;i<=NF;i++)if(i in t)s=s sprintf("%s,",$i);if(s){sub(/,$/,"",s);print s}} ' input.csv > output.csv

Please suggest a script or command that does this.

  • 2
    Kindly do add your efforts in your question which is highly encouraged on SO. You could also use search functionality of SO to look for answers(not my downvote btw). Commented Nov 17, 2020 at 19:08
  • 1
    @RavinderSingh13 I have updated my question with my effort Commented Nov 17, 2020 at 19:11
  • Since efforts are added I have voted to reopen this question, but your samples are not clear so kindly do add more clear samples in your question and let us know then. Commented Nov 17, 2020 at 19:11
  • Handling CSV with AWK is more complex than -F",", because a value could contain a comma. Commented Nov 17, 2020 at 19:11
  • @rethab yes value can have comma Commented Nov 17, 2020 at 19:12

2 Answers

Awk is probably not the most practical tool for manipulating CSV files. There are many others.

Here are a few examples, with your data

  • csvtool (sudo apt install csvtool)
$ for f in [123].csv; do echo $f; csvtool col 1-3 "$f"; echo; done
1.csv
name,email,mobile
a,[email protected],1234567890

2.csv
name,email,"mobile,number"
a,[email protected],1234567890

3.csv
name,email,mobile number
a,[email protected],1234567890
  • csvcut (sudo apt install csvkit)
$ for f in [123].csv; do echo $f; csvcut -C 4 "$f"; echo; done
1.csv
name,email,mobile
a,[email protected],1234567890

2.csv
name,email,"mobile,number"
a,[email protected],1234567890

3.csv
name,email,mobile number
a,[email protected],1234567890
  • Perl's Text::CSV (sudo apt install libtext-csv-perl)
    (This would probably be better suited for more complex needs, and should be in a more readable script file)
$ for f in [123].csv; do echo $f; perl -MText::CSV -lne 'BEGIN{$csv=Text::CSV->new()} if ($csv->parse($_)) {$csv->print(*STDOUT, [ ($csv->fields)[0..2] ]);}' "$f"; echo; done
1.csv
name,email,mobile
a,[email protected],1234567890

2.csv
name,email,mobile,number
a,[email protected],1234567890

3.csv
name,email,mobile number
a,[email protected],1234567890
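The tools above select columns by position. As a minimal sketch of the same idea keyed on header names instead, the standard Python csv module can drop every column whose header repeats an earlier one. The function name drop_duplicate_columns and the example.com addresses are made up for illustration; they do not come from the question.

```python
import csv
import io


def drop_duplicate_columns(text):
    """Return CSV text without columns whose header repeats an earlier header."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    keep = None  # indexes of the columns to keep, decided from the header row
    for row in reader:
        if keep is None:
            seen = set()
            keep = []
            for i, name in enumerate(row):
                if name not in seen:
                    seen.add(name)
                    keep.append(i)
        # Short rows are tolerated: indexes past the end are skipped.
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()


# Hypothetical sample in the shape of the "2nd Format" from the question.
sample = 'name,email,"mobile,number",email\na,a@example.com,1234567890,b@example.com\n'
print(drop_duplicate_columns(sample), end="")
```

Because the csv module parses quoted fields, the comma inside "mobile,number" is not treated as a separator, and the writer quotes that field again on output.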

If your CSV is well-formed, try

sed 's/^\("\([^"]\|""\)*"\|[^",]*\),\("\([^"]\|""\)*"\|[^",]*\),\("\([^"]\|""\)*"\|[^",]*\),\("\([^"]\|""\)*"\|[^",]*\)$/\1,\3,\5/'

Demo: https://ideone.com/7xKlGU

The regex isn't particularly elegant but should work straightforwardly. "\([^"]\|""\)*" matches a quoted field and [^",]* matches a field which isn't quoted. This assumes that either a field in its entirety is quoted, or not at all, and that the escaping mechanism is doubling the double quotes which should be literal, as is the convention in most common CSV dialects.
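To see how the two alternatives behave, here is a rough Python translation of the same pattern, written with non-capturing groups so that the four fields are simply groups 1 to 4. This is an illustrative sketch, not the sed command itself; the names FIELD, LINE and drop_fourth_field are invented for the example.

```python
import re

# One field: either fully quoted (with "" as an escaped quote) or unquoted.
FIELD = r'(?:"(?:[^"]|"")*"|[^",]*)'

# Exactly four comma-separated fields on the line.
LINE = re.compile(rf'^({FIELD}),({FIELD}),({FIELD}),({FIELD})$')


def drop_fourth_field(line):
    """Return the first three fields, or the line unchanged if it does not match."""
    m = LINE.match(line)
    return ",".join(m.group(1, 2, 3)) if m else line


print(drop_fourth_field('name,email,"mobile,number",email'))
```

As with the sed version, a line that does not consist of exactly four fields is passed through untouched.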
