
I have a csv file like the following:

entity_name,data_field_name,type
Unit,id
Track,id,LONG

The second row is missing a comma. I wonder if there is a regex- or awk-like tool to append commas to the end of any line where they are missing.

Update

I know the requirements are a little vague. There might be several alternative ways to narrow down the requirements such as:

  1. The header row should define the number of columns (and commas) that is valid for the whole file. The script should read the header row first and find out the correct number of columns.
  2. The number of columns might be passed as an argument to the script.
  3. The number of columns can be hardcoded into the script.

I didn't narrow down the requirements at first because I was OK with any of them. The first alternative is of course the best, but I wasn't sure how easy it would be to implement.
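As an illustration of the second alternative, a small wrapper could take the expected column count as an argument. This is only a sketch; the `pad_csv` name and the argument order are my own invention, not from any answer below.

```shell
# Sketch of alternative 2 (illustrative only; the pad_csv name is made up):
# the expected number of columns is passed as the first argument, and awk
# creates any missing trailing fields so each row has that many columns.
pad_csv() {
  awk -v nc="$1" 'BEGIN { FS = OFS = "," }
    { for (i = NF + 1; i <= nc; i++) $i = "" }  # extend short rows
    1' "$2"
}

printf 'entity_name,data_field_name,type\nUnit,id\nTrack,id,LONG\n' > /tmp/sample.csv
pad_csv 3 /tmp/sample.csv
```

Assigning to a field past `NF` forces awk to rebuild the record with `OFS` between all fields, which is what appends the missing commas.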

Thanks for all the great answers and comments. Next time, I will state acceptable alternative requirements explicitly.

  • You tagged the question with vim; don't you know how to add the comma on line 2 in Vim, or are there still more requirements? Commented Apr 27, 2016 at 8:54
  • Could be as simple as v/,.*,/norm A, Commented Apr 27, 2016 at 10:23
  • Edit your question to show your attempt and expected output. Can there be 2 missing commas? Blank lines? If so, update your input to include those cases. Commented Apr 27, 2016 at 14:09
  • @Kent In the question, I said "there might be some regex or awk like tool". With this kind of problem, it is very common for someone to suggest a very different solution using some other tool. Vim is a very powerful tool, and I thought an interesting Vim solution might come up. Commented Apr 27, 2016 at 15:10

6 Answers


You can use this awk command to pad every row after the header with empty cells, based on the number of columns in the header row, so the column count is not hard-coded:

awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} NF{$nc=$nc} 1' file

entity_name,data_field_name,type
Unit,id,
Track,id,LONG

Earlier solution:

awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} {printf "%s", $0;
  for (i=NF+1; i<=nc; i++) printf "%s", OFS; print ""}' file

Comments

awk -F, -vOFS=, 'NR==1{x=NF}NF=x' does the same.
Actually I did that once but got a word of caution from EdMorton that this trick of assigning to NF is not portable across all awk versions.
I currently have GNU awk, but will try on BSD awk in a few hours.
@123: Just tested on OSX's awk and awk -F, -v OFS=, 'NR==1{x=NF}NF=x' file didn't work. 2nd row prints as Unit,id instead of Unit,id,
Nice, thanks for the follow up. I think awk 'BEGIN{FS=OFS=","}{$3=$3}1' should also work then?

I would use sed:

sed 's/^[^,]*,[^,]*$/&,/' file

Example:

$ echo 'Unit,id' | sed 's/^[^,]*,[^,]*$/&,/'
Unit,id,
$ echo 'Unit,id,bar' | sed 's/^[^,]*,[^,]*$/&,/'
Unit,id,bar
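If more than one comma could be missing (say, a line like Unit with no commas at all), one possible extension of this answer (my own sketch, not part of the original, assuming the header defines three columns, i.e. two commas) is to loop the substitution with sed's test-and-branch until the line has two commas:

```shell
# Sketch (my own extension of the answer above): keep appending a comma while
# the line still has fewer than two commas, using sed's :label / t branch loop.
printf 'Unit\nUnit,id\nTrack,id,LONG\n' |
  sed -e ':a' -e 's/^\([^,]*\(,[^,]*\)\{0,1\}\)$/\1,/' -e 'ta'
```

The pattern matches only lines containing zero or one commas, so the loop stops as soon as a line reaches two.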

Comments


Try this:

$ awk -F , 'NF==2{$2=$2","}1' file

Output:

entity_name,data_field_name,type
Unit,id,
Track,id,LONG

Comments


With another awk:

awk -F, 'NF==2{$3=""}1' OFS=, yourfile.csv

Comments


To present some balance to all the awk solutions, the following could be a Vim-only solution:

:v/,.*,/norm A,

Rationale:

/,.*,/          searches for two commas in a line
:v              applies a global command to each line NOT matching the search
norm A,         enters normal mode and appends a , to the end of the line

Comments


This MIGHT be all you need, depending on the info you haven't shared with us in your question:

$ awk -F, '{print $0 (NF<3?FS:"")}' file
entity_name,data_field_name,type
Unit,id,
Track,id,LONG
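The same idea can take the expected count from the header row instead of hardcoding 3. This is my own generalization, not part of the answer, and it assumes the header row is always complete:

```shell
# Generalization (my own, not part of the answer): read the expected field
# count from the header, then append one FS per missing field on later rows.
printf 'entity_name,data_field_name,type\nUnit,id\nTrack,id,LONG\n' |
  awk -F, 'NR==1 { nc = NF }
           { s = ""; for (i = NF; i < nc; i++) s = s FS; print $0 s }'
```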

Comments
