
I have a csv file like the following:

entity_name,data_field_name,type
Unit,id
Track,id,LONG

The second row is missing a comma. I wonder if there is a regex- or awk-like tool to append commas to the end of any line where they are missing.

Update

I know the requirements are a little vague. There might be several alternative ways to narrow down the requirements such as:

  1. The header row should define the number of columns (and commas) that is valid for the whole file. The script should read the header row first and find out the correct number of columns.
  2. The number of columns might be passed as an argument to the script.
  3. The number of columns can be hardcoded into the script.

I didn't narrow down the requirements at first because I was OK with any of them. The first alternative is of course the best, but I wasn't sure how easy it would be to implement.
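As an illustration of the second alternative, a small wrapper could take the expected column count as an argument. This is only a sketch; the `pad_csv` name and the argument order are my own invention, not from any answer below.

```shell
# Sketch of alternative 2 (illustrative only; the pad_csv name is made up):
# the expected number of columns is passed as the first argument, and awk
# creates any missing trailing fields so each row has that many columns.
pad_csv() {
  awk -v nc="$1" 'BEGIN { FS = OFS = "," }
    { for (i = NF + 1; i <= nc; i++) $i = "" }  # extend short rows
    1' "$2"
}

printf 'entity_name,data_field_name,type\nUnit,id\nTrack,id,LONG\n' > /tmp/sample.csv
pad_csv 3 /tmp/sample.csv
```

Assigning to a field past `NF` forces awk to rebuild the record with `OFS` between all fields, which is what appends the missing commas.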

Thanks for all the great answers and comments. Next time, I will state acceptable alternative requirements explicitly.

  • You tagged the question with vim; don't you know how to add the comma on line 2 in Vim, or are there still more requirements? Commented Apr 27, 2016 at 8:54
  • Could be as simple as v/,.*,/norm A, Commented Apr 27, 2016 at 10:23
  • Edit your question to show your attempt and expected output. Can there be 2 missing commas? Blank lines? If so, update your input to include those cases. Commented Apr 27, 2016 at 14:09
  • @Kent In the question, I said "there might be some regex or awk like tool". With this kind of problem, it is very common for someone to suggest a very different solution using some other tool. Vim is a very powerful tool, and I thought an interesting Vim solution might come up. Commented Apr 27, 2016 at 15:10

6 Answers


You can use this awk command to pad every row after the header with empty cells, based on the number of columns in the header row, so the column count is not hard-coded:

awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} NF{$nc=$nc} 1' file

entity_name,data_field_name,type
Unit,id,
Track,id,LONG

Earlier solution:

awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} {printf "%s", $0;
  for (i=NF+1; i<=nc; i++) printf "%s", OFS; print ""}' file

Comments

awk -F, -vOFS=, 'NR==1{x=NF}NF=x' does the same.
Actually I did that once but got a word of caution from EdMorton that this trick of assigning to NF is not portable across all awk versions.
I currently have GNU awk, but will try on BSD awk in a few hours.
@123: Just tested on OSX's awk and awk -F, -v OFS=, 'NR==1{x=NF}NF=x' file didn't work. 2nd row prints as Unit,id instead of Unit,id,
Nice, thanks for the follow up. I think awk 'BEGIN{FS=OFS=","}{$3=$3}1' should also work then?

I would use sed:

sed 's/^[^,]*,[^,]*$/&,/' file

Example:

$ echo 'Unit,id' | sed 's/^[^,]*,[^,]*$/&,/'
Unit,id,
$ echo 'Unit,id,bar' | sed 's/^[^,]*,[^,]*$/&,/'
Unit,id,bar
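If more than one comma could be missing (say, a line like Unit with no commas at all), one possible extension of this answer (my own sketch, not part of the original, assuming the header defines three columns, i.e. two commas) is to loop the substitution with sed's test-and-branch until the line has two commas:

```shell
# Sketch (my own extension of the answer above): keep appending a comma while
# the line still has fewer than two commas, using sed's :label / t branch loop.
printf 'Unit\nUnit,id\nTrack,id,LONG\n' |
  sed -e ':a' -e 's/^\([^,]*\(,[^,]*\)\{0,1\}\)$/\1,/' -e 'ta'
```

The pattern matches only lines containing zero or one commas, so the loop stops as soon as a line reaches two.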

Comments


Try this:

$ awk -F , 'NF==2{$2=$2","}1' file

Output:

entity_name,data_field_name,type
Unit,id,
Track,id,LONG

Comments


With another awk:

awk -F, 'NF==2{$3=""}1' OFS=, yourfile.csv

Comments


To present some balance to all the awk solutions, the following could be a Vim-only solution:

:v/,.*,/norm A,

Rationale:

/,.*,/          searches for two commas in a line
:v              applies a global command to each line NOT matching the search
norm A,         enters normal mode and appends a , to the end of the line

Comments


This MIGHT be all you need, depending on the info you haven't shared with us in your question:

$ awk -F, '{print $0 (NF<3?FS:"")}' file
entity_name,data_field_name,type
Unit,id,
Track,id,LONG
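The same idea can take the expected count from the header row instead of hardcoding 3. This is my own generalization, not part of the answer, and it assumes the header row is always complete:

```shell
# Generalization (my own, not part of the answer): read the expected field
# count from the header, then append one FS per missing field on later rows.
printf 'entity_name,data_field_name,type\nUnit,id\nTrack,id,LONG\n' |
  awk -F, 'NR==1 { nc = NF }
           { s = ""; for (i = NF; i < nc; i++) s = s FS; print $0 s }'
```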

Comments
