linux sed command - adding string on every end of the line of csv

Question

I'm currently having a problem with the following CSV data.

COLUMN1,COLUMN2,COLUMN3,COLUMN4
apple1,apple2,apple3,apple4
banana1,banana2,banana3,
caimito1,"caimito21
caimito22","caimito31
caimito32",caimito4

That would look like this:

╔══════════╦═══════════╦═══════════╦══════════╗
║ COLUMN1  ║  COLUMN2  ║  COLUMN3  ║ COLUMN4  ║
╠══════════╬═══════════╬═══════════╬══════════╣
║ apple1   ║ apple2    ║ apple3    ║ apple4   ║
║ banana1  ║ banana2   ║ banana3   ║          ║
║ caimito1 ║ caimito21 ║ caimito31 ║ caimito4 ║
║          ║ caimito22 ║ caimito32 ║          ║
╚══════════╩═══════════╩═══════════╩══════════╝

So my plan is to add a COLUMN5 in which every row has the value "FRUIT".

Command used:

sed "1 s/$/,COLUMN5/g" FILE.csv | sed "2,$ s/$/,FRUIT/g" > OUTPUT.csv

Output:

╔══════════╦════════════════╦════════════════╦══════════╦═════════╗
║ COLUMN1  ║  COLUMN2       ║  COLUMN3       ║ COLUMN4  ║ COLUMN5 ║
╠══════════╬════════════════╬════════════════╬══════════╬═════════╣
║ apple1   ║ apple2         ║ apple3         ║ apple4   ║ FRUIT   ║
║ banana1  ║ banana2        ║ banana3        ║          ║ FRUIT   ║
║ caimito1 ║ caimito21FRUIT ║ caimito31FRUIT ║ caimito4 ║ FRUIT   ║
║          ║ caimito22      ║ caimito32      ║          ║         ║
╚══════════╩════════════════╩════════════════╩══════════╩═════════╝

Is there any way to add the "FRUIT" without affecting "caimito" row?

I also tried the following command, which adds "," before "$", but it didn't work either.

sed "1 s/$/,COLUMN5/g" FILE.csv | sed "2,$ s/,$/,FRUIT/g" > OUTPUT.csv

The two piped sed calls can be replaced by a single one, with the actions separated by ; like this: sed '1 action1; 2,$ action2' yourfile
– NeronLeVelu, commented Feb 12, 2015 at 12:01
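
A minimal sketch of the single-invocation form suggested in the comment above. The sample data and the function name add_column_naive are made up for illustration. The two commands themselves are unchanged, so sed still works on physical lines and the multi-line field problem remains:

```shell
# Hypothetical sample: a header, a plain record, and a record whose
# quoted second field spans two physical lines.
sample=$(mktemp)
printf '%s\n' 'COLUMN1,COLUMN2' 'apple1,apple2' 'caimito1,"caimito21' 'caimito22"' > "$sample"

# One sed process, two addressed commands separated by ';'.
# Single quotes keep the shell away from '$'; the g flag is dropped
# because '$' can match only once per line.
add_column_naive() {
  sed '1 s/$/,COLUMN5/; 2,$ s/$/,FRUIT/' "$1"
}

add_column_naive "$sample"
```

On this sample, ,FRUIT is still appended after caimito21 inside the quoted field, which is the behaviour the question is trying to avoid.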

willeM_ Van Onsem · Accepted Answer · 2015-03-17 12:45:37Z

2

sed is probably not the right tool for handling CSV files, since the rules are considerably more complicated than they may look (it is probably possible, but such scripts are generally rather error-prone). You can, however, use csvtool to handle this:

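file="FILE.csv"
# count the rows as csvtool parses them
nr=$(csvtool height "$file")
# build the new column: the header, then FRUIT for each of the nr-1 data rows
ot=$(perl -e "print \"COLUMN5\\n\";for\$i(2..$nr){print \"FRUIT\\n\";}")
# paste the new column to the right of the existing ones
echo "$ot" | csvtool paste "$file" -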

The script works as follows:

First, we count the number of rows n with csvtool height.
Next, we generate the additional column by printing COLUMN5 followed by n-1 lines of FRUIT.
Finally, we paste that column to the right of the file's existing columns.
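
The column-generation step can also be sketched without Perl. The helper name make_column is made up here; its argument is assumed to be the row count reported by csvtool height:

```shell
# make_column N: print COLUMN5 once, then FRUIT for the remaining N-1 rows.
# (make_column is an illustrative name, not part of csvtool.)
make_column() {
  awk -v n="$1" 'BEGIN {
    print "COLUMN5"
    for (i = 2; i <= n; i++) print "FRUIT"
  }'
}

make_column 4

# Intended use with the commands from the answer (requires csvtool):
#   make_column "$(csvtool height FILE.csv)" | csvtool paste FILE.csv -
```

Piping the output straight into csvtool paste also avoids holding the generated column in a shell variable.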

edited Mar 17, 2015 at 12:45

answered Feb 12, 2015 at 9:55

Wintermute · Accepted Answer · 2015-02-12 10:44:57Z

EDIT: I only just saw the csvtool solution; it is of course much more practical. I leave this solution up mainly because it would be a pity to hide it and its Lovecraftian beauty.

Well, here goes. This is a way to do it in sed:

sed ':a $!{ N; ba }; s/"[^"]*"/{&}/g; :b s/\({"[^"]*\)\n\([^"]*"}\)/\1~"~\2/g; tb; s/\n\|$/,FRUIT&/g; s/,FRUIT\(\n\|$\)/,COLUMN5\1/; :c s/\({"[^"]\)*~"~/\1\n/g; tc; s/{"\|"}/"/g' filename

This is going to be a bit of a ride. It works as follows:

:a $!{ N; ba }                         # assemble the whole file in the
                                       # pattern space

s/"[^"]*"/{&}/g                        # encase all "-enclosed fields in
                                       # {"..."} to make matching the beginning
                                       # and end separately possible.

:b                                     # jump mark for looping
s/\({"[^"]*\)\n\([^"]*"}\)/\1~"~\2/g   # replace one newline in each
                                       # {"..."} field with ~"~ per pass
tb                                     # loop until all were replaced

s/\n\|$/,FRUIT&/g                      # Put FRUIT at the end of all lines
s/,FRUIT\(\n\|$\)/,COLUMN5\1/          # Replace the first ,FRUIT with ,COLUMN5
                                       # The \(\n\|$\) bit is so that this
                                       # works with empty files (that only
                                       # have a header line)

:c                                     # Jump mark for looping
s/\({"[^"]\)*~"~/\1\n/g                # replace the first ~"~ in all {"..."}
                                       # fields with a newline
tc                                     # loop until all were replaced

s/{"\|"}/"/g                           # replace all {", "} markers with "
                                       # again.

NeronLeVelu · Accepted Answer · 2015-02-12 11:57:44Z

1

sed '1 {
   s/$/,COLUMN5/
   b
   }
:load
/^\([^"]*"[^"]*"\)*[^"]*"[^"]*$/ {
   N
   b load
   }
s/$/,,,,/;s/^\(\([^,]*,\)\{4\}\).*/\1FRUIT/' YourFile

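Add COLUMN5 to the first line, then branch to the end of the script (b) so the next line starts a new cycle.
While the pattern space contains an unclosed " (an odd number of quotes), append the next line (N) and test again.
Append 4 commas as padding.
Keep the first 4 comma-terminated fields and put FRUIT after them.
(next cycle)

This is a POSIX version, so use --posix with GNU sed.

For a "valid" CSV (every record on a single line, with fields separated by ,), just remove the load loop.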
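
The same quote-counting idea (keep appending physical lines while a " is still open) can be sketched in awk. This is a swapped-in alternative, not the sed script above; the function name add_fruit_column is made up, the header is assumed to sit on a single line, and unlike the sed version it does not assume four columns:

```shell
# add_fruit_column FILE: append COLUMN5 to the header and FRUIT to every
# record, treating a record as complete once its quote count is even.
add_fruit_column() {
  awk '
    NR == 1 { print $0 ",COLUMN5"; next }          # header line
    {
      # Start a new record, or continue one whose quoted field is still open.
      record = (in_quotes ? record "\n" $0 : $0)
      line = $0
      quotes += gsub(/"/, "", line)                # count quotes on this line
      in_quotes = (quotes % 2 == 1)
      if (!in_quotes) { print record ",FRUIT"; quotes = 0 }
    }
  ' "$1"
}

# Demo on the data from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
COLUMN1,COLUMN2,COLUMN3,COLUMN4
apple1,apple2,apple3,apple4
banana1,banana2,banana3,
caimito1,"caimito21
caimito22","caimito31
caimito32",caimito4
EOF
add_fruit_column "$sample"
```

Because a doubled quote ("") inside a quoted field adds two to the count, escaped quotes do not disturb the even/odd test.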

edited Feb 12, 2015 at 11:57

answered Feb 12, 2015 at 11:44
