I'm having a problem with the following CSV data.

COLUMN1,COLUMN2,COLUMN3,COLUMN4
apple1,apple2,apple3,apple4
banana1,banana2,banana3,
caimito1,"caimito21
caimito22","caimito31
caimito32",caimito4

That would look like this:

╔══════════╦═══════════╦═══════════╦══════════╗
║ COLUMN1  ║  COLUMN2  ║  COLUMN3  ║ COLUMN4  ║
╠══════════╬═══════════╬═══════════╬══════════╣
║ apple1   ║ apple2    ║ apple3    ║ apple4   ║
║ banana1  ║ banana2   ║ banana3   ║          ║
║ caimito1 ║ caimito21 ║ caimito31 ║ caimito4 ║
║          ║ caimito22 ║ caimito32 ║          ║
╚══════════╩═══════════╩═══════════╩══════════╝

My plan is to add a COLUMN5 whose value is "FRUIT" in every row.

Command used:

sed "1 s/$/,COLUMN5/g" FILE.csv | sed "2,$ s/$/,FRUIT/g" > OUTPUT.csv

Output:

╔══════════╦════════════════╦════════════════╦══════════╦═════════╗
║ COLUMN1  ║  COLUMN2       ║  COLUMN3       ║ COLUMN4  ║ COLUMN5 ║
╠══════════╬════════════════╬════════════════╬══════════╬═════════╣
║ apple1   ║ apple2         ║ apple3         ║ apple4   ║ FRUIT   ║
║ banana1  ║ banana2        ║ banana3        ║          ║ FRUIT   ║
║ caimito1 ║ caimito21FRUIT ║ caimito31FRUIT ║ caimito4 ║ FRUIT   ║
║          ║ caimito22      ║ caimito32      ║          ║         ║
╚══════════╩════════════════╩════════════════╩══════════╩═════════╝

Is there any way to add "FRUIT" without affecting the "caimito" row?

I also tried the following command, with "," added before "$", but it didn't work either.

sed "1 s/$/,COLUMN5/g" FILE.csv | sed "2,$ s/,$/,FRUIT/g" > OUTPUT.csv
  • The two piped sed commands could be replaced by a single one, with the actions separated by ; like this: sed '1 action1; 2,$ action2' yourfile Commented Feb 12, 2015 at 12:01
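
A concrete instance of that suggestion, as a minimal sketch: the function name add_column_naive is made up for illustration. Note that this only merges the two passes into one; it still appends to every physical line, so it does not by itself fix the multi-line "caimito" row.

```shell
# Hypothetical wrapper: one sed invocation, two addressed actions
# separated by ";" instead of two piped sed processes.
add_column_naive() {
    sed '1 s/$/,COLUMN5/; 2,$ s/$/,FRUIT/' "$@"
}

printf '%s\n' 'COLUMN1,COLUMN2' 'apple1,apple2' | add_column_naive
# prints:
# COLUMN1,COLUMN2,COLUMN5
# apple1,apple2,FRUIT
```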

3 Answers

Sed is probably not the right tool for handling CSV files, since the rules of the format are more complicated than they look (it is probably possible, but such scripts tend to be error-prone). You can, however, use csvtool to handle this:

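file="FILE.csv"
# number of CSV rows in the file, header included
nr=$(csvtool height "$file")
# one header cell followed by nr-1 FRUIT cells, one per line
ot=$(perl -e "print \"COLUMN5\\n\";for\$i(2..$nr){print \"FRUIT\\n\";}")
# paste the generated column to the right of the file
echo "$ot" | csvtool paste "$file" -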

The script works as follows:

  1. First, we count the rows with csvtool height.
  2. Next, we generate the additional column by printing COLUMN5 followed by FRUIT nr-1 times.
  3. Finally, we paste that content to the right of the file.
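
If you'd rather not depend on perl for step 2, the column can probably be generated in plain shell as well. This is only a sketch: make_column is a hypothetical helper, not part of csvtool, and the commented pipeline at the end assumes csvtool height and csvtool paste behave as in the script above.

```shell
# Hypothetical helper: print a header cell followed by (rows - 1)
# copies of a value, one per line.
make_column() {
    header=$1
    value=$2
    rows=$3
    printf '%s\n' "$header"
    i=1
    while [ "$i" -lt "$rows" ]; do
        printf '%s\n' "$value"
        i=$((i + 1))
    done
}

make_column COLUMN5 FRUIT 4
# prints:
# COLUMN5
# FRUIT
# FRUIT
# FRUIT

# Assumed usage with csvtool, mirroring the script above:
# make_column COLUMN5 FRUIT "$(csvtool height FILE.csv)" | csvtool paste FILE.csv -
```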

EDIT: I only just saw the csvtool solution; it is of course much more practical. I leave this solution up mainly because it would be a pity to hide it and its Lovecraftian beauty.

Well, here goes. This is a way to do it in sed:

sed ':a $!{ N; ba }; s/"[^"]*"/{&}/g; :b s/\({"[^"]*\)\n\([^"]*"}\)/\1~"~\2/g; tb; s/\n\|$/,FRUIT&/g; s/,FRUIT\(\n\|$\)/,COLUMN5\1/; :c s/\({"[^"]\)*~"~/\1\n/g; tc; s/{"\|"}/"/g' filename

This is going to be a bit of a ride. It works as follows:

:a $!{ N; ba }                         # assemble the whole file in the
                                       # pattern space

s/"[^"]*"/{&}/g                        # encase all "-enclosed fields in
                                       # {"..."} to make matching the beginning
                                       # and end separately possible.

:b                                     # jump mark for looping
s/\({"[^"]*\)\n\([^"]*"}\)/\1~"~\2/g   # replace the first newline in all
                                       # {"..."} fields with ~"~
tb                                     # loop until all were replaced

s/\n\|$/,FRUIT&/g                      # Put FRUIT at the end of all lines
s/,FRUIT\(\n\|$\)/,COLUMN5\1/          # Replace the first ,FRUIT with ,COLUMN5
                                       # The \(\n\|$\) bit is so that this
                                       # works with empty files (that only
                                       # have a header line)

:c                                     # Jump mark for looping
s/\({"[^"]\)*~"~/\1\n/g                # replace the first ~"~ in all {"..."}
                                       # fields with a newline
tc                                     # loop until all were replaced

s/{"\|"}/"/g                           # replace all {", "} markers with "
                                       # again.
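
To see the first step in isolation, here is a small sketch (slurp_demo is a made-up name). It uses the same N/branch loop, split over several -e fragments so the label is terminated unambiguously, and then replaces every embedded newline with | so that the single buffer becomes visible.

```shell
# Hypothetical helper: read all input into the pattern space, then
# show the embedded newlines as "|".
slurp_demo() {
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/\n/|/g' "$@"
}

printf '%s\n' 'h1,h2' 'a,"b1' 'b2"' | slurp_demo
# prints: h1,h2|a,"b1|b2"
```

Once everything sits in one buffer, the later substitutions can tell a newline inside a quoted field apart from a newline that ends a record.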

sed '1 {
   s/$/,COLUMN5/
   b
   }
:load
/^\([^"]*"[^"]*"\)*[^"]*"[^"]*$/ {
   N
   b load
   }
s/$/,,,,/;s/^\(\([^,]*,\)\{4\}\).*/\1FRUIT/' YourFile
  • add COLUMN5 to the first line, then start the next cycle (b)
  • if the current pattern space contains an unclosed ", append the next line and test again
  • append 4 commas by default
  • keep the first 4 comma-terminated fields and append FRUIT
  • (next cycle)

This is a POSIX version, so it also works with --posix on GNU sed.

For a "valid" CSV (one record per line, with all fields separated by ,), just remove the load loop section.
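
The load loop can be tried on its own. In this sketch (join_quoted_lines is a hypothetical name) the address regex is the same one as in the script above; it matches while the buffer holds an odd number of ", that is, while a quoted field is still open. The final substitution only makes the joined newline visible as [NL].

```shell
# Hypothetical helper: keep appending input lines while a quoted field
# is still open, then mark each embedded newline as [NL].
join_quoted_lines() {
    sed ':load
/^\([^"]*"[^"]*"\)*[^"]*"[^"]*$/ {
   N
   b load
   }
s/\n/[NL]/g' "$@"
}

printf '%s\n' 'x,"y1' 'y2",z' 'p,q' | join_quoted_lines
# prints:
# x,"y1[NL]y2",z
# p,q
```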
