How to remove a specific string common in multiple lines in a CSV file using shell script?

Question

I have a CSV file containing about 65,000 lines (approximately 28 MB). Each line begins with a path, e.g. "c:\abc\bcd\def\123\456". Suppose the prefix "c:\abc\bcd\" is common to all lines and the rest of the content differs. I need to remove the common part (in this case "c:\abc\bcd\") from every line using a shell script. For example, the content of the CSV file looks like this:

C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.frag                   0   0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.vert                   0   0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.frag       16  24  3
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.vert       87  116 69
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.vert.bin   75  95  61
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0            0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-6            0   0   0

For the example above, I need the following output:

FILE0.frag                  0   0   0
FILE0.vert                  0   0   0
FILE0.link-link-0.frag      17  25  2
FILE0.link-link-0.vert      85  111 68
FILE0.link-link-0.vert.bin  77  97  60
FILE0.link-link-0               0   0
FILE0.link                  0   0   0

Can any of you please help me out with this?

Can you please edit the question to include a few lines of example input and expected output? Is the common substring known in advance or should it be calculated from the input?
– Wintermute, Commented Apr 15, 2015 at 8:18
Without doing as @Wintermute suggests, you are going to end up with an answer that may produce the output you want for some specific input set but is an absolutely ridiculous way to get it and probably won't work for all possible inputs.
– Ed Morton, Commented Apr 15, 2015 at 16:35

chw21 · Accepted Answer · 2015-04-15 09:22:07Z

1

You could use sed:

$ cat test.csv 
"c:\abc\bcd\def\123\456", 1, 2
"c:\abc\bcd\def\234\456", 1, 2
"c:\abc\bcd\def\432\456", 3, 4

$ sed -i.bak -e 's/c:\\abc\\bcd\\//1' test.csv

$ cat test.csv
"def\123\456", 1, 2
"def\234\456", 1, 2
"def\432\456", 3, 4

I am using sed here in this way:

sed -e 's/<SEARCH TERM>/<REPLACE TERM>/<OCCURRENCE>' FILE

where

<SEARCH TERM> is what we are looking for (in this case c:\abc\bcd\, but the backslashes need to be escaped),
<REPLACE TERM> is what we want to replace it with (in this case nothing), and
<OCCURRENCE> is which occurrence of the match we want to replace (in this case the first one in each line).

(-i.bak means: edit the file in place instead of printing the result, but first save a backup of the original as test.csv.bak.)
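The same substitution can be sketched against lines shaped like the ones in the question, where the prefix mixes forward slashes and backslashes. This is only an illustration: the temporary file and the variable names `sample` and `stripped` are made up here, and the prefix is assumed to be known in advance. Using `#` as the delimiter avoids escaping the forward slashes, while the backslashes and the dot still need escaping.

```shell
#!/bin/sh
# Hypothetical sample file, modelled on two lines from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.frag                   0   0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.vert                   0   0   0
EOF

# '#' is the s-command delimiter, so '/' needs no escaping.
# '\' and '.' are still escaped because they are special in a regex.
# '^' anchors the match to the start of each line.
stripped=$(sed 's#^C:/Abc/Def/Test/temp\\\.\\test\\GLNext\\##' "$sample")

printf '%s\n' "$stripped"
rm -f "$sample"
```

Without `-i`, sed leaves the file untouched and writes the result to standard output, which makes it easy to check the expression before editing anything in place.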

Updated according to @david-c-rankin's comment. He is right: make a backup before editing files in case you make a mistake.

edited Apr 15, 2015 at 9:22

answered Apr 15, 2015 at 8:39

chw21


1 Comment

David C. Rankin Over a year ago

sed -i.bak ... filename stands for edit filename in place, but if I screw it up, please make a backup for me in filename.bak. It's always better to delete .bak files later...
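One possible way to put that advice into practice is sketched below: edit in place with a backup, compare the two files, and only then delete the backup. The scratch directory and the variable names (`workdir`, `file`, `result`, `original`) are assumptions made for this illustration.

```shell
#!/bin/sh
# Hypothetical scratch copy of a one-line test.csv.
workdir=$(mktemp -d)
file="$workdir/test.csv"
printf '%s\n' '"c:\abc\bcd\def\123\456", 1, 2' > "$file"

# Edit in place; the untouched original is kept as test.csv.bak.
sed -i.bak -e 's/c:\\abc\\bcd\\//' "$file"

# Review what changed (diff exits non-zero when the files differ).
diff "$file.bak" "$file" || true

result=$(cat "$file")
original=$(cat "$file.bak")

# If the edit turned out wrong, restore with: mv "$file.bak" "$file"
# Once satisfied with the result, delete the backup and clean up.
rm -f "$file.bak"
rm -rf "$workdir"
```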

NeronLeVelu · Accepted Answer · 2015-04-15 12:45:20Z

0

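# Initialise: take the first field of the first line as the candidate prefix
MaxPath="$( sed -n 's/,.*//p;1q' YourFile )"
# Escape the backslashes and anchor the pattern at the start of the line
GrepPath="^$( printf "%s" "${MaxPath}" | sed 's#\\#\\\\#g' )"

# Shorten the candidate one character at a time until every line starts with it
while [ "${#MaxPath}" -gt 0 ] && [ "$( grep -c -v -E "${GrepPath}" YourFile )" -gt 0 ]
 do
   MaxPath="${MaxPath%?}"
   GrepPath="^$( printf "%s" "${MaxPath}" | sed 's#\\#\\\\#g' )"
 done

# Strip the common prefix (the result goes to standard output)
if [ "${#MaxPath}" -gt 0 ]
 then
   sed "s#${GrepPath}##" YourFile
 fi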

This assumes that MaxPath contains neither regex special characters nor a #.
The grep -c -v -E call is not optimized for performance: it scans the whole file on every iteration, whereas it could stop at the first line that does not match (for example with grep -q).
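Both caveats can be sidestepped with a different technique, sketched here with awk rather than the grep loop above. The function name `strip_common_prefix` and the sample file are hypothetical. The sketch assumes the path starts each line and uses `/` or `\` as separators. It reads the file twice: the first pass finds the longest prefix shared by all lines, and the second removes it by character position, so no regex escaping is involved. The prefix is trimmed back to the last separator so that a shared file name stem such as `FILE0.` is not removed as well.

```shell
#!/bin/sh
# strip_common_prefix FILE  (hypothetical helper, not from the answer above)
strip_common_prefix() {
    awk '
        # Pass 1: shrink "prefix" until it is a prefix of every line.
        NR == FNR {
            if (NR == 1) { prefix = $0; next }
            while (length(prefix) > 0 && substr($0, 1, length(prefix)) != prefix)
                prefix = substr(prefix, 1, length(prefix) - 1)
            next
        }
        # Start of pass 2: trim the prefix back to the last path separator.
        FNR == 1 {
            while (length(prefix) > 0) {
                last = substr(prefix, length(prefix), 1)
                if (last == "\\" || last == "/") break
                prefix = substr(prefix, 1, length(prefix) - 1)
            }
        }
        # Pass 2: drop the prefix by position; no regex is involved.
        { print substr($0, length(prefix) + 1) }
    ' "$1" "$1"
}

# Usage on a hypothetical sample modelled on the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.frag 0 0 0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.vert 0 0 0
EOF
result=$(strip_common_prefix "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Each pass reads the file only once, so the cost no longer grows with the number of characters trimmed from the candidate prefix.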

answered Apr 15, 2015 at 12:45

NeronLeVelu
