I have a CSV file that contains 65,000 lines (approximately 28 MB). Each line begins with a path, e.g. "c:\abc\bcd\def\123\456". Suppose the prefix "c:\abc\bcd\" is common to all lines and the rest of each line differs. I need to remove that common prefix (in this case "c:\abc\bcd\") from every line using a shell script. For example, the CSV file contains the following:

C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.frag                   0   0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.vert                   0   0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.frag       16  24  3
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.vert       87  116 69
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.vert.bin   75  95  61
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0            0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-6            0   0   0 

For the above example, I need the following output:

FILE0.frag                  0   0   0
FILE0.vert                  0   0   0
FILE0.link-link-0.frag      17  25  2
FILE0.link-link-0.vert      85  111 68
FILE0.link-link-0.vert.bin  77  97  60
FILE0.link-link-0               0   0
FILE0.link                  0   0   0

Can any of you please help me out with this?

  • Can you please edit the question to include a few lines of example input and expected output? Is the common substring known in advance or should it be calculated from the input? Commented Apr 15, 2015 at 8:18
  • Without doing as @Wintermute suggests you are going to end up with an answer that may produce the output you want for some specific input set but is an absolutely ridiculous way to get it and probably won't work for all possible inputs. Commented Apr 15, 2015 at 16:35

2 Answers

You could use sed:

$ cat test.csv 
"c:\abc\bcd\def\123\456", 1, 2
"c:\abc\bcd\def\234\456", 1, 2
"c:\abc\bcd\def\432\456", 3, 4

$ sed -i.bak -e 's/c:\\abc\\bcd\\//1' test.csv

$ cat test.csv
"def\123\456", 1, 2
"def\234\456", 1, 2
"def\432\456", 3, 4

Here sed is used in this form:

sed -e 's/<SEARCH_TERM>/<REPLACE_TERM>/<OCCURRENCE>' FILE

where

  • <SEARCH_TERM> is the pattern we are looking for (in this case c:\abc\bcd\, with each backslash escaped as \\),
  • <REPLACE_TERM> is what we want to replace it with, in this case nothing, and
  • <OCCURRENCE> is which occurrence of the match on each line to replace, in this case the first. This is also the default, so the trailing 1 can be omitted.

(-i.bak means: edit the file in place instead of printing the result to standard output, and keep a copy of the original as test.csv.bak.)
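
Applied to the sample lines in the question, the same idea might look like the sketch below. It assumes the common prefix is known in advance and is exactly C:/Abc/Def/Test/temp\.\test\GLNext\; sample.csv and stripped.csv are hypothetical file names. Using # as the delimiter avoids having to escape the forward slashes, and ^ anchors the match to the start of the line.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two lines copied from the question (sample.csv is a made-up name).
cat > sample.csv <<'EOF'
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.frag                   0   0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.vert       87  116 69
EOF

# Backslashes and the dot are escaped; '#' is the s-command delimiter.
# The result goes to a new file, so the input stays untouched.
sed 's#^C:/Abc/Def/Test/temp\\\.\\test\\GLNext\\##' sample.csv > stripped.csv

cat stripped.csv
```

Writing to a separate file instead of using -i keeps the original around for comparison.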

Updated according to @david-c-rankin's comment. He is right: make a backup before editing files, in case you make a mistake.
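
A possible way to use that backup is sketched here, assuming a sed that supports -i with a suffix (GNU and BSD sed do; the option is not part of POSIX). The file content is one line from the example above, written into a scratch directory.

```shell
# Scratch directory with a one-line copy of the example file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '"c:\abc\bcd\def\123\456", 1, 2' > test.csv

# Edit in place; the untouched original is kept as test.csv.bak.
sed -i.bak -e 's/c:\\abc\\bcd\\//' test.csv

# Review what changed. diff exits non-zero when the files differ,
# which is expected here, so that status is ignored.
diff test.csv.bak test.csv || true

# If the result is wrong, restore the original:
#   mv test.csv.bak test.csv
# If it is right, the backup can be deleted:
#   rm test.csv.bak
```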

1 Comment

sed -i.bak ... filename means: edit filename in place, but keep a backup in filename.bak in case I screw it up. It's always better to delete the .bak files later...

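# Start with the first field of the first line (everything before the
# first comma) as the candidate prefix
MaxPath="$( sed -n 's/,.*//p;1q' YourFile )"
# Anchor the candidate and double its backslashes so grep and sed read them literally
GrepPath="^$( printf "%s" "${MaxPath}" | sed 's#\\#\\\\#g' )"

# Drop one trailing character at a time until every line starts with the candidate
while [ "${#MaxPath}" -gt 0 ] && [ "$( grep -c -v -E "${GrepPath}" YourFile )" -gt 0 ]
 do
   MaxPath="${MaxPath%?}"
   GrepPath="^$( printf "%s" "${MaxPath}" | sed 's#\\#\\\\#g' )"
 done

# Print the file with the common prefix removed (YourFile itself is not modified)
if [ "${#MaxPath}" -gt 0 ]
 then
   sed "s#${GrepPath}##" YourFile
 fi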
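  • This sample assumes that MaxPath contains no regex special characters (other than backslashes, which are escaped) and no # character.
  • The grep -c -v -E call is not optimized for performance: it reads the whole file on every iteration, although it could stop at the first non-matching line (grep -q -v -E would do that).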
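
As an alternative to the grep loop above, the longest common prefix could be found in one awk pass and then cut off by length rather than by regex, which sidesteps both caveats. This is only a sketch, not the method used in this answer: sample.csv and stripped.csv are hypothetical names, the file is read twice, and trimming the prefix back to the last / or \ is an added assumption so that a shared file-name start such as FILE0. is kept.

```shell
# Scratch directory with three lines copied from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > sample.csv <<'EOF'
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.frag                   0   0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.vert                   0   0   0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.frag       16  24  3
EOF

awk '
# Pass 1: shrink the candidate until it is a prefix of every line.
NR == FNR {
    if (NR == 1) prefix = $0
    while (prefix != "" && index($0, prefix) != 1)
        prefix = substr(prefix, 1, length(prefix) - 1)
    next
}
# Start of pass 2: cut the prefix back to the last path separator.
FNR == 1 {
    while (prefix != "") {
        last = substr(prefix, length(prefix), 1)
        if (last == "/" || last == "\\") break
        prefix = substr(prefix, 1, length(prefix) - 1)
    }
    skip = length(prefix)
}
# Pass 2: print each line without the common prefix.
{ print substr($0, skip + 1) }
' sample.csv sample.csv > stripped.csv

cat stripped.csv
```

Because the prefix is compared and removed as a plain string, characters such as . or # in the path need no escaping.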
