find replace using regex in sed

Question

I'm trying to write a sed expression that replaces this line in a file called testRegex.csv:

{"type":"MultiPolygon","coordinates":[[[-74.043886,40.690185 -74.040365,40.700704 -74.040288,40.700644 -74.03995,40.700891]]]}

with this

{"type":"MultiPolygon","coordinates":[[[[-74.043886,40.690185], [-74.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]}

I've tried the following:

sed 's/(\W\d\d[.]\d*[,]\d\d[.]\d*)/[$1],/g' <testRegex.csv >testRegex2.csv
sed 's/(\W\d\d[\.]\d*[\,]\d\d[\.]\d*)/[$1]\,/g' <testRegex.csv >testRegex2.csv
sed 's/(\W\d\d\.\d*\,\d\d\.\d*)/[$1]\,/g' <testRegex.csv >testRegex2.csv

Can anyone see why these aren't working?

You aren't allowing for signs on the numbers (unless that's why you have \W in there, but that matches all sorts of garbage), and you aren't allowing single digits for the integer part of the number. You don't seem to be allowing for the square brackets (so you won't substitute after the first number pair). You should be able to combine the three scripts into one; use -e to separate the sed commands. As written, your I/O redirections run the first command and save the result; the second command then runs on the original data and overwrites what the first produced; and then the third runs and overwrites that.
– Jonathan Leffler, Commented Jun 6, 2013 at 17:29
Incidentally, your required output has 7 [ and 6 ]; do you need to add one at the end or remove one at the start?
– Jonathan Leffler, Commented Jun 6, 2013 at 17:32
That's an ERE; sed uses BREs by default. You're also using $1, whereas sed uses \1.
– Kevin, Commented Jun 6, 2013 at 17:36
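
The points in these comments can be sketched on a single coordinate pair. The sample value below is made up for illustration, and [0-9] stands in for \d, which sed does not treat as a digit class. Both commands bracket the pair; they differ only in how groups and repetition are written.

```shell
# Sample coordinate pair (made up for illustration).
pair='-74.04,40.69'

# BRE (sed's default): groups are written \( \), and + and ? are spelled out.
bre=$(printf '%s\n' "$pair" |
    sed 's/\(-*[0-9][0-9]*\.[0-9]*,-*[0-9][0-9]*\.[0-9]*\)/[\1]/')

# ERE (-E, or -r on some systems): plain ( ) groups, with + and ? available.
ere=$(printf '%s\n' "$pair" |
    sed -E 's/(-?[0-9]+\.[0-9]*,-?[0-9]+\.[0-9]*)/[\1]/')

# In both modes the backreference is \1, never $1.
printf '%s\n%s\n' "$bre" "$ere"
```

Both lines print [-74.04,40.69].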

Andrew Clark · Accepted Answer · 2013-06-06 17:39:01Z

Try the following:

sed -E -e 's/([0-9-]+\.[0-9]*,[0-9-]+\.[0-9]*)/[\1],/g' -e 's/,]/]/'

Note that on some systems you may need to replace the -E option with -r; either one enables extended regex support.
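
As a usage sketch, the command can be run against the file names from the question. The temporary directory is an assumption made here so the example does not touch real files; the input line is the one from the question.

```shell
# Work in a throwaway directory (an assumption for this sketch).
tmpdir=$(mktemp -d)

# Recreate the input line from the question.
printf '%s\n' '{"type":"MultiPolygon","coordinates":[[[-74.043886,40.690185 -74.040365,40.700704 -74.040288,40.700644 -74.03995,40.700891]]]}' > "$tmpdir/testRegex.csv"

# First expression: wrap every number pair in [ ] and append a comma.
# Second expression: drop the comma left after the last pair.
sed -E -e 's/([0-9-]+\.[0-9]*,[0-9-]+\.[0-9]*)/[\1],/g' -e 's/,]/]/' \
    "$tmpdir/testRegex.csv" > "$tmpdir/testRegex2.csv"

result=$(cat "$tmpdir/testRegex2.csv")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Each pair comes out bracketed and comma-separated, and the line ends in ]]]]} because the closing bracket of the last pair joins the original three.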


Jonathan Leffler · Answer · 2013-06-07 18:14:29Z

I tried to solve a slightly harder problem than the accepted answer does, and developed an answer that doesn't change lines already in the output format. It's tough, but it can be done (more succinctly with ERE support than in traditional sed BRE notation).

BRE notation

sed '/\([^[]\[\[\[\)\(\(\[[-+]*[0-9][0-9]*\.[0-9]*,[-+]*[0-9][0-9]*\.[0-9]*\], \)*\)\([-+]*[0-9][0-9]*\.[0-9]*,[-+]*[0-9][0-9]*\.[0-9]*\)/ {
: redo
s//\1\2[\4],/
t redo
s/,]]]/]]]/
}' <<'EOF'

{"type":"MultiPolygon","coordinates":[[[-74.043886,40.690185 -74.040365,40.700704 -74.040288,40.700644 -74.03995,40.700891]]]}
with this

{"type":"MultiPolygon","coordinates":[[[[-84.043886,40.690185], [-64.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]}
EOF

ERE notation

sed -E '/([^[]\[\[\[)((\[[-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+\], )*)([-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+)/ {
: redo
s//\1\2[\4],/
t redo
s/,]]]/]]]/
}' <<'EOF'

{"type":"MultiPolygon","coordinates":[[[-74.043886,40.690185 -74.040365,40.700704 -74.040288,40.700644 -74.03995,40.700891]]]}
with this

{"type":"MultiPolygon","coordinates":[[[[-84.043886,40.690185], [-64.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]}

EOF

Example output

{"type":"MultiPolygon","coordinates":[[[[-74.043886,40.690185], [-74.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]]}
with this

{"type":"MultiPolygon","coordinates":[[[[-84.043886,40.690185], [-64.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]}

Explanation of ERE

/([^[]\[\[\[)((\[[-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+\], )*)([-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+)/

This can be split into 3 sub-regexes:

([^[]\[\[\[) This matches the three square brackets preceded by something other than a square bracket. It becomes \1 in the replacement.
((\[[-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+\], )*) This has two captures, but I'm really only interested in the outer one. The inner one looks for a square bracket, followed by a possibly signed number (which insists on at least one digit before and one digit after the decimal point), a comma, another possibly signed number, a close square bracket (the backslash isn't strictly necessary), and a comma and a space. This inner capture is \3, and can be repeated zero or more times. The outer capture holds all the repeats of \3, and is \2. Without the outer capture, the inner capture would hold only the last repeat of the 'pair of numbers in square brackets', whereas with the two captures, you get all the repetitions.
([-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+) This captures a pair of possibly signed numbers separated by a comma.
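
The difference between the outer and inner captures can be seen on a much smaller pattern. This is a sketch with a made-up sample string, not part of the original script: the outer group holds every repetition, the inner group only the last one.

```shell
# Made-up sample: three "word, " repetitions.
sample='ab, cd, ef, '

# \1 is the outer group (all repetitions); \2 is the repeated inner group,
# which keeps only its final iteration.
out=$(printf '%s\n' "$sample" |
    sed -E 's/(([a-z]+, )*)/outer=<\1> inner=<\2>/')

printf '%s\n' "$out"
```

This prints outer=<ab, cd, ef, > inner=<ef, >.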

The replacement script used a conditional sed loop:

{
: redo
s//\1\2[\4],/
t redo
s/,]]]/]]]/
}

The : redo sets a label. The s//\1\2[\4],/ reuses the address regex (an empty pattern means the most recently used regex) and replaces the first unbracketed 'pair of possibly signed numbers separated by a comma' with the same pair enclosed in square brackets and followed by a comma. Adding a g suffix doesn't help, because g never rescans text it has already matched, while each later match has to start again at the [[[ prefix and pass over the pairs already bracketed. So there is a t redo to jump back to the label redo if a substitution has been made. The final s/,]]]/]]]/ removes the comma added after the last new pair of numbers in square brackets.
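
To see why the loop is needed, here is a sketch that runs the same ERE once with a g flag and once in a t loop. The short input and the shell variables pair and re are assumptions made for this example. It also drops the address block, so the clean-up substitution runs on every line.

```shell
# Short made-up input with two unbracketed pairs.
input='x[[[1.5,2.5 3.5,4.5]]]'

# The ERE from the answer, assembled from a reusable "pair" fragment.
pair='[-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+'
re="([^[]\[\[\[)((\[$pair\], )*)($pair)"

# One pass with g: the [[[ prefix occurs only once, so only one pair changes.
with_g=$(printf '%s\n' "$input" | sed -E "s/$re/\1\2[\4],/g")

# Loop: repeat the substitution until it fails, then drop the trailing comma.
with_loop=$(printf '%s\n' "$input" |
    sed -E -e ':redo' -e "s/$re/\1\2[\4],/" -e 't redo' -e 's/,]]]/]]]/')

printf '%s\n%s\n' "$with_g" "$with_loop"
```

The g version prints x[[[[1.5,2.5], 3.5,4.5]]], leaving the second pair bare, while the loop version prints x[[[[1.5,2.5], [3.5,4.5]]]].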

Note that the BRE regex doesn't insist on a digit after the decimal point; it could be made even longer so that it did (add an extra [0-9] after each of the four decimal points).
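
A sketch of that lengthened BRE, compared with the looser one on a made-up input whose first pair has no digits after the decimal points. The helper bracket_pairs and the variables loose and strict are hypothetical names introduced for this example. For brevity the helper omits the address block, so its clean-up substitution runs on every line.

```shell
# Made-up input: the first pair has nothing after its decimal points.
input='x[[[1.,2. 3.5,4.5]]]'

# Pair as in the BRE above: digits after the decimal point are optional.
loose='[-+]*[0-9][0-9]*\.[0-9]*,[-+]*[0-9][0-9]*\.[0-9]*'
# Stricter pair: an extra [0-9] after each decimal point.
strict='[-+]*[0-9][0-9]*\.[0-9][0-9]*,[-+]*[0-9][0-9]*\.[0-9][0-9]*'

# Hypothetical helper: $1 is the BRE for one pair; filters stdin to stdout.
bracket_pairs() {
    re="\([^[]\[\[\[\)\(\(\[$1\], \)*\)\($1\)"
    sed -e ':redo' -e "s/$re/\1\2[\4],/" -e 't redo' -e 's/,]]]/]]]/'
}

loose_out=$(printf '%s\n' "$input" | bracket_pairs "$loose")
strict_out=$(printf '%s\n' "$input" | bracket_pairs "$strict")

printf '%s\n%s\n' "$loose_out" "$strict_out"
```

The loose pattern accepts 1.,2. as a pair and prints x[[[[1.,2.], [3.5,4.5]]]], while the strict pattern finds no valid first pair and leaves the line unchanged.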
