
I'm trying to write a sed expression that transforms the contents of a file called testRegex.csv from this:

{"type":"MultiPolygon","coordinates":[[[-74.043886,40.690185 -74.040365,40.700704 -74.040288,40.700644 -74.03995,40.700891]]]}

into this:

{"type":"MultiPolygon","coordinates":[[[[-74.043886,40.690185], [-74.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]}

I've tried the following:

sed 's/(\W\d\d[.]\d*[,]\d\d[.]\d*)/[$1],/g' <testRegex.csv >testRegex2.csv
sed 's/(\W\d\d[\.]\d*[\,]\d\d[\.]\d*)/[$1]\,/g' <testRegex.csv >testRegex2.csv
sed 's/(\W\d\d\.\d*\,\d\d\.\d*)/[$1]\,/g' <testRegex.csv >testRegex2.csv

Can anyone see why these aren't working?

  • You aren't allowing for signs on the numbers (unless that's why you have \W in there, but that matches all sorts of garbage), and you aren't allowing single digits for the integer part of the number. You don't seem to be allowing for the square brackets (so you won't substitute after the first number pair). You should be able to combine the three scripts into one; use -e to separate the sed commands. Your I/O redirections run the first command and save the results; the second command then runs on the original data and overwrites what the first produced; and then the third runs and overwrites that in turn. Commented Jun 6, 2013 at 17:29
  • Incidentally, your required output has 7 [ and 6 ]; do you need to add one at the end or remove one at the start? Commented Jun 6, 2013 at 17:32
  • That's an ERE; sed uses BREs by default. You're also using $1 for the backreference; sed uses \1. Commented Jun 6, 2013 at 17:36
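To make the points in these comments concrete, here is a minimal sketch of one of the attempted substitutions rewritten in default (BRE) syntax: groups become \( \), the backreference becomes \1, and \d is replaced by [0-9]. The shortened input and the variable names `input` and `fixed` are only for illustration, and this version does not yet add the separating commas.

```shell
# Shortened sample input (two coordinate pairs only).
input='-74.043886,40.690185 -74.040365,40.700704'

# BRE syntax: \( \) for grouping, \1 for the backreference, [0-9] for digits.
# -* allows an optional leading minus sign on each number.
fixed=$(printf '%s\n' "$input" |
    sed 's/\(-*[0-9][0-9]*\.[0-9]*,-*[0-9][0-9]*\.[0-9]*\)/[\1]/g')

printf '%s\n' "$fixed"
```

Each pair should come out wrapped in its own square brackets, which is the part the original attempts never reached.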

2 Answers


Try the following:

sed -E -e 's/([0-9-]+\.[0-9]*,[0-9-]+\.[0-9]*)/[\1],/g' -e 's/,]/]/'

Note that on some systems you may need to replace the -E option with -r; either one enables extended regular expression support.
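As a usage sketch, the command above can be applied to the file from the question with the same redirections the asker used. The scratch directory (`tmpdir`) is an assumption added here so that the example is self-contained and does not overwrite a real file.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmpdir=$(mktemp -d)

# Recreate the one-line sample file from the question.
printf '%s\n' '{"type":"MultiPolygon","coordinates":[[[-74.043886,40.690185 -74.040365,40.700704 -74.040288,40.700644 -74.03995,40.700891]]]}' \
    > "$tmpdir/testRegex.csv"

# First expression: wrap every number pair in [ ] and append a comma.
# Second expression: drop the comma left after the last pair.
sed -E -e 's/([0-9-]+\.[0-9]*,[0-9-]+\.[0-9]*)/[\1],/g' -e 's/,]/]/' \
    < "$tmpdir/testRegex.csv" > "$tmpdir/testRegex2.csv"

cat "$tmpdir/testRegex2.csv"
```

Both expressions run in a single sed invocation, so the output file is written once rather than being overwritten by successive commands.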


I tried to solve a slightly harder problem than the accepted answer, and developed an answer that leaves lines already in the output format unchanged. It's tough, but it can be done (more succinctly with ERE support than in traditional sed BRE notation).

BRE notation

sed '/\([^[]\[\[\[\)\(\(\[[-+]*[0-9][0-9]*\.[0-9]*,[-+]*[0-9][0-9]*\.[0-9]*\], \)*\)\([-+]*[0-9][0-9]*\.[0-9]*,[-+]*[0-9][0-9]*\.[0-9]*\)/ {
: redo
s//\1\2[\4],/
t redo
s/,]]]/]]]/
}' <<'EOF'

{"type":"MultiPolygon","coordinates":[[[-74.043886,40.690185 -74.040365,40.700704 -74.040288,40.700644 -74.03995,40.700891]]]}
with this

{"type":"MultiPolygon","coordinates":[[[[-84.043886,40.690185], [-64.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]}
EOF

ERE notation

sed -E '/([^[]\[\[\[)((\[[-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+\], )*)([-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+)/ {
: redo
s//\1\2[\4],/
t redo
s/,]]]/]]]/
}' <<'EOF'

{"type":"MultiPolygon","coordinates":[[[-74.043886,40.690185 -74.040365,40.700704 -74.040288,40.700644 -74.03995,40.700891]]]}
with this

{"type":"MultiPolygon","coordinates":[[[[-84.043886,40.690185], [-64.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]}

EOF

Example output

{"type":"MultiPolygon","coordinates":[[[[-74.043886,40.690185], [-74.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]]}
with this

{"type":"MultiPolygon","coordinates":[[[[-84.043886,40.690185], [-64.040365,40.700704], [-74.040288,40.700644], [-74.03995,40.700891]]]}

Explanation of ERE

/([^[]\[\[\[)((\[[-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+\], )*)([-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+)/

This can be split into 3 sub-regexes:

  1. ([^[]\[\[\[) This matches the three square brackets preceded by something other than a square bracket. It becomes \1 in the replacement.
  2. ((\[[-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+\], )*) This has two captures, but I'm really only interested in the outer one. The inner one looks for an open square bracket, followed by a possibly signed number (which must have at least one digit before and one digit after the decimal point), a comma, another possibly signed number, a close square bracket (the backslash isn't strictly necessary), and a comma and a space. This inner capture would be \3, and can be repeated zero or more times. The outer capture captures all the repeats of \3, and is called \2. If the outer capture is not used, then what is now the only capture holds just the last repeat of the 'pair of numbers in square brackets', whereas with the two captures, you get all the repetitions.
  3. ([-+]?[0-9]+\.[0-9]+,[-+]?[0-9]+\.[0-9]+) This captures a pair of possibly signed numbers separated by a comma.
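The difference between the outer and inner captures in item 2 can be seen on a much smaller pattern. This is only an illustrative sketch: the input string and the variable name `outer_inner` are made up, and here the outer group is \1 and the inner group is \2 (rather than \2 and \3 as in the full regex).

```shell
# ((ab )*) : the outer group spans every repetition,
# the inner group keeps only the last one matched.
outer_inner=$(printf '%s\n' 'ab ab ab X' |
    sed -E 's/((ab )*)X/outer=[\1] inner=[\2]/')

printf '%s\n' "$outer_inner"
```

The outer capture holds all three repetitions, while the inner capture holds just the final `ab `, which is why the replacement in the answer uses the outer one.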

The replacement script used a conditional sed loop:

{
: redo
s//\1\2[\4],/
t redo
s/,]]]/]]]/
}

The : redo sets a label. The s//\1\2[\4],/ reuses the regex from the address (an empty regex means 'the last regex used') and replaces the first unbracketed 'pair of possibly signed numbers separated by a comma' by the same information with the pair enclosed in square brackets. Adding a g suffix doesn't help: each new match has to start at the same opening brackets and run over the pairs that were bracketed by the previous substitution, and g never rescans text it has already matched. So, there is a t redo to conditionally jump back to the label redo if a substitution has been made. The final s/,]]]/]]]/ removes the comma added after the last new pair of numbers in square brackets.
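The same label-and-t technique can be shown on a simpler, unrelated substitution where g alone falls short because matches would need to overlap. This is a sketch with made-up input (filling empty comma-separated fields with 0); the variable names `with_g` and `with_loop` are illustrative.

```shell
# With g, sed does not rescan text it has already replaced,
# so the second empty field is missed.
with_g=$(printf '%s\n' 'a,,,b' | sed 's/,,/,0,/g')

# With a label and "t", the substitution is retried from the start
# of the line until it no longer succeeds.
with_loop=$(printf '%s\n' 'a,,,b' |
    sed -e ':redo' -e 's/,,/,0,/' -e 't redo')

printf 'g flag: %s\nloop:   %s\n' "$with_g" "$with_loop"
```

The g version leaves one empty field behind, while the loop fills both, for the same reason the bracket-wrapping script needs the loop.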

Note that the BRE regex doesn't insist on a digit after the decimal point; it could be made even longer so that it did (add an extra [0-9] after each of the four decimal points).
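As a sketch of that stricter form, here is the BRE number pattern with the extra [0-9] added after the decimal point, tested on its own against a few made-up values (the variable name `strict` is illustrative). Note that [-+]* is kept from the answer, so it would also accept more than one sign character.

```shell
# [0-9][0-9]* is the BRE spelling of the ERE [0-9]+ (one or more digits).
# Requiring it after \. rejects numbers such as "12." with no fraction digits.
strict=$(printf '%s\n' '-74.5' '12.' '40.690185' |
    sed -n '/^[-+]*[0-9][0-9]*\.[0-9][0-9]*$/p')

printf '%s\n' "$strict"
```

Only the values with at least one digit on each side of the decimal point are printed.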
