
I have a string that looks like this. It's obviously a multi-line string and I would like to split it into one string per stanza.

{
   "timestamp":1317911700,
   "application":"system.dev",
   "metrics":{
      "qlen":0,
      "read.bytes":0,
      "write.bytes":185165.0123762,
      "busy":0.021423
   },
   "dimensions":{
      "device":"sda"
   }
}

{
   "timestamp":1317911700,
   "application":"system.fs",
   "metrics":{
      "inodes.used":246627,
      "inodes.free":28703901,
      "capacity.kb":227927024,
      "available.kb":209528472,
      "used.kb":6820512
   },
   "dimensions":{
      "filesystem":"/"
   }
}

{
   "status_code":0,
   "application":"system",
   "status_msg":"Data collected successfully"
}

My regex looks like this:

/^({\n[^}]+^})/m

But I am only capturing:

{
   "status_code":0,
   "application":"system",
   "status_msg":"Data collected successfully"
}

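That kind of makes sense, since it is the only stanza without nested curly braces. What I am trying to do is capture everything from a line matching /^{/ to the next line matching /^}/ as a single string, but I think the inner curly braces are tripping it up: [^}]+ stops at the first } it sees, which in the other stanzas is an indented one, so ^} can't match there.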
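For the capture described above (from a line starting with { to the next line starting with }), a non-greedy match with both /m and /s may be enough. This is a minimal sketch that assumes, as in the sample, that each stanza's closing brace sits in column 0 and every nested brace is indented. The input is a trimmed copy of the sample data.

```perl
use strict;
use warnings;

# Trimmed copy of the sample input (one nested stanza, one flat stanza).
my $str = <<'END';
{
   "timestamp":1317911700,
   "application":"system.fs",
   "metrics":{
      "used.kb":6820512
   },
   "dimensions":{
      "filesystem":"/"
   }
}

{
   "status_code":0,
   "application":"system",
   "status_msg":"Data collected successfully"
}
END

# /m: ^ matches at the start of every line, not just the string.
# /s: . also matches newlines, so .*? can run across lines.
# .*? is non-greedy, so each match ends at the FIRST following line that
# starts with a closing brace, i.e. the stanza's own (inner ones are indented).
my @stanzas = $str =~ /^(\{.*?^\})/msg;

print "---\n$_\n" for @stanzas;
```

Unlike [^}]+, the .*? is allowed to pass over the indented inner braces; it relies on the layout rather than on counting braces.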

  • Is there a reason you can't use a real JSON parser to extract the data you want? Commented Oct 6, 2011 at 17:57
  • It's not valid JSON as a whole. To be valid it would have to look like [ {stanza1},{stanza2},{stanza99} ]. I've already run it through a validator and it fails as is. I do not control the output, so I need to capture it and munge it myself. I was wondering if I could do something with lookahead, e.g. my @foo = /somelookahead/; Commented Oct 6, 2011 at 18:02
  • Never mind, I see that there are no commas between the stanzas, so parsing the whole string as one JSON document won't work. Commented Oct 6, 2011 at 18:06

3 Answers

I can think of a few approaches.

  • perlre shows how to match balanced brackets with a recursive pattern (see (?PARNO) and (?R)). This is hard to get right, because you also need to account for curlies inside quoted strings.

  • Text::Balanced (extract_bracketed) already matches balanced brackets, including curlies. This might be easier, because I think it can skip brackets inside quoted strings.

  • It looks like you can simply split on blank lines.

    @json_snippets = split /^$/m, $json_snippets;
    
  • But the most reliable solution is JSON::XS's incremental parser (incr_parse; see the "INCREMENTAL PARSING" section of its documentation), which pulls consecutive JSON texts out of a single string.
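A sketch of the recursive-pattern route from the first bullet, using (?1) to recurse into the capture group (needs Perl 5.10 or later). As that bullet warns, it does not handle braces inside JSON strings; the trimmed sample data below has none.

```perl
use strict;
use warnings;

my $str = <<'END';
{
   "timestamp":1317911700,
   "application":"system.fs",
   "metrics":{
      "used.kb":6820512
   },
   "dimensions":{
      "filesystem":"/"
   }
}

{
   "status_code":0,
   "application":"system",
   "status_msg":"Data collected successfully"
}
END

# Group 1 matches one balanced brace block: an opening brace, then any mix of
# non-brace runs and nested blocks (matched by recursing into group 1), then
# the closing brace. Scanning with /g therefore picks up only top-level blocks.
my @stanzas = $str =~ /
    (                   # group 1
        \{
        (?: [^{}]++     # run of non-brace characters, no backtracking
          | (?1)        # or a nested block
        )*
        \}
    )
/xg;

print scalar(@stanzas), " stanzas\n";
```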
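A sketch of the Text::Balanced route from the second bullet, calling extract_bracketed in a loop. Adding " to the delimiter set is meant to make it skip double-quoted substrings, so braces inside JSON strings are not counted; that reading of its documentation is an assumption worth checking. Text::Balanced ships with Perl.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $str = <<'END';
{
   "timestamp":1317911700,
   "application":"system.fs",
   "metrics":{
      "used.kb":6820512
   },
   "dimensions":{
      "filesystem":"/"
   }
}

{
   "status_code":0,
   "application":"system",
   "status_msg":"Data collected successfully"
}
END

my @stanzas;
my $rest = $str;
while (1) {
    # List context returns (extracted, remainder, skipped prefix); the default
    # prefix pattern skips the whitespace between stanzas.
    # '{}"' = match curly braces, treat double-quoted strings as opaque.
    my ($stanza, $remainder) = extract_bracketed($rest, '{}"');
    last unless defined $stanza && length $stanza;   # no more stanzas
    push @stanzas, $stanza;
    $rest = $remainder;
}

print scalar(@stanzas), " stanzas\n";
```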
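A sketch of the incremental-parser route from the last bullet. It swaps in JSON::PP, which ships with Perl 5.14 and later and offers the same incr_parse interface as JSON::XS; using JSON::XS->new instead should work the same way. In list context incr_parse returns every complete JSON text it can find, and only whitespace may separate them, which is exactly how this input is laid out.

```perl
use strict;
use warnings;
use JSON::PP;   # core module; JSON::XS provides the same incr_parse method

my $str = <<'END';
{
   "timestamp":1317911700,
   "application":"system.fs",
   "metrics":{
      "used.kb":6820512
   },
   "dimensions":{
      "filesystem":"/"
   }
}

{
   "status_code":0,
   "application":"system",
   "status_msg":"Data collected successfully"
}
END

# List context: extract every complete JSON text from the buffer at once.
my @objects = JSON::PP->new->incr_parse($str);

for my $obj (@objects) {
    print $obj->{application}, "\n";
}
```

Because a real JSON parser does the work, braces inside strings and arbitrary nesting are handled without any assumptions about indentation.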

for my $stanza (split /^$/m, $str) {
  ...
}

1 Comment

And thereafter you have valid JSON strings: foreach my $stanza ( split( /^$/m, $str ) ) { my $json = decode_json( $stanza ); print Dumper( $json ); }
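Spelled out as a self-contained script, the split-then-decode approach from this comment might look like the sketch below. It swaps in JSON::PP (bundled with Perl 5.14 and later) for decode_json; JSON::XS exports a function with the same name.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # JSON::XS exports a decode_json too
use Data::Dumper;

my $str = <<'END';
{
   "timestamp":1317911700,
   "application":"system.fs",
   "metrics":{
      "used.kb":6820512
   },
   "dimensions":{
      "filesystem":"/"
   }
}

{
   "status_code":0,
   "application":"system",
   "status_msg":"Data collected successfully"
}
END

my @records;
# Split at the empty line between stanzas; /m lets ^ and $ match there.
for my $stanza (split /^$/m, $str) {
    next unless $stanza =~ /\S/;    # skip any whitespace-only piece
    push @records, decode_json($stanza);
}

$Data::Dumper::Sortkeys = 1;        # stable key order in the dump
print Dumper(\@records);
```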

If you can't use a JSON parser to do it properly, I would just split at the end of each stanza (the /m flag is needed so that ^ matches at the start of each line):

my @stanzas = split /^}\K\n\n/m, $str;

1 Comment

It's not valid JSON. It's more like a series of JSON snippets.
