
I am struggling to come up with a universal regexp that will remove/isolate all leading/trailing garbage from a multi-line string, leaving only the JSON. Without opening up a temporary buffer to re-search-forward/backward for the first and last wavy brackets, how can I programmatically remove everything before the first { and everything after the last }?

CAVEAT: Sometimes, there may not necessarily be any leading/trailing garbage, but there often will be.

BEFORE

"Lorem ipsum dolor sit amet, consectetuer adipiscing elit. [Donec hendrerit
tempor tellus.]

Donec pretium posuere tellus. Proin quam nisl, tincidunt et,
mattis eget, convallis nec, purus.  {Cum sociis natoque [penatibus et] {magnis
dis} parturient montes, {nascetur ridiculus} mus.} Nulla posuere. Donec vitae

dolor."

AFTER

"{Cum sociis natoque [penatibus et] {magnis
dis} parturient montes, {nascetur ridiculus} mus.}"
  • If "garbage" is being defined as "anything before first {" and "anything after last }" then I think you've complicated this question by talking about JSON at all. Commented Jan 4, 2022 at 22:50
  • @phils -- The particular JSON data of interest is coming from inquiries to an API, received in two forms depending upon the call. In some cases, the response is obtained by using curl and the JSON of interest is preceded by % Total % Received % Xferd ... 0 --:--:-- ... In other cases, data is retrieved using url-retrieve-synchronously and the JSON data is preceded by HTTP/1.1 200 \nDate: Tue, 04 Jan 2022 21:02:32 GMT\nContent-Type: application/json... As this project moves forward, there might also be the need to remove nonessential information trailing the JSON of interest. Commented Jan 5, 2022 at 0:29
  • I just mean that the question can be stated clearly without reference to JSON -- your actual requirement is only about matching/deleting prefix and suffix text based on two particular characters, so the fact that the text in the middle is JSON seems entirely incidental. Perhaps it's useful for searchability purposes if other readers want to do the same thing for the same reason, though. Commented Jan 5, 2022 at 2:05
  • Mind you, it sounds like the question you should be asking is "how can I obtain ONLY the response body from my API requests?" Commented Jan 5, 2022 at 2:12
  • @phils -- Thank you! I had no idea that such an animal existed. The silent -s option available with curl seems to do the trick. This will also avoid the problem of intermittent download/status details from curl being interspersed within the body of lengthy responses. Commented Jan 5, 2022 at 4:34

1 Answer


C-h i g (elisp)Regexp Backslash

\`
     matches the empty string, but only at the beginning of the buffer
     or string being matched against.
 
\'
     matches the empty string, but only at the end of the buffer or
     string being matched against.

AKA bos and eos (for beginning/end-of-string) in rx syntax.

Hence:

(replace-regexp-in-string "\\`[^{]*\\|[^}]*\\'" "" str)

or:

(replace-regexp-in-string
  (rx (or (seq bos (zero-or-more (not "{")))
          (seq (zero-or-more (not "}")) eos)))
  "" str)
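
As a rough sketch of how this could be packaged, the first form above can be wrapped in a small helper and tried on strings with and without surrounding text. The function name `my/strip-outside-braces` and the sample strings are made up for illustration; only the regexp comes from the answer.

```elisp
;; Hypothetical helper name; the regexp is the one shown above.
(defun my/strip-outside-braces (str)
  "Return STR without the text before the first { and after the last }."
  (replace-regexp-in-string "\\`[^{]*\\|[^}]*\\'" "" str))

;; Leading and trailing text around the braces, including newlines:
(my/strip-outside-braces
 "HTTP/1.1 200\n\n{\"a\": {\"b\": [1, 2]}}\ntrailing")
;; => "{\"a\": {\"b\": [1, 2]}}"

;; Nothing to strip: the string comes back unchanged.
(my/strip-outside-braces "{\"a\": 1}")
;; => "{\"a\": 1}"
```

Because `[^{]` and `[^}]` also match newlines, this works on multi-line strings. Note that a string containing no `{` at all is reduced to the empty string, so a caller may want to check for that case.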
