
With a string like {float: 'null', another: 'foo'}, I'd like to grab each key/value pair so that the groups output float and null, then another and foo. My current regex is /\{(?<set>(?<key>\w*)\s*\:\s*(?<value>.*)?\s?)*\}/g. It grabs the key correctly, but the value also swallows everything after the comma. I'm using named groups mainly for clarity. I can't figure out how to extract each key/value pair, especially when there are multiple. Thanks for any help.

I'm currently trying /\{(?<set>(?<key>\w*)\s*\:\s*(?<value>.*)?\s?)*\}/g, but the output is:

the group 'set': float: 'null', another: 'foo' (correct)

the group 'key': float (correct)

the group 'value': 'null', another: 'foo' (incorrect, I want just null)

I'd like it to capture all key/value pairs, if possible.
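A minimal sketch of why the pattern above behaves this way, and of a simpler per-pair alternative (runnable in Node.js or a modern browser; `pairPattern` and `extractPairs` are illustrative names, not from the question). Two things are at play: `(?<value>.*)` is greedy, so it runs on to just before the final `}`, and a quantified group such as `(?<set>...)*` only keeps its last repetition, so a single match can never report every pair. Matching one pair per match with the `g` flag and iterating with `matchAll()` avoids both. This simplified value pattern does not handle escaped quotes inside strings.

```javascript
const input = "{float: 'null', another: 'foo'}";

// The question's pattern: the greedy `.*` runs on to just before the final `}`,
// and the repeated (?<set>...)* group would only keep its last repetition anyway.
const original = /\{(?<set>(?<key>\w*)\s*\:\s*(?<value>.*)?\s?)*\}/;
const { key, value } = original.exec(input).groups;
console.log(key, '=>', value); // float => 'null', another: 'foo'

// One pair per match instead: a key, a colon, then a single-quoted string,
// a double-quoted string, or a bare token that stops at a comma, brace or space.
// Escaped quotes inside strings are NOT handled by this simplified pattern.
const pairPattern =
  /(?<key>\w+)\s*:\s*(?:'(?<sq>[^']*)'|"(?<dq>[^"]*)"|(?<bare>[^,}\s]+))/g;

function extractPairs(text) {
  const pairs = [];
  for (const m of text.matchAll(pairPattern)) {
    // Exactly one of the three value groups participates in each match.
    const v = m.groups.sq ?? m.groups.dq ?? m.groups.bare;
    pairs.push([m.groups.key, v]);
  }
  return pairs;
}

console.log(extractPairs(input)); // [ [ 'float', 'null' ], [ 'another', 'foo' ] ]
```

The values come back without their quotes, which is what the question asks for; the answers below also handle escape sequences and convert numbers and constants.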


Edit for more clarity:

My specific use case is parsing Markdown and plugging it into custom components in Svelte: I want to be able to gather props from the Markdown syntax on an image. From what I've found online about putting attributes on an image, it should look something like:

![Alt Text](https://<fullurl>.jpg "This is hover text"){prop1: 'foo', prop2: 'bar', float: true}

The reason for using a regex is that I'm parsing the Markdown string. It's not JSON, and I don't really gain anything by following JSON semantics (double quotes around the keys).
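For the Markdown use case, one possible approach is to find each image first and then run a per-pair regex on just its `{...}` block. A sketch, assuming the standard `![alt](url "title")` image syntax immediately followed by a `{...}` block with no braces inside it; `parseImageProps` and `convertBare` are hypothetical helper names, and `https://example.com/cat.jpg` is a placeholder URL. Quoted values stay strings; bare `true`, `false`, `null` and numbers are converted.

```javascript
// Matches ![alt](url "optional title"){...}; braces inside the props are not supported.
const imagePattern =
  /!\[(?<alt>[^\]]*)\]\((?<url>[^\s)]+)(?:\s+"(?<title>[^"]*)")?\)\{(?<props>[^}]*)\}/g;

// Per-pair pattern: quoted strings, or a bare token up to a comma or space.
const propPattern =
  /(?<key>\w+)\s*:\s*(?:'(?<sq>[^']*)'|"(?<dq>[^"]*)"|(?<bare>[^,\s]+))/g;

// Converts bare tokens: true/false/null become JS values, numeric text a number.
function convertBare(token) {
  if (token === 'true') return true;
  if (token === 'false') return false;
  if (token === 'null') return null;
  const n = Number(token);
  return Number.isNaN(n) ? token : n;
}

function parseImageProps(markdown) {
  const images = [];
  for (const img of markdown.matchAll(imagePattern)) {
    const props = {};
    for (const p of img.groups.props.matchAll(propPattern)) {
      const { key, sq, dq, bare } = p.groups;
      props[key] = bare !== undefined ? convertBare(bare) : (sq ?? dq);
    }
    images.push({ alt: img.groups.alt, url: img.groups.url, title: img.groups.title, props });
  }
  return images;
}

const md = `![Alt Text](https://example.com/cat.jpg "This is hover text"){prop1: 'foo', prop2: 'bar', float: true}`;
console.log(parseImageProps(md));
```

Each result object could then be spread onto a Svelte component as props; that wiring is left out here.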

  • Try to include a delimiter for set and key. If you are sure it is a word, you could go with \w+, or use the ' to end the string. Commented Apr 29, 2024 at 6:35
  • That is not "similar to JSON", it is JSON. I think the best approach would be to just use JSON.parse() and after that replace all 'null' with null. If needed, you can then use Object.entries() on the result. Commented Apr 29, 2024 at 6:38
  • Why use regex? You already have the key/value pairs in your object? Commented Apr 29, 2024 at 6:59
  • @Peter B: isn't JSON supposed to use only double quotes? Both around values ("foo" not 'foo') and field names ("another" not another)? (I've never even read the JSON spec, so I'm not 100% sure. But I know that I was often forced to replace quotes with double quotes to avoid errors, with both Python and Firefox's JS. So, in practice, using JSON parsers wouldn't work on such a string.) Commented Apr 29, 2024 at 13:16
  • In fact, it looks like a JavaScript object. (One could say a JavaScript Object Notation. But not THE JavaScript Object Notation.) But of course, I wouldn't dare utter the word eval, that would be evil :D Commented Apr 29, 2024 at 13:57
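To make the quoting point from the comments concrete: `JSON.parse()` only accepts strict JSON (double-quoted keys and string values), so it rejects the string from the question as-is. A minimal check; `tryJsonParse` is an illustrative wrapper, not a built-in.

```javascript
// Strict JSON requires double quotes around keys and string values.
function tryJsonParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // JSON.parse throws a SyntaxError for anything that is not strict JSON.
    return { ok: false, error: err.name };
  }
}

// The question's string: unquoted keys and single quotes, so ok is false.
console.log(tryJsonParse("{float: 'null', another: 'foo'}"));
// The same data as strict JSON parses fine.
console.log(tryJsonParse('{"float": "null", "another": "foo"}'));

// Object.entries() then turns a parsed object into [key, value] pairs.
console.log(Object.entries(tryJsonParse('{"float": null, "another": "foo"}').value));
```

So JSON.parse() only helps if the Markdown author writes strict JSON, which the question explicitly wants to avoid.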

2 Answers


Have a go with this long JavaScript regex:

/(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g

In action (view it in full page; otherwise not everything is visible):

const regexKeyValue = /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g;

document.getElementById('search').addEventListener('click', function () {
  const input = document.getElementById('input').value;

  let match,
      i = 1,
      output = [];

  while ((match = regexKeyValue.exec(input)) !== null) {
    console.log(`Match n°${i} : ` + match[0]);
    console.log('match.groups =', match.groups);

    // If the value starts with a quote, unquote it and also decode the
    // escape sequences (ex: "\\n" should become "\n").
    let value = match.groups.value;
    // If it's double quotes, let's use JSON.parse() as it will handle everything.
    if (value.match(/^"/)) {
      value = JSON.parse(value);
    }
    // If it's single quotes, we can't use JSON.parse() directly, so we have to
    // convert it to a double-quoted string first.
    else if (value.match(/^'/)) {
      value = value
        // 1) Remove the single quotes around.
        .replace(/^'|'$/g, '')
        // 2) In one pass, replace each \' by ' and escape each unescaped " as \".
        // We have to look at every backslash pair so that escaped backslashes
        // and already-escaped \" are left untouched.
        .replace(/\\(.)|"/g, function (fullMatch, afterBackslash) {
          if (fullMatch === '"') {
            return '\\"';
          } else if (afterBackslash === "'") {
            return "'";
          } else {
            return fullMatch;
          }
        });
      // 3) Now use JSON.parse();
      value = JSON.parse(`"${value}"`);
    }

    // If it's a number, convert it with Number(), because the number pattern
    // also accepts forms like "+5" or ".5" that JSON.parse() rejects.
    if (typeof match.groups.number !== 'undefined') {
      value = Number(match.groups.number);
    }
    // If it's a constant (true, false or null), JSON.parse() gives the real JS value.
    else if (typeof match.groups.constant !== 'undefined') {
      value = JSON.parse(match.groups.constant);
    }

    console.log('value =', value);
    
    output.push(
      `Match n°${i++} :\n` +
      `  Key   : ${match.groups.key}\n` +
      `  Value : ${value}\n`
    );
  }

  document.getElementById('output').innerText = output.join("\n");
  document.getElementById('label').classList.remove('hidden');
});
textarea {
  box-sizing: border-box;
  width: 100%;
}

pre {
  overflow-y: scroll;
}

.hidden {
  display: none;
}
<textarea id="input" rows="10">{
  float: 'null',
  another: "foo",
  age: 45,
  type: '"simple" \' quote',
  comment: "Hello,\nA backslash \\, a tab \t and a \"dummy\" word.\nOk?",
  important: true,
  weight: 69.7,
  negative: -2.5
}</textarea>

<button id="search">Search for key-value pairs</button>

<p id="label" class="hidden">Matches:</p>
<pre><code id="output"></code></pre>
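The snippet above needs a page with `#input`, `#search` and `#output` elements. For use outside the browser, for example in a Node or Svelte preprocessing step, the same regex and conversion logic can be wrapped in a plain function. A sketch; `parseKeyValues` and `unquote` are hypothetical names, and numbers go through `Number()` because the number pattern also accepts `+5` and `.5`, which `JSON.parse()` rejects.

```javascript
const regexKeyValue = /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g;

// Turns a matched quoted literal into the string it denotes.
function unquote(literal) {
  if (literal.startsWith('"')) return JSON.parse(literal);
  // Single quotes: drop them, turn \' into ', escape bare ", then let JSON.parse decode.
  const body = literal.slice(1, -1).replace(/\\(.)|"/g, (m, escaped) => {
    if (m === '"') return '\\"';
    return escaped === "'" ? "'" : m;
  });
  return JSON.parse(`"${body}"`);
}

function parseKeyValues(input) {
  const result = {};
  for (const m of input.matchAll(regexKeyValue)) {
    const { key, value, quote, number } = m.groups;
    if (quote !== undefined) result[key] = unquote(value);
    else if (number !== undefined) result[key] = Number(number);
    else result[key] = JSON.parse(value); // true, false or null
  }
  return result;
}

console.log(parseKeyValues(`{float: 'null', age: 45, important: true, negative: -2.5, type: '"simple" \\' quote'}`));
```

Like the snippet, this still passes double-quoted strings straight to `JSON.parse()`, so JS-only escapes such as `\'` inside double quotes will throw.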

The same regular expression with comments, written with the x (extended) flag that PCRE offers. JavaScript has no x flag, so this form is for reading only:

/
(?<key>\w*)        # The key.
\s*:\s*            # : with optional spaces around.
(?<value>          # The value.
  # A string value, single or double-quoted:
  (?<quote>["'])   # Capture the double or single quote.
    (?:\\.|.)*?    # Backslash followed by anything or any char, ungreedy.
  \k<quote>        # The double or single quote captured before.
|
  # Int and float numbers:
  (?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)
|
  # true, false and null (or other constants):
  (?<constant>true | false | null)
)
/gx

Or, better, see it on regex101, which adds syntax highlighting and an explanation in the right-hand column: https://regex101.com/r/bBPvUd/2
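Since JavaScript has no `x` flag, the commented layout cannot be pasted into a regex literal. One common workaround, sketched here, is to build the pattern from commented string pieces with `new RegExp()`; `String.raw` keeps every backslash exactly as it appears in the literal. The `parts` and `literal` names are illustrative.

```javascript
// Each piece is a raw string so backslashes reach the RegExp unchanged.
const parts = [
  String.raw`(?<key>\w*)`,                                          // The key.
  String.raw`\s*:\s*`,                                              // : with optional spaces around.
  String.raw`(?<value>`,                                            // The value:
  String.raw`(?<quote>["'])(?:\\.|.)*?\k<quote>`,                   // a single- or double-quoted string,
  String.raw`|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)`,   // or an int/float number,
  String.raw`|(?<constant>true|false|null)`,                        // or one of these constants.
  String.raw`)`,
];
const regexKeyValue = new RegExp(parts.join(''), 'g');

// Same source as the one-line literal in the answer.
const literal = /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g;
console.log(regexKeyValue.source === literal.source); // true
```

The trade-off is readability in source versus a single literal that tools like regex101 can take directly.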


2 Comments

This is pretty darn succinct, and I like what you did with the backreference on the string to ensure the quotes match. I'm going to use your answer, but I can't figure out how to capture only the string without the quotation marks. How would you do this?
Well, as I did it: by using the JavaScript code in my answer, with JSON.parse(). This is because we want \n to become the newline (line feed) character and not a backslash followed by the letter n. But if you just want to capture the content within the regex, add a capturing group between the quotes. Be aware, though, that the captured content isn't the final value you want, as its escape sequences won't have been decoded into a proper string.
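A sketch of the capturing-group suggestion in the reply above: wrapping the string body in its own named group (here `content`, a name chosen for illustration) gives the text without the quotation marks, but escape sequences stay as raw backslash pairs, which is why the answer decodes with `JSON.parse()` instead.

```javascript
// The answer's string alternative, with the body captured as `content`.
const quotedPattern = /(?<key>\w+)\s*:\s*(?<quote>["'])(?<content>(?:\\.|.)*?)\k<quote>/g;

// String.raw keeps the backslashes, as they would appear in the Markdown source.
const input = String.raw`{greeting: "Hi\nthere", name: 'O\'Brien'}`;

const raw = {};
for (const m of input.matchAll(quotedPattern)) {
  // `content` has no surrounding quotes, but escapes are still undecoded:
  // "\n" here is a backslash followed by the letter n.
  raw[m.groups.key] = m.groups.content;
}
console.log(raw.greeting); // Hi\nthere
console.log(raw.name);     // O\'Brien

// Decoding the double-quoted body turns \n into a real line feed.
console.log(JSON.parse(`"${raw.greeting}"`) === 'Hi\nthere'); // true
```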

As mentioned in the comments, eval() is considered "evil", or at least unsafe: it executes whatever code is in the string it is given, so running it on untrusted input allows arbitrary script to run, which opens the door to code injection such as cross-site scripting. However, if it is used in a "safe" environment, i.e. for preprocessing input that you fully control, it might be admissible nonetheless.
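To make the risk concrete: `eval()` runs any expression inside the braces, not just object literals. A small, harmless demonstration, assuming whoever writes the Markdown controls the text inside `{...}`; the `hijacked` property is purely illustrative.

```javascript
// Looks like a props object, but the value is an expression with a side effect.
const props = `{float: (globalThis.hijacked = 'yes', true)}`;

const parsed = eval(`(${props})`);
console.log(parsed.float);        // true
console.log(globalThis.hijacked); // yes  (arbitrary code ran during "parsing")
```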

const md=`Some text and now the image: 
![Alt Text](https://<fullurl>.jpg "This is hover text"){prop1: 'foo', prop2: 'bar', float: true} 
and some more text.

A new paragraph and yet another picture
![Alt Text2](https://<fullerURL>.jpg "This is another hover text"){prop1: 'fool', prop2: 'bart', float: false} and this is the end.`;

function unsafeParse(s){
 // match() returns null when there is no {...} block, so fall back to [].
 return (s.match(/\{[^}]+\}/g) || []).map(t=>eval(`(${t})`));
}

// outputs an array of all image property objects:
console.log(unsafeParse(md));

Apart from being "unsafe", the above is not fully robust: property values containing a "}" character will break it, because the regex /\{[^}]+\}/ stops at the first "}".
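One way around the `}` limitation, sketched under the assumption that props blocks are never nested and that `{` only appears where a props block starts, is to scan for the closing brace by hand while skipping over quoted strings, then hand each block to whichever parser you prefer (`eval()` in a trusted setting, or a per-pair regex). `extractBraceBlocks` is a hypothetical helper, not part of any library.

```javascript
// Returns each {...} block, treating braces inside '...' or "..." as plain text.
// Nested objects are not supported: the first unquoted } closes the block.
function extractBraceBlocks(text) {
  const blocks = [];
  let start = -1;   // index of the current opening brace, or -1 when outside a block
  let quote = null; // the active quote character while inside a string
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (start === -1) {
      if (ch === '{') start = i;       // quotes outside a block are ignored
    } else if (quote) {
      if (ch === '\\') i++;            // skip the escaped character
      else if (ch === quote) quote = null;
    } else if (ch === "'" || ch === '"') {
      quote = ch;
    } else if (ch === '}') {
      blocks.push(text.slice(start, i + 1));
      start = -1;
    }
  }
  return blocks;
}

const md = `![a](a.jpg "t"){label: 'curly } brace', float: true} and ![b](b.jpg){label: "x"}`;
console.log(extractBraceBlocks(md));
```

In `unsafeParse`, `extractBraceBlocks(s)` could replace the `s.match(...)` call; the eval caveat above still applies.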
