I have tested (@{[^{}]*})* against @{whatever} on https://regex101.com/ and it matches. So, in spite of the portability nightmare of regular expressions, I finally built the equivalent std::regex with:

const char *re_str = "@\\{[^\\{\\}]*\\}"; // @{[^{}]*} with curly braces escaped.

Escapes could be simplified using R"()" but that's not the question. As I said, the regex works. Here is a simple snippet that extracts the pattern by iterating with regex_search:

#include <iostream>
#include <string>
#include <regex>

int main () {
  std::string str = "Bye @{foo} ! hi @{bar} !";
  std::smatch matches;
  std::string::const_iterator it( str.cbegin() );

  const char *re_str = "@\\{[^\\{\\}]*\\}"; // @{[^{}]*} with curly braces escaped
  // or: const char *re_str = R"(@\{[^\{\}]*\})";

  try {
    std::regex re(re_str);
    while (std::regex_search(it, str.cend(), matches, re)) {
      std::cout << matches[0] << std::endl;
      it = matches.suffix().first;
    }
  }
  catch (std::exception& e) {
    std::cout << e.what() << std::endl;
    return 1;
  }

  return 0;
}

Output:

g++ regex_search.cc && ./a.out
@{foo}
@{bar}

It works.

Well, I'm wondering if there is a better approach from a performance point of view. So I tried std::regex_match instead of iterating with std::regex_search. I used a capture group for that, just enclosing the previous regular expression within ()*:

const char *re_str = "(@\\{[^\\{\\}]*\\})*"; // (@{[^{}]*})* with curly braces escaped.

This is the source:

#include <iostream>
#include <string>
#include <regex>

int main () {
  std::string str = "Bye @{foo} ! hi @{bar} !";
  std::smatch matches;
  std::string::const_iterator it( str.cbegin() );

  const char *re_str = "(@\\{[^\\{\\}]*\\})*"; // (@{[^{}]*})* with curly braces escaped.

  try {
    std::regex re(re_str);
    if (std::regex_match(str, matches, re)) {
      for (std::size_t k = 0; k < matches.size(); k++) std::cout << "[" << k << "]: " << matches.str(k) << std::endl;
    }
  }
  catch (std::exception& e) {
    std::cout << e.what() << std::endl;
    return 1;
  }

  return 0;
}

Output:

g++ regex_match.cc && ./a.out

The output is empty!

I imagine that's not the way to use std::regex_match, although it is supposed to extract the matches for the capture group. Perhaps the regex is invalid this time (I don't know because, as I said, it is a portability nightmare).

So,

  1. Is iterating with regex_search good enough, or is performance a concern?
  2. Is regex_match a better algorithm, or is it equivalent?
  3. What's wrong with my source for regex_match?

Best regards, and thank you in advance.

4 Comments
  • for regex_match your regex has to match the entire input string, your regex doesn't match the whole string: regex101.com/r/4HzmeB/1 Commented Jul 21, 2022 at 17:33
  • I didn't put initial ^ and final $, so it should match, right? Commented Jul 23, 2022 at 18:55
  • No, regex_match matches the whole string Commented Jul 23, 2022 at 22:34
  • Ok Alan, that's the thing. So, regex_match was misunderstood by me and we should iterate as @MilesBudnek says. Commented Aug 10, 2022 at 15:29

1 Answer

  • std::regex_search searches for the pattern anywhere in the input string.
  • std::regex_match checks if the pattern matches the entire input string.
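
The difference between the two calls can be seen with a pair of small wrappers. This is only a minimal sketch: the names found_anywhere and matches_whole are hypothetical helpers introduced for illustration, and the pattern is the one from the question.

```cpp
#include <regex>
#include <string>

// True if `re` matches some substring of `input` (regex_search semantics).
bool found_anywhere(const std::string& input, const std::regex& re) {
    return std::regex_search(input, re);
}

// True only if `re` matches `input` from its first to its last character
// (regex_match semantics).
bool matches_whole(const std::string& input, const std::regex& re) {
    return std::regex_match(input, re);
}
```

With std::regex re(R"(@\{[^\{\}]*\})") and the input "Bye @{foo} ! hi @{bar} !", found_anywhere returns true while matches_whole returns false. matches_whole only returns true for an input such as "@{foo}" on its own.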

Your pattern does not match your entire string, so std::regex_match will not find a match. You would need something like .*?(@{[^{}]*}).*?(@{[^{}]*}).* if you wanted to match the entire string and extract the @{foo} and @{bar} portions.
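
As a rough sketch of that whole-string approach, assuming exactly two tokens and with the braces escaped as in the question: extract_two and last_repetition are hypothetical names used only for illustration. The second helper shows a related point about the question's ()* attempt: as far as I know, a repeated capture group keeps only its last repetition, so it could not list every token even when the whole input does match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Extracts two @{...} tokens with a pattern that spans the entire input.
// Returns an empty vector if the input does not contain two tokens.
std::vector<std::string> extract_two(const std::string& input) {
    static const std::regex whole(R"(.*?(@\{[^\{\}]*\}).*?(@\{[^\{\}]*\}).*)");
    std::smatch m;
    std::vector<std::string> out;
    if (std::regex_match(input, m, whole)) {
        out.push_back(m.str(1));
        out.push_back(m.str(2));
    }
    return out;
}

// Applies the question's repeated-group pattern to the whole input.
// Sub-match 1 holds only the last repetition of the group;
// returns an empty string if the input does not match at all.
std::string last_repetition(const std::string& input) {
    static const std::regex repeated(R"((@\{[^\{\}]*\})*)");
    std::smatch m;
    return std::regex_match(input, m, repeated) ? m.str(1) : std::string{};
}
```

For "Bye @{foo} ! hi @{bar} !", extract_two yields @{foo} and @{bar}. last_repetition returns an empty string for that input because the pattern does not cover it, and for "@{foo}@{bar}" it yields only @{bar}.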

If you want to be able to easily find an arbitrary number of matches for a pattern in your string, take a look at std::regex_iterator. For example, the following would find all instances of words surrounded by @{} and save them to a std::vector:

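#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Collects every match of `pattern` in `s`.
// The returned std::smatch objects hold iterators into `s`.
auto find_all(const std::string& s, const std::regex& pattern)
{
    std::vector<std::smatch> matches;
    std::sregex_iterator begin{s.begin(), s.end(), pattern};
    std::sregex_iterator end;

    std::copy(begin, end, std::back_inserter(matches));
    return matches;
}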

Note: Remember that std::match_results holds iterators into the searched string, so make sure not to use it after the string's lifetime ends.
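
One way to sidestep that lifetime issue, sketched here under the assumption that only the matched text is needed, is to copy each match into an owning std::string. The name find_all_strings is a hypothetical variant introduced for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the text of every match as owning std::string copies, so the
// result stays valid even after the searched string `s` is destroyed.
std::vector<std::string> find_all_strings(const std::string& s,
                                          const std::regex& pattern) {
    std::vector<std::string> result;
    // A default-constructed std::sregex_iterator marks the end of the sequence.
    for (std::sregex_iterator it{s.begin(), s.end(), pattern}, end; it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

The trade-off is that position information and sub-matches are dropped. If those are needed, keep the std::smatch objects and keep the searched string alive for as long as they are used.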

2 Comments

Yes, same thing here. The foo/bar was just an example. The string could have any arbitrary amount of @{xx} patterns.
@eramos For that I would use std::regex_iterator. It's designed for finding and iterating through all of the matches for a regex in a string.
