
I want to turn a std::string such as:

"{1, 2}, {one, two}, {123, onetwothree}"

into a std::vector of std::pairs of std::strings, which would look something like:

std::vector<std::pair<std::string, std::string>> v = {{"1", "2"}, {"one", "two"}, {"123", "onetwothree"}};
// where, for instance
v[0] == std::pair<std::string, std::string>("1", "2"); // etc.

This seems like a case where the original std::string could be parsed most easily with std::regex, but I am NOT a regex expert (or even a novice), let alone a std::regex expert. Any ideas for a recipe here?

  • And you have tried what so far? Commented Nov 27, 2013 at 15:52
  • What about "{1, {2, 3} { {a, b} } }"? Commented Nov 27, 2013 at 15:55
  • Right now, I am using std::string's own find methods to do this manually. Not being a regex person, I thought this would be a good case to get started with, but the examples I've found tend to return the contents of a single brace match rather than iterate through multiple brace matches. Commented Nov 27, 2013 at 16:07
  • As for the format, it is pretty strict with "{first, second}, {third, fourth}, etc." It's a list of pairs of strings, if that makes it more clear. Commented Nov 27, 2013 at 16:08
  • The example I found at stackoverflow.com/questions/13227802/… is on the right track, but far too trivial. I just don't know how to expand it to this more complex case. I'd be fine with some hybrid string/regex solution if that is best. Commented Nov 27, 2013 at 16:13

2 Answers


Currently, <regex> doesn't work well with GCC, so here is a Boost version; link it with -lboost_regex.

Boost's capture feature fits this case, but it is not enabled by default.

Here is the original post: Boost C++ regex - how to get multiple matches

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main()
{
  string str = "{1, 2}, {one, two}, {123, onetwothree}";

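  // pair_pat matches one brace-delimited group, e.g. "{1, 2}"
  boost::regex pair_pat("\\{[^{}]+\\}");
  // elem_pat matches one element inside a group, including any surrounding whitespace
  boost::regex elem_pat("\\s*[^,{}]+\\s*");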

  boost::sregex_token_iterator end;

  for(boost::sregex_token_iterator iter(str.begin(), str.end(), pair_pat, 0);
      iter != end; ++iter) {

    string pair_str = *iter;
    cout << pair_str << endl;

    for (boost::sregex_token_iterator it(pair_str.begin(), pair_str.end(), elem_pat, 0);
         it != end; ++it)
      cout << *it << endl;
  }

  return 0;
}


The match pattern is pretty simple: "\{\s*(\w+)\s*\,\s*(\w+)\s*\}", so we just need to loop through and assemble all the matches. C++11 makes this pretty straightforward. Give this a shot:

std::string str = "{1, 2}, {one, two}, {123, onetwothree}";
std::vector<std::pair<std::string, std::string>> pairs;
std::regex exp(R"(\{\s*(\w+)\s*\,\s*(\w+)\s*\})");
std::smatch sm;
std::string::const_iterator cit = str.cbegin();
while (std::regex_search(cit, str.cend(), sm, exp)) {
    if (sm.size() == 3) // 3 = match, first item, second item
        pairs.emplace_back(sm[1].str(), sm[2].str());
    // the next line is a bit cryptic, but it just puts cit at the remaining string start
    cit = sm[0].second;
}

EDIT: Explanation of how it works: it matches one pattern at a time, using a constant iterator to point at the remainder after each match:

{1, 2}, {one, two}, {123, onetwothree}
^ iterator cit
-- regex_search matches "{1, 2}" sm[1] == "1", sm[2] == "2"

{1, 2}, {one, two}, {123, onetwothree}
      ^ iterator cit
-- regex_search matches "{one, two}" sm[1] == "one", sm[2] == "two"

{1, 2}, {one, two}, {123, onetwothree}
                  ^ iterator cit
-- regex_search matches "{123, onetwothree}" sm[1] == "123", sm[2] == "onetwothree"

{1, 2}, {one, two}, {123, onetwothree}
                                      ^ iterator cit
-- regex_search returns false, no match

4 Comments

I put this in and it worked exactly like I wanted. I guess my biggest question here is in understanding how the regex knows to look for N matches. I was playing with other ideas based on searches and others' comments, and I was having a hard time getting more than the outer set of {} matched.
AH! So the regex is just matching one at a time, but it's the string iterator that is marching down the string looking for any matches that follow. Thanks for the extra explanation there. (I should have looked at how the iterator was being used more closely!)
I added an explanation on how it works -- the regex is not matching N matches, but rather the first match. We then use an iterator to iterate over the remainder. Doing this in a loop gets us every match.
sounds a lot like what std::regex_token_iterator does
