Regex to build a vector from braces

Question

I want to turn a std::string such as:

"{1, 2}, {one, two}, {123, onetwothree}"

Into a std::vector of std::pairs of std::strings which would look something like:

std::vector<std::pair<std::string, std::string>> v = {{"1", "2"}, {"one", "two"}, {"123", "onetwothree"}};
// where, for instance
v[0] == std::make_pair("1", "2"); // etc.

This seems like a case where the original std::string could be parsed most easily with std::regex, but I am NOT a regex expert (or even a novice), let alone a std::regex expert. Any ideas for a recipe here?

Right now, I am doing this manually with std::string's own find methods. Not being a regex person, I thought this would be a good case to get started with, but the examples I've found return the contents of a single brace match rather than iterating over multiple brace matches.
– DiB, Commented Nov 27, 2013 at 16:07
As for the format, it is pretty strict: "{first, second}, {third, fourth}, etc." It's a list of pairs of strings, if that makes it clearer.
– DiB, Commented Nov 27, 2013 at 16:08
The example I found at stackoverflow.com/questions/13227802/… is on the right track, but far too trivial. I just don't know how to expand it to this more complex case. I'd be fine with some hybrid string/regex solution if that is best.
– DiB, Commented Nov 27, 2013 at 16:13

Community · Accepted Answer · 2017-05-23 10:31:44Z

3

Currently, <regex> doesn't work well with GCC, so here is a Boost version. Link it with -lboost_regex.

Boost's repeated-capture feature would fit this case, but it is not enabled by default, so the code below uses two patterns instead.

Here is the original post: Boost C++ regex - how to get multiple matches

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main()
{
  string str = "{1, 2}, {one, two}, {123, onetwothree}";

  // pair_pat matches one whole "{...}" group; elem_pat matches one element
  // inside it, excluding whitespace so that " 2" is reported as "2".
  boost::regex pair_pat("\\{[^{}]+\\}");
  boost::regex elem_pat("[^,{}\\s]+");

  boost::sregex_token_iterator end;

  // Outer loop: submatch 0 (the whole match) of every "{...}" group.
  for(boost::sregex_token_iterator iter(str.begin(), str.end(), pair_pat, 0);
      iter != end; ++iter) {

    string pair_str = *iter;
    cout << pair_str << endl;

    // Inner loop: every element inside the current group.
    for (boost::sregex_token_iterator it(pair_str.begin(), pair_str.end(), elem_pat, 0);
         it != end; ++it)
      cout << *it << endl;
  }

  return 0;
}
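
The program above only prints the tokens. As a rough sketch of how the same two-stage idea could fill the vector the question asks for, here is a variant that uses std::regex instead of Boost. That swap is an assumption: it needs a standard library with a working <regex>. The function name collect_pairs_two_stage is hypothetical and not part of the answer.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: two-stage parse, first "{...}" groups, then elements.
std::vector<std::pair<std::string, std::string>>
collect_pairs_two_stage(const std::string& input) {
    static const std::regex pair_pat(R"(\{[^{}]+\})");  // one "{...}" group
    static const std::regex elem_pat(R"([^,{}\s]+)");   // one element, no spaces

    std::vector<std::pair<std::string, std::string>> result;
    const std::sregex_iterator end;

    for (std::sregex_iterator group(input.begin(), input.end(), pair_pat);
         group != end; ++group) {
        // Copy the group text so the inner iterators refer to a live string.
        const std::string group_text = group->str();

        std::vector<std::string> elems;
        for (std::sregex_iterator e(group_text.begin(), group_text.end(), elem_pat);
             e != end; ++e) {
            elems.push_back(e->str());
        }

        // Keep only well-formed groups with exactly two elements.
        if (elems.size() == 2) {
            result.emplace_back(elems[0], elems[1]);
        }
    }
    return result;
}
```

Because elem_pat excludes whitespace, an element that contains a space would be split into two tokens and its group skipped. That matches the strict format described in the question but may not suit looser input.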

edited May 23, 2017 at 10:31

CommunityBot

answered Nov 27, 2013 at 17:04

gongzhitaao

Sam Cristall · Accepted Answer · 2013-11-27 16:51:03Z

1

The match pattern is pretty simple: "\{\s*(\w+)\s*\,\s*(\w+)\s*\}", so we just need to loop through the string and collect every match. C++11 makes this pretty straightforward. Give this a shot:

#include <regex>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::string str = "{1, 2}, {one, two}, {123, onetwothree}";
    std::vector<std::pair<std::string, std::string>> pairs;
    std::regex exp(R"(\{\s*(\w+)\s*\,\s*(\w+)\s*\})");
    std::smatch sm;
    std::string::const_iterator cit = str.cbegin();
    // each call finds the next "{a, b}" at or after cit
    while (std::regex_search(cit, str.cend(), sm, exp)) {
        if (sm.size() == 3) // 3 = whole match, first item, second item
            pairs.emplace_back(sm[1].str(), sm[2].str());
        // the next line is a bit cryptic, but it just puts cit at the remaining string start
        cit = sm[0].second;
    }
}

EDIT: Explanation on how it works: it matches one pattern at a time, using a constant iterator to point at the remainder after each match:

{1, 2}, {one, two}, {123, onetwothree}
^ iterator cit
-- regex_search matches "{1, 2}" sm[1] == "1", sm[2] == "2"

{1, 2}, {one, two}, {123, onetwothree}
      ^ iterator cit
-- regex_search matches "{one, two}" sm[1] == "one", sm[2] == "two"

{1, 2}, {one, two}, {123, onetwothree}
                  ^ iterator cit
-- regex_search matches "{123, onetwothree}" sm[1] == "123", sm[2] == "onetwothree"

{1, 2}, {one, two}, {123, onetwothree}
                                      ^ iterator cit
-- regex_search returns false, no match
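
To make the walkthrough above concrete, here is a small sketch that records where the iterator lands after each match. The helper name resume_offsets is hypothetical; the pattern and the loop follow the answer's code.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the offset of cit after each successful match.
std::vector<std::size_t> resume_offsets(const std::string& str) {
    const std::regex exp(R"(\{\s*(\w+)\s*,\s*(\w+)\s*\})");
    std::vector<std::size_t> offsets;
    std::smatch sm;
    std::string::const_iterator cit = str.cbegin();

    while (std::regex_search(cit, str.cend(), sm, exp)) {
        // sm[0].second points one past the end of the current match.
        cit = sm[0].second;
        offsets.push_back(static_cast<std::size_t>(cit - str.cbegin()));
    }
    return offsets;
}
```

For the example string this yields the offsets 6, 18 and 38, which are the caret positions shown in the diagram.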

edited Nov 27, 2013 at 16:51

answered Nov 27, 2013 at 16:29

Sam Cristall

4 Comments

DiB Over a year ago

I put this in and it worked exactly as I wanted. I guess my biggest question is how the regex knows to look for N matches. I was playing with other ideas based on searches and others' comments, and I was having a hard time getting more than the outer set of {} matched.

DiB Over a year ago

AH! So the regex is just matching one at a time, but it's the string iterator that is marching down the string looking for any matches that follow. Thanks for the extra explanation there. (I should have looked at how the iterator was being used more closely!)

Sam Cristall Over a year ago

I added an explanation on how it works -- the regex is not matching N matches, but rather the first match. We then use an iterator to iterate over the remainder. Doing this in a loop gets us every match.

Cubbi Over a year ago

sounds a lot like what std::regex_token_iterator does
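
Following up on that comment, here is a minimal sketch of the same parse with std::sregex_token_iterator, which does the advancing internally. The function name parse_pairs_with_token_iterator is hypothetical, and the pattern is the one from the answer. It assumes a standard library with a working <regex>.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

using PairList = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: builds the vector with std::sregex_token_iterator.
PairList parse_pairs_with_token_iterator(const std::string& input) {
    static const std::regex pair_pattern(R"(\{\s*(\w+)\s*,\s*(\w+)\s*\})");
    // Ask for capture groups 1 and 2 of every match, in that order.
    static const std::vector<int> groups = {1, 2};

    PairList pairs;
    std::sregex_token_iterator it(input.begin(), input.end(), pair_pattern, groups);
    const std::sregex_token_iterator end;

    // Tokens arrive as first, second, first, second, ...
    while (it != end) {
        std::string first = it->str();
        ++it;
        if (it == end) break;  // defensive: tokens should come in twos
        pairs.emplace_back(std::move(first), it->str());
        ++it;
    }
    return pairs;
}
```

The token iterator flattens the groups into one sequence, so the loop has to re-pair them. std::sregex_iterator is an alternative that keeps each match's groups together.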
