
The following outputs ">Hut" where I expect it to output "Hut". I know that .* is greedy, but > must be matched and it is outside the capture group, so why is it in my submatch?

#include <string>
#include <regex>
#include <iostream>

using namespace std;

int main() {
        regex my_r(".*>(.*)");
        string temp(R"~(cols="64">Hut)~");
        smatch m;
        if (regex_match(temp, m, my_r)) {
                cout << m[1] << endl;
        }
}
  • Note that regex support is still very incomplete in GCC, and probably in MSVC too. Commented Jun 5, 2012 at 6:54
  • I upgraded to g++ 4.7, but still same output. I still think this is a misunderstanding of regexes on my part. Too often have I blamed software for my own errors in the past. Commented Jun 5, 2012 at 7:24
  • The regex is good. Try escaping > as \>; this is just a guess. Also, the initial .* isn't required; just use >(.+) Commented Jun 5, 2012 at 8:00
  • related: stackoverflow.com/questions/8060025/… Commented Jun 5, 2012 at 8:26
  • @tuxuday, > has no special meaning, but in some flavors \> is an end-of-word boundary. Best to leave it as it is. Commented Jun 5, 2012 at 8:57

2 Answers


This is a bug in libstdc++'s implementation. Compare these two:

#include <string>
#include <regex>
#include <boost/regex.hpp>
#include <iostream>

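int main() {
    // Keep the subject string in a named variable: m stores iterators into it,
    // and since C++14 std::regex_match rejects a temporary std::string when a
    // match_results object is passed.
    const std::string subject{"123456789"};

    {   // libstdc++'s std::regex
        using namespace std;
        regex my_r("(.*)(6)(.*)");
        smatch m;
        if (regex_match(subject, m, my_r)) {
            std::cout << m.length(1) << ", "
                      << m.length(2) << ", "
                      << m.length(3) << std::endl;
        }
    }

    {   // Boost.Regex
        using namespace boost;
        regex my_r("(.*)(6)(.*)");
        smatch m;
        if (regex_match(subject, m, my_r)) {
            std::cout << m.length(1) << ", "
                      << m.length(2) << ", "
                      << m.length(3) << std::endl;
        }
    }

    return 0;
}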

If you compile with GCC, the first block (libstdc++) prints the completely wrong result 9, -2, 4, while the second (Boost's implementation) prints 5, 1, 3, as expected.

If you compile with clang + libc++, your code works fine.

(Note that libstdc++'s regex implementation is only "partially supported", as described in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52719.)


5 Comments

Oh my, that’s singularly annoying. Any chance of choosing another syntax option? Not that I’d want something other than ECMA-Script … but if that doesn’t work … (incidentally, I’ve now started wondering why they didn’t go with PCRE).
By the way, the bug still exists in GCC 4.7.
Thank you for the examples and explanations. I guess it's not fair of me to expect much if it is only partially supported. I'll either use Boost or avoid regexes for the time being.
@KonradRudolph: It's not related to ECMAScript. regex my_r("(.*)(6)(.*)", regex::extended) still has the same bug.
Ah, rats. I thought the engines were pluggable but it looks like it’s only the parser.

You can modify your regular expression so that matched parts are divided into groups:

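#include <iostream>
#include <regex>
#include <string>

int main() {
    std::regex my_r("(.*)>(.*)\\).*"); // group1>group2).*
    std::string temp("~(cols=\"64\">Hut)~");
    std::sregex_iterator reg_it(temp.begin(), temp.end(), my_r);

    // Dereference only if a match was found (an end iterator must not be dereferenced)
    if (reg_it != std::sregex_iterator() && reg_it->size() > 2) {
        std::cout
            << "1: " << reg_it->str(1) << std::endl  // group1 match
            << "2: " << reg_it->str(2) << std::endl; // group2 match
    }
}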

outputs:

1: ~(cols="64"
2: Hut

Note that groups are delimited by parentheses, ( /* your regex here */ ), and if you want a literal parenthesis in your expression, you need to escape it with \, which is written \\ inside an ordinary C++ string literal. For more information see Grouping Constructs.

This question can also help you: How do I loop through results from std::regex_search?

Also, don't put using namespace std; at the top of your files; it's considered bad practice.

1 Comment

Thank you for your answer and for your tip regarding using namespace std;. I appreciate the explanations!
