
I'm having difficulty getting Boost regex match results to come out the same way as the standard library's. Specifically, the standard library returns only the first match when a multi-line input produces multiple matches.

The goal is the best possible performance, because the product that runs this code hits it a great deal. The substring calls are horrendously slow, hence the Boost way of doing things.

This product is in C++ prior to C++11. It's old stuff that I can't upgrade.

Example below:

_pattern : [A-Za-z0-9].+\\n[ \t]*\\n

Input string (the line feeds are essential):

CLINICAL: Left 2cm Firm Fibrous Lump @12:00.

No prior exams were available for comparison.

There is gynecomastia in both feet.

Standard Library version of code:

size_t
ORegExpr::index(const OString &inputStr, size_t* length, size_t start = 0) const {
if (start == O_NPOS)
    return O_NPOS;

std::smatch reMatch;    
std::regex re(_pattern);
std::string inputData = "";
if (start > 0 )
    inputData = inputStr._string.substr(start); 
else
    inputData = inputStr._string;

if(std::regex_search(inputData,reMatch,re))
{
  *length = reMatch.length();
  return reMatch.position(0) + start;   
}
*length = 0;
return O_NPOS;
}

**Boost version**

size_t
ORegExpr::index_boost(const OString &inputStr, size_t* length, size_t start = 0) const {
if (start == O_NPOS)
    return O_NPOS;  

boost::regex re(_pattern);

boost::match_results<std::string::const_iterator> what;
boost::match_flag_type flags = boost::match_default;    
std::string::const_iterator s = inputStr.std().begin() + start;    
std::string::const_iterator e = inputStr.std().end();

if(boost::regex_search(s,e,what,re,flags)){
    *length = what.length();        
    return what.position() + start;
}

*length = 0;
return O_NPOS;
}

**Boost replaced with std, to see whether using iterators would make a difference**

size_t
ORegExpr::index_boostnowstd(const OString &inputStr, size_t* length, size_t start = 0) const {
if (start == O_NPOS)
    return O_NPOS;  

std::regex re(_pattern);

std::match_results<std::string::const_iterator> what;
//boost::match_flag_type flags = boost::match_default;  
std::string::const_iterator s = inputStr.std().begin() + start;    
std::string::const_iterator e = inputStr.std().end();

if(std::regex_search(s,e,what,re)){
    *length = what.length();        
    return what.position() + start;
}

*length = 0;
return O_NPOS;
}

I tried every which way I could to get the "array" of matches and return just the length of the first match, but for the life of me I couldn't get this from Boost. It returns a single match covering the first and second lines of the input string, with the combined length of both.

I have a fully functional POC if my explanation isn't as clear as I think it is.

I expect the functions to return a size_t of 46, which is the length of the first line of the input string. The standard library does this, but Boost doesn't. The reason for using Boost is that it seems to run faster than the standard library.

  • What is the goal here? To just get the first line of text? If that is the case, then you can use a stringstream and getline to extract that quite easily. Another thing you can do to help performance is to not create the regex every time you call the function. Building a regex is expensive. Making it static or a class member means you only pay to construct it once, which can save a lot of time. Commented Jul 15, 2019 at 14:24
  • Thanks Nathan. I totally get what you are saying about constructing it every time, but this piece of code takes in multiple patterns, not just the one I gave. The goal here is to make the Boost function return the length of the first match, which is the first line of the input string in this example. There are multiple input strings and patterns that exercise this code, so I don't know what will match and what will not, and I can't do a getline. Hopefully that makes more sense. Thanks again!! Commented Jul 15, 2019 at 14:44
  • Your regular expression doesn't match any @, so why would it match the first line of the input string? Commented Jul 15, 2019 at 15:59
  • Still not totally sure what you're trying to do. Perhaps something like this? Commented Jul 15, 2019 at 16:15
  • It matches. If you go to regextester.com and put in the pattern and then the whole input string, you will see that it does match the first line. Commented Jul 15, 2019 at 16:20
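The advice above about not rebuilding the regex on every call can still apply when there are many patterns. A minimal sketch, assuming the patterns are plain strings and the function is only called from one thread: `cachedRegex` is a hypothetical helper, not part of the original `ORegExpr` class, and it uses `std::regex` only so that it compiles without Boost. The same shape should carry over to `boost::regex`.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not from the original code): returns a compiled regex
// for `pattern`, compiling it only the first time that pattern is seen.
// Not thread-safe as written; a lock would be needed for concurrent callers.
const std::regex& cachedRegex(const std::string& pattern) {
    assert(!pattern.empty());  // this sketch assumes a non-empty pattern
    static std::map<std::string, std::regex> cache;
    std::map<std::string, std::regex>::iterator it = cache.find(pattern);
    if (it == cache.end()) {
        // First use of this pattern: pay the construction cost once.
        it = cache.insert(std::make_pair(pattern, std::regex(pattern))).first;
    }
    // References into a std::map stay valid when other entries are inserted.
    return it->second;
}
```

Inside `index_boost`, the line that constructs `re` would then become a lookup, so repeated calls with the same `_pattern` skip the compilation step.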

1 Answer


Your regular expression is actually matching the first two lines, not the first one alone.

Try this one instead:

"[^\\n]+\\n\\n"

Live Demo (C++03)

This regular expression matches the first occurrence of "one or more non-newline characters followed by two newline characters", which is the first line of your input, giving you a length of 46 (including the newline characters).
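To check the length claim without the live demo, here is a small sketch that uses `std::regex` with iterators (no `substr` copy). `firstMatchLength` and `kSample` are names made up for this illustration. The sample text is the input from the question, assumed to have exactly one blank line between sentences.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: length of the first match of `pattern` in `input`,
// searching from offset `start`; returns 0 when nothing matches.
std::size_t firstMatchLength(const std::string& input,
                             const std::string& pattern,
                             std::size_t start = 0) {
    assert(start <= input.size());
    const std::regex re(pattern);
    std::smatch what;  // match_results over std::string::const_iterator
    // Searching an iterator range avoids copying the tail of the string.
    if (std::regex_search(input.begin() + start, input.end(), what, re)) {
        return static_cast<std::size_t>(what.length(0));
    }
    return 0;
}

// Sample input from the question, blank lines preserved (an assumption about
// the exact whitespace, since the original post does not show it literally).
const std::string kSample =
    "CLINICAL: Left 2cm Firm Fibrous Lump @12:00.\n\n"
    "No prior exams were available for comparison.\n\n"
    "There is gynecomastia in both feet.";
```

With this input, `firstMatchLength(kSample, "[^\\n]+\\n\\n")` gives 46: the 44 characters of the first line plus the two newlines.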


Edit: From your comments it appears you're stuck with the given expression.

What you can try is using Boost's match_flag_type to alter how the regular expression is matched. In this case, use boost::match_any to return the leftmost match.

boost::match_flag_type flags = boost::match_any;

From the doc for match_any:

Specifies that if more than one match is possible then any match is an acceptable result: this will still find the leftmost match, but may not find the "best" match at that position. Use this flag if you care about the speed of matching, but don't care what was matched (only whether there is one or not).

Demo #2
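As an analogy for "leftmost, but not necessarily the best match", the sketch below uses only `std::regex`, so it says nothing definite about how Boost implements `match_any`. It assumes the two-line result comes from `.` being allowed to match a newline in Boost's default Perl syntax, and it emulates that with `[\s\S]`. `matchLength`, `kInput`, `kGreedy` and `kLazy` are names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: length of the first match, or 0 when nothing matches.
std::size_t matchLength(const std::string& input, const std::string& pattern) {
    assert(!pattern.empty());
    const std::regex re(pattern);
    std::smatch what;
    return std::regex_search(input, what, re)
               ? static_cast<std::size_t>(what.length(0))
               : 0;
}

// Sample input from the question, assuming one blank line between sentences.
const std::string kInput =
    "CLINICAL: Left 2cm Firm Fibrous Lump @12:00.\n\n"
    "No prior exams were available for comparison.\n\n"
    "There is gynecomastia in both feet.";

// [\s\S] stands in for a "." that is allowed to match newlines.
// Greedy repetition runs to the last blank line; lazy stops at the first.
const char* const kGreedy = "[A-Za-z0-9][\\s\\S]+\\n[ \\t]*\\n";
const char* const kLazy   = "[A-Za-z0-9][\\s\\S]+?\\n[ \\t]*\\n";
```

Here `matchLength(kInput, kGreedy)` is 93 (the first two lines together), while `matchLength(kInput, kLazy)` is 46 (the first line only). Both matches start at the same leftmost position and differ only in where they end. If the newline-matching dot is indeed the cause, Boost's documented `match_not_dot_newline` flag may also be worth trying with the original pattern.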


3 Comments

So, I see this would work, but I need the pattern as I have it in the example. If you run this through the std lib regex, you will see what I mean: only the first match, which is the first line, is returned. I know my example has two matches, as I have run this in multiple regex engines online and through a POC. Google's RE2 does it correctly as well. I just don't know why Boost doesn't. I need a Boost solution, if there is one. I appreciate the help Andy!!! I really do.
@Optum56: Edited
OMG!!!! I literally tried a lot of the flags except that one!!! You are a life saver!!! Thank you so much for putting up with my example and sticking with me. I am sooo beyond grateful!!!
