I'm having difficulty getting Boost.Regex match results to come out the same way as the standard library's. Specifically, the standard library returns only the first match in a multi-line input that produces multiple matches.
The goal is to get the best performance, as the product that runs this code hits it a great deal. The substr calls are horrendously slow, hence the Boost approach with iterators.
This product is in C++ prior to C++11: old code that I can't upgrade.
Example below:
_pattern : [A-Za-z0-9].+\\n[ \t]*\\n
Input string: ( the line feeds are essential )
CLINICAL: Left 2cm Firm Fibrous Lump @12:00.
No prior exams were available for comparison.
There is gynecomastia in both feet.
Standard Library version of code:
size_t
ORegExpr::index(const OString &inputStr, size_t* length, size_t start = 0) const {
    if (start == O_NPOS)
        return O_NPOS;
    std::smatch reMatch;
    std::regex re(_pattern);
    // substr() copies the tail of the input on every call
    std::string inputData = "";
    if (start > 0)
        inputData = inputStr._string.substr(start);
    else
        inputData = inputStr._string;
    if (std::regex_search(inputData, reMatch, re))
    {
        *length = reMatch.length();
        // position(0) is relative to inputData, so add start back
        return reMatch.position(0) + start;
    }
    *length = 0;
    return O_NPOS;
}
**Boost version**
size_t
ORegExpr::index_boost(const OString &inputStr, size_t* length, size_t start = 0) const {
    if (start == O_NPOS)
        return O_NPOS;
    boost::regex re(_pattern);
    boost::match_results<std::string::const_iterator> what;
    boost::match_flag_type flags = boost::match_default;
    std::string::const_iterator s = inputStr.std().begin() + start;
    std::string::const_iterator e = inputStr.std().end();
    if (boost::regex_search(s, e, what, re, flags)) {
        *length = what.length();
        return what.position() + start;
    }
    *length = 0;
    return O_NPOS;
}
**Replaced Boost with std to see if using iterators would make a difference**
size_t
ORegExpr::index_boostnowstd(const OString &inputStr, size_t* length, size_t start = 0) const {
    if (start == O_NPOS)
        return O_NPOS;
    std::regex re(_pattern);
    std::match_results<std::string::const_iterator> what;
    //boost::match_flag_type flags = boost::match_default;
    std::string::const_iterator s = inputStr.std().begin() + start;
    std::string::const_iterator e = inputStr.std().end();
    if (std::regex_search(s, e, what, re)) {
        *length = what.length();
        return what.position() + start;
    }
    *length = 0;
    return O_NPOS;
}
I tried every which way I could to get the "array" of matches and to return just the length of the first match, but for the life of me I couldn't get this from Boost. It would return both matches together, with the total length of both, i.e. the first and second lines of the input string.
I have a fully functional POC if my explanation isn't as clear as I think it is.
I expect the functions to yield a size_t of 46, which is the length of the first line of the input string. The standard library does this but Boost doesn't. The reason for using Boost is that it seems to run faster than the standard library.
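One possible explanation, offered as an assumption rather than a confirmed diagnosis: in Boost.Regex's default Perl syntax, `.` can match a newline unless the `boost::match_not_dot_newline` match flag is passed, whereas `std::regex` in its default ECMAScript mode never lets `.` cross a line terminator. If that is what is happening here, `.+` in the Boost version would run greedily across lines. A pattern that spells the intent out with `[^\n]+` should not depend on that setting. The sketch below uses only `std::regex` so that it stays self-contained. `first_match_length`, `kLineBoundPattern`, and `kSampleInput` are hypothetical names, and the blank lines between the input lines are assumed (the expected 46 is the 44-character first line plus two line feeds).

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: length of the first match at or after `start`,
// or 0 if nothing matches. It searches through iterators, so no
// substr() copy of the input is made.
std::size_t first_match_length(const std::string& input,
                               const std::regex& re,
                               std::size_t start = 0) {
    assert(start <= input.size());
    std::match_results<std::string::const_iterator> what;
    std::string::const_iterator s =
        input.begin() + static_cast<std::string::difference_type>(start);
    std::string::const_iterator e = input.end();
    if (std::regex_search(s, e, what, re)) {
        return static_cast<std::size_t>(what.length(0));
    }
    return 0;
}

// "[^\n]+" replaces ".+" so the match cannot run past the end of a line,
// whatever the engine's dot/newline setting is. The C++ escapes below put
// real newline and tab characters into the pattern string.
const char* const kLineBoundPattern = "[A-Za-z0-9][^\n]+\n[ \t]*\n";

// Assumed layout of the sample input: each line followed by a blank line.
const char* const kSampleInput =
    "CLINICAL: Left 2cm Firm Fibrous Lump @12:00.\n\n"
    "No prior exams were available for comparison.\n\n"
    "There is gynecomastia in both feet.\n";
```

With `std::regex re(kLineBoundPattern);`, the call `first_match_length(kSampleInput, re)` returns 46. The same pattern string could be handed to `boost::regex`. Alternatively, keeping `.+` and passing `boost::match_default | boost::match_not_dot_newline` as the flags may give the same result, though that is untested against this codebase.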
`stringstream` and `getline` to extract that quite easily. Another thing you can do to help performance is to not create the `regex` every time you call the function. Building a regex is expensive. Making it static or a class member means you only need to pay to construct it once, which can save a lot of time.
@, so why would it match the first line of the input string?
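The suggestion about constructing the regex only once can be sketched as follows. `CachedRegex` is a hypothetical stand-in for `ORegExpr`, not the actual class: it compiles the pattern in its constructor and reuses it on every call, and it searches through iterators rather than `substr`. A class member is used rather than a function-local `static` because, before C++11, initialization of local statics is not guaranteed to be thread-safe. `std::string::npos` stands in for `O_NPOS` here.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for ORegExpr: the pattern is compiled once,
// in the constructor, and reused for every search.
class CachedRegex {
public:
    explicit CachedRegex(const std::string& pattern) : re_(pattern) {}

    // Returns the offset of the first match at or after `start` and
    // stores its length in *length; returns std::string::npos (and a
    // length of 0) when there is no match or `start` is out of range.
    std::size_t index(const std::string& input, std::size_t* length,
                      std::size_t start = 0) const {
        *length = 0;
        if (start > input.size())
            return std::string::npos;
        std::match_results<std::string::const_iterator> what;
        std::string::const_iterator s =
            input.begin() + static_cast<std::string::difference_type>(start);
        if (std::regex_search(s, input.end(), what, re_)) {
            *length = static_cast<std::size_t>(what.length(0));
            // position(0) is relative to `s`, so add `start` back.
            return static_cast<std::size_t>(what.position(0)) + start;
        }
        return std::string::npos;
    }

private:
    std::regex re_;  // built once per object, not once per call
};
```

For example, `CachedRegex re("[0-9]+"); std::size_t len; re.index("ab123cd45", &len);` returns 2 and sets `len` to 3. The same layout should carry over to `boost::regex` by swapping the member and `match_results` types.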