4

I am using std::regex r("-?[0-9]*(.[0-9]+)?(e-?[0-9]+)?") to validate numbers (ints/fixed point/floating point). The MWE is below:

#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <algorithm>

using namespace std;

bool isNumber(string s) {
  // can ignore all whitespace
  s.erase(remove(s.begin(), s.end(), ' '), s.end());
  std::regex r("-?[0-9]*(.[0-9]+)?(e-?[0-9]+)?");
  return regex_match(s, r);
}

int main() {
  std::vector<string> test{
    "3", "-3", // ints
      "1 3", " 13", " 1 3 ", // weird spacing
      "0.1", "-100.123123", "1000.12312", // fixed point
      "1e5", "1e-5", "1.5e-10", "1a.512e4", // floating point
      "a", "1a", "baaa", "1a", // invalid
      "1e--5", "1e-", //invalid
      };

  for (auto t : test) {
    cout.width(20); cout << t << " " << isNumber(t) << endl;
  }

  return 0;
}

I notice the compile time is quite large compared to what I was expecting:

  • gcc 5.4 -O0 -std=c++11, 2.3 seconds
  • gcc 5.4 -O2 -std=c++11, 3.4 seconds
  • clang++ 3.8 -O0 -std=c++11, 1.8 seconds
  • clang++ 3.8 -O2 -std=c++11, 3.7 seconds

I use this for an online judge submission, which has a time limit on the compilation stage.

So, the obivous questions:

  • why is the compilation time so large? I'm under the impression that when I use regexes in vim/emacs/grep/ack/ag etc. (on the same machine) the compilation really takes a lot less than this.
  • is there any way to reduce the compilation time of a regular expression in C++?
4
  • It's not so much the compilation time of the regex, but of all of the code required to implement all of the regex functionality. Compare the time to using 10 regexes. Commented Jan 25, 2017 at 20:40
  • It looks like @chris is right. Try compiling with -save-temps, and look at the size of the resulting intermediate files. Commented Jan 25, 2017 at 20:48
  • The regex is probably going to build a finite state machine on the back end. That takes a lot of time. Most likely that is what you are seeing. Commented Jan 25, 2017 at 20:48
  • 1
    As for the pattern itself, the dot must be escaped. Commented Jan 25, 2017 at 20:51

1 Answer 1

1

You can certainly lighten the computational load of your regex by appropriately escaping the decimal point: -?[0-9]*(\.[0-9]+)?(e-?[0-9]+)? This will of course prevent you from returning a false positive on numbers like: "1 3" (don't worry that's a good thing those were 2 numbers.) But in this case and many others when you stoop to using regexes "Now you have 2 problems".

Using a istringstream would provide a more specialized, robust solution to this problem:

bool isNumber(const string& s) {
    long double t;
    istringstream r(s);

    return r >> t >> ws && r.eof();
}

Live Example

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.