3

I am going to use regular expressions in a C++ application, but I have no experience with regex. In particular, I want to check whether each of a number of strings belongs to one of the following categories:

X.anystring -> X must be exactly one letter (not a digit).

XY.anystring -> X and Y must both be digits 0-9 (not letters).

How can I check them using regex? Which regex tutorial would you recommend to get me started?

3
  • Have you checked out cppreference.com? Commented Jun 25, 2012 at 7:42
  • Have a look at Boost Regex. Commented Jun 25, 2012 at 7:43
  • Funny how the two answers to this question complement each other. One is about the regex, the other about the C++ library. Commented Jun 25, 2012 at 7:53

6 Answers 6

3

Seriously, regexps are not the right solution for you in this case.

To start with, regexps are not part of pre-C++11 C++, so you will need to use a separate regexp library. (C++11, however, includes support for regexps.)

Secondly, both of your use cases can trivially be coded in plain C++: all you need to do is loop over the characters in the string and check whether each of them matches your requirements.


2 Comments

To tell the truth, he doesn't even need a loop; he never needs to look at more than three characters. But it could be a good exercise for getting him started with regular expressions, and when dealing with text input, you very quickly end up in cases where regular expressions are the simplest solution.
Why wouldn't he need to loop and check for all of the expressions btw?
1

The current C++11 standard has support for regular expressions, though I'm not sure off the top of my head which compilers support it and are ready to go.

In the meanwhile, the Boost library provides a nice regular expression system for C++ (link here).

In terms of learning about regex, this may help (focuses on using the Boost regex).

An alternative solution that may be simpler for your case would be to just code it yourself. Something like:

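#include <cctype>
#include <string>

// X.anystring: one letter followed by a dot.
bool check_first(const std::string& myString)
{
    // The cast avoids undefined behaviour in isalpha for negative char values.
    return myString.size() >= 2
        && std::isalpha(static_cast<unsigned char>(myString[0]))
        && myString[1] == '.';
}

// XY.anystring: two digits followed by a dot.
bool check_second(const std::string& myString)
{
    return myString.size() >= 3
        && std::isdigit(static_cast<unsigned char>(myString[0]))
        && std::isdigit(static_cast<unsigned char>(myString[1]))
        && myString[2] == '.';
}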

1 Comment

The first two statements formally contradict each other, since the current standard for C++ is C++11. (Of course, in practice, whether you can actually use it is a different question. But the standard regular expressions are based on boost, so you can use them.)
1

X.anystring -> X must be necessarily and exclusively letter (not digit).

Required regex is

[a-zA-Z]\.[\w]+

XY.anystring -> X, Y must be necessarily and exclusively digits 0-9 (not letters).

Required regex is

[0-9]{2}\.[\w]+

Learn more about regexes here. Once you have learned about regexes in general, you can apply them in any language of your choice.

5 Comments

I don't think these are right. First, the first doesn't match all alpha characters. And what is the purpose of the \b? And the \w in []? (I don't think that \w is actually defined in []; it's normally defined as [_[:alnum:]], which isn't legal in [].)
I've ignored alpha chars as the OP specifically mentioned letters. \w is indeed allowed inside [], though it's not in the POSIX standard. And I agree that \b is redundant in this case, so I edited it out.
Could you point out where \w is allowed inside [] (and in which version of regular expressions---there are so many). C++11 doesn't seem clear: it's either forbidden (undefined behavior, because not specified), or (my reading, although I'm not at all sure that this was the intention) the equivalent of [[_[:alnum:]]], which, when followed by a +, will match a [, a _ or an alnum, followed by one or more ]. (Given that the OP says "any string", and not a symbol, .* is what is wanted, anyway.)
Also: à is a letter, but it won't be matched by [a-zA-Z].
\w is already a character class. It doesn't need to be wrapped in one. In this case, it's wrong, though -- you should be using .* since it's supposed to match any string, not just words.
1

If you just want to know whether a string matches one or the other, but you don't care which one it matches, you can use:

"(?:(?:[a-zA-Z])|(?:[0-9]{2}))\\..*"

Using C++11 regex and ECMAScript syntax.
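
As a rough illustration of how such a combined pattern might be used with C++11's std::regex_match, here is a minimal sketch. The helper name matches_either is hypothetical (not from the answer), and a raw string literal is used so the backslash does not need to be doubled.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if s looks like "X.anystring" (X a letter)
// or "XY.anystring" (X and Y digits).
bool matches_either(const std::string& s)
{
    // ECMAScript is the default grammar for std::regex.
    // Inside a raw string literal, \. reaches the regex engine unchanged.
    static const std::regex rx(R"((?:[a-zA-Z]|[0-9]{2})\..*)");

    // regex_match succeeds only if the whole string matches the pattern.
    return std::regex_match(s, rx);
}
```

With this sketch, strings such as "a.txt" and "42.log" would be accepted, while "4.log" and "ab.txt" would be rejected.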


1
#include <regex>
#include <string>

std::string str = "OnlyLetter,12345";

std::string x = "[a-zA-Z]+";   // one or more letters
std::string y = "[0-9]+";      // one or more digits
std::string expression = x + "," + y;
std::regex rx(expression);     // C++11; on older TR1 compilers use std::tr1::regex
bool match = std::regex_match(str, rx);
// match == true:  valid string
// match == false: invalid string, e.g. "OnlyLetter,12s345"


0

It depends on which regular expression library you're using. But the following should work with both Boost and C++11:

For X.anystring (X is alpha):

"[[:alpha:]]\\..*"

For XY.anystring:

"[[:digit:]][[:digit:]]\\..*"

These are for use with regex_match; if you want to use regex_search, you'll have to "anchor" the expression to the beginning of the string by prefixing it with a '^' (but you can drop the final '.*').
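
To make the difference between the two calls concrete, here is a small sketch for the XY.anystring case using C++11 std::regex. The function names two_digits_match and two_digits_search are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers showing the two call styles for "XY.anystring".

// regex_match must consume the whole string, so the trailing ".*" is needed.
bool two_digits_match(const std::string& s)
{
    static const std::regex rx("[[:digit:]][[:digit:]]\\..*");
    return std::regex_match(s, rx);
}

// regex_search looks for the pattern anywhere in the string, so '^' pins it
// to the start and the trailing ".*" can be dropped.
bool two_digits_search(const std::string& s)
{
    static const std::regex rx("^[[:digit:]][[:digit:]]\\.");
    return std::regex_search(s, rx);
}
```

For ordinary single-line input the two helpers should agree: both accept "12.abc" and both reject "x12.abc" or "1.abc".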

