I am trying to make more use of regex in my search engine. Please take a look:

someStr = "Processor AMD Athlon II X4 651K BOX Black Edition, s. FM1, 3.0GHz, 4MB cache, Quad Core"

# THIS SHOULD MATCH: "processor" plus 0 to 1 extra characters (e.g. the plural),
# "mega" and "mb" should be treated as the same,
# and "quad" plus 0 to 2 of any characters except whitespace
queryListTrue = ["processors", "amd", "4mega", "quaddy"]

# THIS SHOULDN'T MATCH: the last item is too long
queryListFalse = ["processors", "amd", "4mb", "quaddie"]

# TO DESCRIBE WHAT I NEED (pseudocode, not valid regex)
rulesList = [ r'processor[i.e. 0-1 char]', r'amd',
            r'4mega or 4mb', r'quad[from 0 to 2 any char]' ]

if ALL queryListTrue MATCHES someStr THRU rulesList : 
        print "What a wonderful world!"

Any help would be wonderful.

  • Using regex to do black magic? Commented Apr 23, 2014 at 7:16
  • I already did it with simple comparisons, but I thought regex would be more powerful. The loop is the easy part; it is the list of expressions that bothers me. Commented Apr 23, 2014 at 7:23

1 Answer

The regular expression for "[from 0 to 1 any char]" is simply

.?

i.e. dot . matches any character (except newline, by default) and the ? quantifier means the preceding expression is optional.

Note that processor.? will also match processor followed by a space or by any other single character, as in processord. You probably intend processors? where the plural s is optional, or perhaps processor[a-z]? to constrain the optional last character to a lowercase letter.
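To make the difference concrete, here is a minimal Python 3 sketch. The candidates list and the full_matches helper are made up for this illustration and are not part of the question; re.fullmatch is used so that each pattern has to account for the whole string.

```python
import re

# Illustrative inputs (not from the question); the last one ends with a space.
candidates = ["processor", "processors", "processord", "processor "]

def full_matches(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]

# .? accepts any single optional character, including a space.
loose = full_matches(r"processor.?", candidates)

# s? accepts only an optional plural "s".
plural_only = full_matches(r"processors?", candidates)

# [a-z]? accepts one optional lowercase letter, but no space.
letters_only = full_matches(r"processor[a-z]?", candidates)

print(loose)
print(plural_only)
print(letters_only)
```

With these inputs, processor.? accepts all four strings, processors? accepts only processor and processors, and processor[a-z]? rejects only the one with the trailing space.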

Similarly, the generalized quantifier {m,n} specifies "at least m repetitions and at most n repetitions", so your "[from 0 to 2 any char]" translated to regex is .{0,2}.
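A quick way to check the bounded quantifier against the terms from the question is sketched below in Python 3. It uses \S instead of . on the assumption that the "except whitespace" requirement from the question's comment should be enforced; swap in . if whitespace is acceptable.

```python
import re

# \S{0,2} allows zero to two non-whitespace characters after "quad".
pattern = re.compile(r"quad\S{0,2}")

# Terms taken from the question, plus the bare word for comparison.
terms = ["quad", "quaddy", "quaddie"]

# True means the whole term is accepted by the pattern.
results = {term: bool(pattern.fullmatch(term)) for term in terms}
print(results)
```

Here quad and quaddy are accepted, while quaddie is rejected because it has three characters after quad.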

Alternation in regular expressions is specified with | so mega|mb is the regex formulation for your "mega or mb". If you use the alternation in a longer context where some of the text is not subject to alternation, you need to add parentheses to scope the alternation, like m(ega|b).
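The scoping issue is easy to see with the 4mb/4mega case from the question. The following Python 3 sketch compares an unscoped alternation with a parenthesized one; the terms list is chosen purely for illustration.

```python
import re

unscoped = r"4mega|mb"   # reads as: "4mega" OR "mb"
scoped = r"4m(ega|b)"    # reads as: "4m" followed by "ega" or "b"

terms = ["4mega", "4mb", "mb"]

# Keep only the terms each pattern matches in full.
unscoped_hits = [t for t in terms if re.fullmatch(unscoped, t)]
scoped_hits = [t for t in terms if re.fullmatch(scoped, t)]

print(unscoped_hits)
print(scoped_hits)
```

Without parentheses the 4 belongs only to the first alternative, so the pattern accepts a bare mb and rejects 4mb. With the parentheses both 4mega and 4mb are accepted and mb alone is not.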

In Python (as in most modern Perl-derived regex dialects), you can use (?: instead of ( if the capturing behavior of regular parentheses is undesired; the group still scopes the alternation but does not create a capture group.
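Putting the pieces together, one possible reading of the pseudocode in the question is sketched below in Python 3. It rests on assumptions that are not stated in the question: each query term is paired with the rule at the same position, a term passes when it matches its rule in full and the same rule is found somewhere in the text, and matching is case-insensitive. The names rules and query_matches are hypothetical.

```python
import re

some_str = ("Processor AMD Athlon II X4 651K BOX Black Edition, "
            "s. FM1, 3.0GHz, 4MB cache, Quad Core")

query_list_true = ["processors", "amd", "4mega", "quaddy"]
query_list_false = ["processors", "amd", "4mb", "quaddie"]

# One regex per query position (assumed pairing, see the lead-in).
rules = [r"processors?", r"amd", r"4(?:mega|mb)", r"quad\S{0,2}"]

def query_matches(query_terms, text, rules):
    """Return True if every term fits its rule and the rule occurs in text."""
    if len(query_terms) != len(rules):
        return False
    for term, rule in zip(query_terms, rules):
        pattern = re.compile(rule, re.IGNORECASE)
        # The term must match the rule completely, and the rule must
        # also be found somewhere in the text being searched.
        if not pattern.fullmatch(term) or not pattern.search(text):
            return False
    return True

# (?:...) scopes the alternation without creating a capture group.
capturing = re.fullmatch(r"4(mega|mb)", "4mb")
non_capturing = re.fullmatch(r"4(?:mega|mb)", "4mb")
print(capturing.groups(), non_capturing.groups())

if query_matches(query_list_true, some_str, rules):
    print("What a wonderful world!")
print(query_matches(query_list_false, some_str, rules))
```

Under these assumptions the first list passes and the second fails on quaddie. If the terms can arrive in any order, the positional pairing would need to be replaced with a lookup of which rule each term satisfies.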

1 Comment

Upvoted for this excellent explanation, and accepted. I also edited mb/mega to 4mb/4mega to be clearer, but I got the picture. Much obliged.
