
Using Python 3.9.5, I have this string:

>>> t
' LICENSE INVALID\n Your license does not include module AMS version 2020.103 on this machine.\n Module AMS\n LICENSE INVALID\n Module AMS version2020.103\n Your license does not include module AMS version 2020.103 on this machine.\n Module AMS\n Module AMS version2020.103\nLICENSE INVALID\nLICENSE INVALID'

I want a re that will return None if either one of the strings 'LICENSE INVALID' or 'license does not include' is found; and, if those strings are both absent and both of the strings '2020.103' and 'NORMAL TERMINATION' are present, only then do I want it to return a match. (If nothing at all matches, return a None too.) So far I have

>>> p=re.compile(r'^(?!.*LICENSE INVALID|license does not include).*(?:2020.103|NORMAL TERMINATION).*')
>>> print(p.search(s))
<re.Match object; span=(0, 8), match='2020.103'>

This does the first part: it returns None if 'LICENSE INVALID' or 'license does not include' is in the text. However, I believe it is doing an "or" match on the latter two strings, and I want it to do an "and". It matches above when I'd rather it did not. The output I'm matching on will likely contain '2020.103' both when there is a failure (when I do not want my re to find a match) and when there is a success (when I want my re to find a match). I need to use a re for this to fit it in with someone else's code I'm using. To summarize: only if '2020.103' and 'NORMAL TERMINATION' are both found, and 'LICENSE INVALID' and 'license does not include' are both absent, should it return a match rather than None.
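
A quick way to see why the pattern above matches when it should not: the sketch below runs it, copied unchanged, against two made-up single-line inputs (the input strings are assumptions chosen for illustration). The alternation (?:2020.103|NORMAL TERMINATION) is satisfied by either branch on its own, and because | has the lowest precedence, the negative lookahead (?!.*LICENSE INVALID|license does not include) only looks for 'license does not include' at the very start of the string.

```python
import re

# The pattern from the question, copied unchanged.
p = re.compile(r'^(?!.*LICENSE INVALID|license does not include).*(?:2020.103|NORMAL TERMINATION).*')

# Only '2020.103' is present, but (?:A|B) is satisfied by one branch alone.
only_version = p.search('Module AMS version 2020.103')

# The 'license does not include' branch of the lookahead is tried only at
# position 0 (it has no leading .*), so the phrase later in the line slips through.
late_phrase = p.search(' Your license does not include module AMS version 2020.103')

print(only_version)
print(late_phrase)
```

Both calls print a match object, which is why grouping the alternatives and using one lookahead per required string is needed.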

3 Answers

I might avoid regex here and instead use the base string functions:

inp = [' LICENSE INVALID\n Your license does not include module AMS version 2020.103 on this machine.\n Module AMS\n LICENSE INVALID\n Module AMS version2020.103\n Your license does not include module AMS version 2020.103 on this machine.\n Module AMS\n Module AMS version2020.103\nLICENSE INVALID\nLICENSE INVALID', 'Hello 2020.103 is NORMAL TERMINATION']

for x in inp:
    if "LICENSE INVALID" not in x and "license does not include" not in x and "2020.103" in x and "NORMAL TERMINATION" in x:
        print("MATCH: " + x)
    else:
        print("NO MATCH: " + x)

Only the second sample input in the list matches.

With regex, would you please try:

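import re
# re.S lets . match newlines, so the lookaheads scan every line of the output; 2020\.103 escapes the literal dot
p = re.compile(r'^(?!.*(?:LICENSE INVALID|license does not include)).*(?=.*2020\.103)(?=.*NORMAL TERMINATION)', re.S)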

It should match if '2020.103' and 'NORMAL TERMINATION' are both found and neither 'LICENSE INVALID' nor 'license does not include' is found.
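
One caveat worth checking: the tool output spans several lines, and without re.S the . in these lookaheads stops at the first line break. The sketch below compiles the same pattern (with the literal dot escaped as 2020\.103) both with and without re.S and runs it on two hypothetical inputs; the failure string is an assumption built so that the license error sits on the second line.

```python
import re

# The answer's pattern, with the literal dot in 2020.103 escaped.
PATTERN = r'^(?!.*(?:LICENSE INVALID|license does not include)).*(?=.*2020\.103)(?=.*NORMAL TERMINATION)'

without_s = re.compile(PATTERN)
with_s = re.compile(PATTERN, re.S)  # re.S: '.' also matches '\n'

# Hypothetical outputs: the license error is on the second line of `failure`.
failure = 'NORMAL TERMINATION 2020.103\nLICENSE INVALID'
success = 'Module AMS version 2020.103\nNORMAL TERMINATION'

print(without_s.search(failure))  # a match: without re.S the negative lookahead never reaches line 2
print(with_s.search(failure))     # None: with re.S the negative lookahead sees 'LICENSE INVALID'
print(with_s.search(success))     # a match object
```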

Let's clarify the requirements:

  • "return None if either one of the strings 'LICENSE INVALID' or 'license does not include' is found" means you want the match to fail if either of the two strings is present anywhere in the string. This is a job for negative lookaheads: (?!.*pattern1), or, with two patterns, (?!.*pattern1)(?!.*pattern2) or (?!.*(?:pattern1|pattern2)), and so on.
  • "both of the strings '2020.103' and 'NORMAL TERMINATION' are present, only then do I want it to return a match. (If nothing at all matches, return a None too.)" means the string must contain both patterns, in any order, which is a job for positive lookaheads: ^(?=.*pattern1)(?=.*pattern2).
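
The two lookahead building blocks can be tried in isolation. In this sketch, 'alpha' and 'beta' are placeholder patterns, not strings from the question; it shows that ^(?=.*alpha)(?=.*beta) accepts the two words in either order, and that the separate and grouped negative forms reject exactly the same strings.

```python
import re

# "Contains both, in any order": one positive lookahead per required pattern.
both = re.compile(r'^(?=.*alpha)(?=.*beta)', re.S)

# "Contains neither": separate lookaheads, or one lookahead over a grouped alternation.
neither_separate = re.compile(r'^(?!.*alpha)(?!.*beta)', re.S)
neither_grouped = re.compile(r'^(?!.*(?:alpha|beta))', re.S)

samples = ['alpha then beta', 'beta then alpha', 'only alpha', 'nothing here']
for s in samples:
    # A match object is truthy even when it matched an empty string.
    print(repr(s), bool(both.search(s)),
          bool(neither_separate.search(s)), bool(neither_grouped.search(s)))
```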

Also note the . in the patterns above. In Python re, the dot does not match line break chars by default; you need to use the re.S or re.DOTALL flag (or add (?s) at the pattern start to redefine the behavior of . in the whole regex, or use (?s:.) instead of .).
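
A small illustration of the dot behavior on a made-up two-line string, comparing the default with re.S, the inline (?s) flag, and the scoped (?s:.) group (scoped inline flags need Python 3.6 or later, which 3.9.5 satisfies):

```python
import re

text = 'NORMAL TERMINATION\n2020.103'

# By default '.' stops at '\n', so '.*' cannot reach the second line.
default = re.search(r'NORMAL.*2020\.103', text)

# Three ways to let '.' match line breaks as well.
flag = re.search(r'NORMAL.*2020\.103', text, re.S)    # re.S, same as re.DOTALL
inline = re.search(r'(?s)NORMAL.*2020\.103', text)   # inline flag at the pattern start
scoped = re.search(r'NORMAL(?s:.)*2020\.103', text)  # flag applies only inside (?s:...)

print(default, flag, inline, scoped, sep='\n')
```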

So, let's combine the requirements (match a string that contains two patterns in any order but contains neither of two other patterns) into a regex:

p=re.compile(r'^(?!.*\b(?:LICENSE INVALID|license does not include)\b)(?=.*(?<!\d)2020\.103(?!\d))(?=.*\bNORMAL TERMINATION\b).*', re.S)

Details:

  • ^ - start of a string
  • (?!.*\b(?:LICENSE INVALID|license does not include)\b) - a negative lookahead failing the match if there is either LICENSE INVALID or license does not include as whole words anywhere after zero or more chars, as many as possible
  • (?=.*(?<!\d)2020\.103(?!\d)) - a positive lookahead requiring a 2020.103 that is neither immediately preceded nor followed by a digit, anywhere after zero or more chars, as many as possible
  • (?=.*\bNORMAL TERMINATION\b) - a positive lookahead requiring a NORMAL TERMINATION whole word anywhere after zero or more chars, as many as possible
  • .* - the rest of the string. Not necessary if you only need a boolean result.

See the Python demo:

import re
s = ' LICENSE INVALID\n Your license does not include module AMS version 2020.103 on this machine.\n Module AMS\n LICENSE INVALID\n Module AMS version2020.103\n Your license does not include module AMS version 2020.103 on this machine.\n Module AMS\n Module AMS version2020.103\nLICENSE INVALID\nLICENSE INVALID'
p=re.compile(r'^(?!.*\b(?:LICENSE INVALID|license does not include)\b)(?=.*(?<!\d)2020\.103(?!\d))(?=.*\bNORMAL TERMINATION\b).*', re.S)
print(p.search(s)) # NO MATCH, there is "LICENSE INVALID" and even "license does not include", there is no expected "NORMAL TERMINATION", although there is "2020.103"
print(p.search('NORMAL TERMINATION\nBlah-blah\n2020.103 code')) # MATCH, there are no "LICENSE INVALID" and "license does not include", there is "NORMAL TERMINATION" and "2020.103"
print(p.search('NORMAL TERMINATION\nBlah-blah\n2020.1030 code')) # NO MATCH, same as above but "2020.1030" is not  "2020.103"
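
For comparison with the first answer's plain substring test, the sketch below runs this regex and an equivalent substring check (plain_check is a hypothetical helper name) on a few made-up inputs. They agree except on 2020.1030, where the (?<!\d)...(?!\d) guards reject a version that merely contains 2020.103 as a substring. Whether you want that stricter behavior depends on what your tool can actually print.

```python
import re

# The regex from this answer.
p = re.compile(r'^(?!.*\b(?:LICENSE INVALID|license does not include)\b)(?=.*(?<!\d)2020\.103(?!\d))(?=.*\bNORMAL TERMINATION\b).*', re.S)

def plain_check(text):
    """Hypothetical helper: the same rule using substring tests only (no boundaries)."""
    return ('LICENSE INVALID' not in text and 'license does not include' not in text
            and '2020.103' in text and 'NORMAL TERMINATION' in text)

cases = [
    'NORMAL TERMINATION\nBlah-blah\n2020.103 code',
    'NORMAL TERMINATION\nBlah-blah\n2020.1030 code',
    'LICENSE INVALID\n2020.103\nNORMAL TERMINATION',
]
for text in cases:
    print(repr(text), bool(p.search(text)), plain_check(text))
```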
