
In a list, I need to match specific instances, except those containing a specific string

Let's say I have a list of strings like the following:

l = [
'PSSTFRPPLYO',
'BNTETNTT',
'DE52 5055 0020 0005 9287 29',
'210-0601001-41',
'BSABESBBXXX',
'COMMERZBANK'
]

I need to match all the words that correspond to a SWIFT/BIC code. This code has the following form: 6 letters, followed by 2 letters/digits, followed by 3 optional letters/digits.

Hence I have written the following regex to match this pattern:

import re

regex = re.compile(r'(?<!\w)[a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?(?!\w)')
for item in l:
    match = regex.search(item)
    if match:
        # match.group(0) is the matched substring, same as item[match.start():match.end()]
        print('found a match, the matched string {} the match {}'.format(item, match.group(0)))
    else:
        print('found no match in {}'.format(item))

I need the following cases to be matched:

result = ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX' ]

Instead, I get:

result = ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX', 'COMMERZBANK' ]

So what I need is to match only the strings that don't contain the word 'bank'.

To do so, I have refined my regex to:

regex = re.compile(r'(?<!bank/i)(?<!\w)[a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?(?!\w)(?!bank/i)')

Simply put, I have used a negative lookbehind and a negative lookahead (for more information about these two concepts, refer to link).

My regex doesn't do the filtering it is intended to do. What did I miss?
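A minimal sketch of why the refined attempt still accepts 'COMMERZBANK' (the names `attempt` and `m` are illustrative). Two things go wrong: Python has no trailing `/i` flag syntax, so `bank/i` is matched as six literal characters, and a lookbehind/lookahead only inspects the characters just before the match start or just after the match end, never the characters inside the match.

```python
import re

# The question's refined pattern, written as a raw string so it compiles.
# "bank/i" is literal text here: b, a, n, k, /, i.
attempt = re.compile(
    r'(?<!bank/i)(?<!\w)[a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?(?!\w)(?!bank/i)'
)

# "BANK" sits inside the matched text, so neither lookaround ever examines it.
m = attempt.search('COMMERZBANK')
print(m.group(0))  # COMMERZBANK

# In Python, case-insensitivity is requested with re.IGNORECASE
# (or an inline (?i) at the start of the pattern), not with "/i".
print(re.search(r'bank', 'COMMERZBANK', re.IGNORECASE) is not None)  # True
```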

  • (?!.*bank.*)^[a-z]{6}(?:[a-z0-9]{2})(?:[a-z0-9]{3})?$ with the i modifier? Commented Oct 24, 2017 at 13:39
  • @ctwheels that's a nice trick, thanks a lot. Commented Oct 24, 2017 at 13:47
  • My previous regex can actually be shortened to (?!.*bank.*)^[a-z]{6}(?:[a-z0-9]{2}|[a-z0-9]{5})$ (4 fewer characters and 2 fewer steps) or (?![a-z0-9]*bank[a-z0-9]*)^[a-z]{6}(?:[a-z0-9]{2}|[a-z0-9]{5})$ (more characters, but almost 400 fewer steps). Commented Oct 24, 2017 at 13:57
  • @ctwheels What did you use to analyze the number of steps? Commented Oct 24, 2017 at 13:59
  • regex101 Commented Oct 24, 2017 at 14:02
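A sketch of the lookahead trick from the comments in Python, assuming each list item is a single candidate code that must match as a whole. `re.IGNORECASE` stands in for the "i modifier", the trailing `.*` inside the lookahead is dropped because it does not change what the lookahead rejects, and the names `bic_without_bank` and `result` are illustrative.

```python
import re

l = [
    'PSSTFRPPLYO',
    'BNTETNTT',
    'DE52 5055 0020 0005 9287 29',
    '210-0601001-41',
    'BSABESBBXXX',
    'COMMERZBANK',
]

# The negative lookahead runs at the start of the string, before anything
# is consumed, so ".*bank" can scan the whole item for "bank" (any case).
bic_without_bank = re.compile(
    r'^(?!.*bank)[a-z]{6}(?:[a-z0-9]{2}|[a-z0-9]{5})$',
    re.IGNORECASE,
)

result = [item for item in l if bic_without_bank.match(item)]
print(result)  # ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX']
```

The whole check happens in a single regex pass per item, which is the kind of in-regex condition the question was after.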

1 Answer


You can try this:

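import re
# raw string avoids Python's invalid-escape-sequence warning for "\w"; the second findall drops items containing "bank" in any case
final_vals = [i for i in l if re.findall(r'^[a-zA-Z]{6}\w{2}|(^[a-zA-Z]{6}\w{2}\w{3})', i) and not re.findall('BANK', i, re.IGNORECASE)]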

Output:

['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX']
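The same two-step idea can be sketched without `findall`: `re.match` already anchors at the start of the string, and the "no bank" check can be a plain substring test on the lowercased item. This is a swapped-in simplification, not the answerer's code, and the name `bic_prefix` is illustrative. Like the answer, it only checks the beginning of each item, so a longer item that merely starts with 8 suitable characters would also pass.

```python
import re

l = [
    'PSSTFRPPLYO',
    'BNTETNTT',
    'DE52 5055 0020 0005 9287 29',
    '210-0601001-41',
    'BSABESBBXXX',
    'COMMERZBANK',
]

# Prefix check only: 6 letters followed by 2 word characters.
bic_prefix = re.compile(r'[a-zA-Z]{6}\w{2}')

# Second condition as a case-insensitive substring test instead of a regex.
final_vals = [i for i in l if bic_prefix.match(i) and 'bank' not in i.lower()]
print(final_vals)  # ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX']
```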

1 Comment

Is there no way to include the condition in the regex instead of traversing the string twice? Don't get me wrong, your solution does the job; thanks a lot.
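One way to fold the condition into a single regex, sketched under the assumption that codes may also appear inside longer text, is to keep the question's own `(?<!\w)...(?!\w)` boundaries and place a negative lookahead right after the left boundary. This mirrors the `(?![a-z0-9]*bank...)` variant from the comments; the name `bic` and the sample sentence are hypothetical.

```python
import re

l = [
    'PSSTFRPPLYO',
    'BNTETNTT',
    'DE52 5055 0020 0005 9287 29',
    '210-0601001-41',
    'BSABESBBXXX',
    'COMMERZBANK',
]

# (?![a-z0-9]*bank) runs at the start of each candidate token and rejects it
# if "bank" appears anywhere in it: [a-z0-9]* scans forward through the token
# but cannot cross spaces or punctuation, so other words are not affected.
bic = re.compile(
    r'(?<!\w)(?![a-z0-9]*bank)[a-z]{6}[a-z0-9]{2}(?:[a-z0-9]{3})?(?!\w)',
    re.IGNORECASE,
)

print([item for item in l if bic.search(item)])
# ['PSSTFRPPLYO', 'BNTETNTT', 'BSABESBBXXX']

# Not anchored to the whole string, so it also finds codes inside text.
print(bic.findall('Pay to COMMERZBANK via PSSTFRPPLYO'))
# ['PSSTFRPPLYO']
```

The key difference from the attempt in the question is position: the lookahead sits where the match starts, so it can see the characters the match is about to consume.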
