python regex matching "ab" or "ba" words

Question

I am trying to match words containing the letters "ab" or "ba", e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:

r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"

But the results include words that start or end with ", (, ), / and other non-alphanumeric characters. How can I get rid of them? I just want a list of the matching words.

import sys
import re

word=[]

dict={}

f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')

data = f.read()
word = data.split() # word is list

f.close()

for num2 in word:
    match2 = re.findall("\w*(ab|ba)\w*", num2)
    if match2:
        dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1

for key2 in sorted(dict.iterkeys()):print "%s: %s" % (key2, dict[key2])
print len(dict.keys())
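In the loop above, the dictionary key is the whitespace-split token num2, not the text the regex matched, so any quotes or commas attached to the token end up in the output. A minimal Python 3 sketch that counts the matched words instead; the sample string and the names WORD_AB_BA and count_ab_ba_words are hypothetical, and for the real file you would pass in the result of f.read():

```python
import re
from collections import Counter

# A run of word characters containing "ab" or "ba". (?:...) is non-capturing,
# so findall() returns the whole word instead of just "ab"/"ba".
WORD_AB_BA = re.compile(r"\w*(?:ab|ba)\w*", re.IGNORECASE)


def count_ab_ba_words(text):
    # findall() scans the whole text; each match is a bare word with no
    # surrounding quotes, commas or parentheses.
    return Counter(WORD_AB_BA.findall(text))


# Hypothetical sample resembling the output quoted in the comments.
sample = '"abandoned (halfback 2-baser, banker. bankers bankers, bankers.'
counts = count_ab_ba_words(sample)
for word in sorted(counts):
    print("%s: %s" % (word, counts[word]))
print(len(counts))
```

Note that \w does not include "-", so "2-baser" is counted as "baser"; widen the character class if hyphenated words should stay whole.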

I also don't know how to combine this with the re.compile approach mentioned in the first comment...

Teachers should stop saying that regular expressions are a solution to every problem known to mankind... – 3442, commented Mar 27, 2016 at 8:47

Florent B. · Accepted Answer · 2016-03-27 11:15:32Z

2

To match all the words containing ab or ba (case-insensitive):

import re

text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)

# to print all the matches
for match in pattern.finditer(text):
    print(match.group(0))

# to print the first match
print(pattern.search(text).group(0))

https://regex101.com/r/uH3xM9/1
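One detail worth knowing when adapting this: the pattern has two capture groups, so re.findall returns tuples rather than strings. A small Python 3 sketch with the answer's own text and pattern:

```python
import re

text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)

# With two groups, findall() returns (group 1, group 2) tuples.
pairs = pattern.findall(text)
print(pairs)  # [('fabh', 'ab'), ('obar', 'ba'), ('Abtt', 'Ab'), ('yybA', 'bA')]

# Keep only the whole word from each tuple.
words = [whole for whole, _ in pairs]
print(words)  # ['fabh', 'obar', 'Abtt', 'yybA']
```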

edited Mar 27, 2016 at 11:15

answered Mar 27, 2016 at 8:57

Florent B.

42.7k · 7 gold badges · 92 silver badges · 105 bronze badges


4 Comments

Iron Fist Over a year ago

Is this case-sensitive or case-insensitive? It will not match 'Abolition'! To make it case-insensitive, add the re.IGNORECASE flag.

Youngin Na Over a year ago

I tried yours, but the output still includes punctuation and special characters, e.g. "abandoned: 1, "indispensable: 1, "probably: 1, "unable: 1, (halfback: 1, 2-baser,: 1, banker.: 1, bankers: 2, bankers,: 1, bankers.:

Youngin Na Over a year ago

Is there any way to print group 1 with the re.search method?

Florent B. Over a year ago

re.search returns only the first match. Is that what you want?
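For reference, a short Python 3 sketch of reading group 1 (and group 2) from re.search with the pattern from this answer; as noted, search only ever looks at the first match:

```python
import re

text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)

m = pattern.search(text)  # first match only, or None if nothing matches
if m is not None:
    print(m.group(1))  # outer group, the whole word: prints fabh
    print(m.group(2))  # inner group, the "ab"/"ba" part: prints ab
```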

3442 · Accepted Answer · 2016-03-27 09:03:20Z

1

Regular expressions are not the best tool for the job in this case; they complicate things far too much for such a simple problem. Instead, you can use Python's built-in in operator (which works in both Python 2 and 3):

sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]

for word in words:
    word = word.lower()
    if 'ab' in word or 'ba' in word:
        print('Word "{}" matches pattern!'.format(word))

As you can see, 'ab' in word evaluates to True if the string 'ab' appears as-is (that is, exactly) in word, and False otherwise. For example, 'ba' in 'probable' == True and 'ab' in 'Abolition' == False. The second line takes care of splitting the sentence into words and removing any non-letter characters, such as punctuation. word = word.lower() makes word lowercase before the comparisons, so that for word = 'Abolition', 'ab' in word == True.
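Because the cleaning step keeps only alphabetic characters, digits are dropped along with punctuation. A small sketch that mirrors the filter(...) expression (the helper name letters_only is made up for illustration) and the case-sensitivity point above:

```python
def letters_only(token):
    # Same effect as ''.join(filter(lambda x: x.isalpha(), token)):
    # punctuation AND digits are removed.
    return ''.join(ch for ch in token if ch.isalpha())


print(letters_only('1(ablution).'))  # ablution
print(letters_only('"Abolition,'))   # Abolition

word = 'Abolition'
print('ab' in word)          # False: the in operator is case-sensitive
print('ab' in word.lower())  # True
```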

edited Mar 27, 2016 at 9:03

answered Mar 27, 2016 at 8:59

3442

8,656 · 2 gold badges · 23 silver badges · 43 bronze badges

2 Comments

Iron Fist Over a year ago

Your words is a list of characters; did you perhaps mean to use sentence.split() in your list comprehension instead?

3442 Over a year ago

@IronFist: I tested the code before posting, but left that out while writing the answer. Thanks for noticing!

Iron Fist · Accepted Answer · 2016-03-27 10:34:43Z

I would do it this way:

First, strip the unwanted characters from your string using either of the two techniques below:

a - Build a translation dictionary and use the translate method:

>>> import string
>>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = s.translate(del_punc)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution

b - Use the re.sub method:

>>> import string
>>> import re
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = re.sub(r'[%s]' % re.escape(string.punctuation), '', s)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
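As a rough check, the two techniques can be compared on a made-up string that also contains a backslash and square brackets. re.escape is used so that every punctuation character, including "\" and "]", is treated literally inside the character class; without it, the backslash in string.punctuation acts as an escape and is not itself removed. The function names are hypothetical:

```python
import re
import string


def strip_with_translate(s):
    # Map each punctuation code point to None, which deletes it.
    table = dict.fromkeys(ord(c) for c in string.punctuation)
    return s.translate(table)


def strip_with_re_sub(s):
    # re.escape keeps characters such as "\", "]" and "^" literal in the class.
    return re.sub('[%s]' % re.escape(string.punctuation), '', s)


sample = 'bank;, half\\back [1(ablution)].'  # contains one literal backslash
print(strip_with_translate(sample))  # bank halfback 1ablution
print(strip_with_re_sub(sample))     # bank halfback 1ablution
```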

Next, find the words containing 'ab' or 'ba':

a - Split on whitespace and check each word for the desired substrings (this is the approach I recommend):

>>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']

b - Use the re.finditer method:

>>> pat = re.compile(r'\b.*?(ab|ba).*?\b', re.IGNORECASE)
>>> for m in pat.finditer(s):
...     print(m.group())
...
abolition
 fabrics
 probable
 test case bank
 halfback
 1ablution
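Because `.` also matches spaces, the lazy pattern above can run across several words (as in test case bank). A sketch of a tighter variant, swapping `.*?` for `\w*` so each match stays inside a single word; this is an alternative pattern, not the one from the answer:

```python
import re

s = 'abolition fabrics probable test case bank halfback 1ablution'

# \w* cannot match spaces, so a match never spans two words.
# (?:...) is non-capturing, so findall() returns the whole matches.
word_pat = re.compile(r'\w*(?:ab|ba)\w*', re.IGNORECASE)
print(word_pat.findall(s))
# ['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
```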

BallpointBen · Accepted Answer · 2016-03-27 08:57:09Z

0

string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
    print(true)
else:
    print(false)

answered Mar 27, 2016 at 8:57

BallpointBen

15.6k · 2 gold badges · 46 silver badges · 81 bronze badges

Comments

kvorobiev · Accepted Answer · 2016-03-27 09:07:17Z

0

Try this one:

[(),/]*([a-z]|(ba|ab))+[(),/]*
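A quick Python 3 check suggests this pattern does not filter for "ab"/"ba" at all: since `[a-z]` is tried first, any run of lowercase letters matches, and surrounding parentheses are kept in the match. The sample string is made up for illustration:

```python
import re

# The pattern from this answer, unchanged (case-sensitive, no flags).
pat = re.compile(r"[(),/]*([a-z]|(ba|ab))+[(),/]*")

sample = 'test (fabrics) Abc'  # hypothetical sample text
matches = [m.group(0) for m in pat.finditer(sample)]
print(matches)  # ['test', '(fabrics)', 'bc']
```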

edited Mar 27, 2016 at 9:07

kvorobiev

5,070 · 4 gold badges · 32 silver badges · 36 bronze badges

answered Mar 27, 2016 at 9:02

marsouf

1
