2

I tried matching words that include the letters "ab" or "ba", e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:

r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"

But it also matches words that start or end with non-alphanumeric characters such as ", (, ), /. How can I strip those? I just want a list of the matching words.

import sys
import re

word=[]

dict={}

f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')

data = f.read()
word = data.split() # word is list

f.close()

for num2 in word:
    match2 = re.findall("\w*(ab|ba)\w*", num2)
    if match2:
        dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1

for key2 in sorted(dict.iterkeys()):print "%s: %s" % (key2, dict[key2])
print len(dict.keys())

Here, I don't know how to combine it with the re.compile method that the first comment mentioned...

2
  • 5
    Teachers should stop saying that regular expressions are a solution to every problem known to mankind... Commented Mar 27, 2016 at 8:47
  • @KemyLand: This should be the accepted answer :) Commented Mar 27, 2016 at 9:54

5 Answers

2

To match all the words with ab or ba (case insensitive):

import re

text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)

# to print all the matches
for match in pattern.finditer(text):
    print(match.group(0))

# to print the first match
print(pattern.search(text).group(0))

https://regex101.com/r/uH3xM9/1
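A minimal sketch of how the compiled pattern above might be combined with the per-word counting from the question. It assumes Python 3 and scans the whole text with finditer instead of splitting on whitespace first, so surrounding punctuation never becomes part of a match. The helper name count_ab_ba_words and the sample text are made up for illustration, and the group is written as non-capturing, (?:ab|ba), which is a small change from the pattern above.

```python
import re
from collections import Counter

# Same idea as the pattern above, with a non-capturing group.
WORD_WITH_AB_BA = re.compile(r"\w*(?:ab|ba)\w*", re.IGNORECASE)


def count_ab_ba_words(text):
    """Hypothetical helper: count words containing 'ab' or 'ba'.

    Matches are lowercased so that 'Abtt' and 'abtt' share one entry.
    Because \\w never matches punctuation, quotes, brackets and commas
    around a word are left out of the match.
    """
    counts = Counter()
    for match in WORD_WITH_AB_BA.finditer(text):
        counts[match.group(0).lower()] += 1
    return counts


sample = 'fabh, obar! (Abtt) yybA, kk "probable: probable.'
counts = count_ab_ba_words(sample)
for word in sorted(counts):
    print("%s: %d" % (word, counts[word]))
```

Note that \w also matches digits and underscores, so a token such as 1ablution would be counted as a word here.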


4 Comments

Is this case-insensitive or case-sensitive? It will not match 'Abolition'! To make it case-insensitive, add the re.IGNORECASE flag.
I tried yours, but it still matches entries like these; it still includes punctuation and special characters, e.g. "abandoned: 1 "indispensable: 1 "probably: 1 "unable: 1 (halfback: 1 2-baser,: 1 banker.: 1 bankers: 2 bankers,: 1 bankers.:
Is there any way to print group 1 with the re.search method?
re.search will return only the first result. Is that what you want?
1

Regular expressions are not the best tool for the job in this case. They complicate things far more than such a simple task warrants. You can instead use Python's built-in "in" operator (works in both Python 2 and 3):

sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]

for word in words:
    word = word.lower()
    if 'ab' in word or 'ba' in word:
        print('Word "{}" matches pattern!'.format(word))

As you can see, 'ab' in word evaluates to True if the string 'ab' is found as-is (that is, exactly, including case) in word, and to False otherwise. For example, 'ba' in 'probable' is True, while 'ab' in 'Abolition' is False because of the capital A. The second line takes care of splitting the sentence into words and stripping every non-alphabetic character from each one. word = word.lower() makes word lowercase before the comparisons, so that for word = 'Abolition', 'ab' in word is True.
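A small sketch of the checks described above, collected so they can be run directly. The helper name has_ab_or_ba is made up for illustration. One point worth flagging if a result is ever compared with == True in real code: in and == are both comparison operators in Python, so without parentheses they chain, and the last entry below is evaluated as ('ba' in 'probable') and ('probable' == True).

```python
def has_ab_or_ba(word):
    """Hypothetical helper: case-insensitive check for 'ab' or 'ba'."""
    lowered = word.lower()
    return 'ab' in lowered or 'ba' in lowered


checks = {
    # Plain substring test: 'ba' occurs in 'probable'.
    "plain_in": 'ba' in 'probable',
    # Case-sensitive: 'Abolition' starts with a capital A, so no match.
    "case_sensitive": 'ab' in 'Abolition',
    # Lowercasing first makes the same word match.
    "lowered": has_ab_or_ba('Abolition'),
    # Chained comparison: also requires 'probable' == True, which is False.
    "chained": 'ba' in 'probable' == True,
    # Parentheses give the intended meaning.
    "parenthesised": ('ba' in 'probable') == True,
}

for name in sorted(checks):
    print("%s -> %s" % (name, checks[name]))
```

In practice the bare expression (if 'ba' in word:) is enough, which sidesteps the chaining question entirely.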

2 Comments

Your words is a list of chars; you probably wanted sentence.split() in your list comprehension instead?
@IronFist: I tested the code before posting, but forgot that while writing the answer. Thanks for noticing!
1

I would do it this way:

  1. Strip unwanted characters from your string using either of the two techniques below:

    a - By building a translation dictionary and using the translate method:

    >>> import string
    >>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
    >>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
    >>> s = s.translate(del_punc)
    >>> print(s)
    abolition fabrics probable test case bank halfback 1ablution
    

    b - Using the re.sub method (re.escape keeps characters such as ], \ and ^ in string.punctuation from being misread inside the character class):

    >>> import string
    >>> import re
    >>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
    >>> s = re.sub(r'[%s]' % re.escape(string.punctuation), '', s)
    >>> print(s)
    abolition fabrics probable test case bank halfback 1ablution
    
  2. Next, find the words containing 'ab' or 'ba':

    a - Splitting on whitespace and checking each word for the desired substrings, which is the approach I recommend:

    >>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
    ['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
    

    b - Using the re.finditer method:

    >>> pat = re.compile(r'\b\w*(ab|ba)\w*\b', re.IGNORECASE)
    >>> for m in pat.finditer(s):
    ...     print(m.group())
    ...
    abolition
    fabrics
    probable
    bank
    halfback
    1ablution
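The two steps above can be combined into one small function. A minimal sketch, assuming Python 3, where str.maketrans can build a deletion table for str.translate; the function name words_with_ab_or_ba is made up here and is not part of the answer.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_DELETE_PUNCTUATION = str.maketrans('', '', string.punctuation)


def words_with_ab_or_ba(text):
    """Hypothetical helper: strip punctuation, then keep words with 'ab'/'ba'."""
    cleaned = text.translate(_DELETE_PUNCTUATION)
    return [w for w in cleaned.split()
            if 'ab' in w.lower() or 'ba' in w.lower()]


s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
print(words_with_ab_or_ba(s))
```

Deleting punctuation this way also joins tokens such as 1(ablution) into 1ablution, as in the output shown in step 1.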
    


0
string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
    print(True)
else:
    print(False)


0

Try this one

[(),/]*[a-z]*(ba|ab)[a-z]*[(),/]*

