I have written this function,

import re

def my_func(s):
    wordlist = ('unit', 'room', 'lot')
    # Continue only if s starts with a whole word from wordlist and contains a digit
    if any(re.match(r'^' + word + r'\b' + r'.*$', s.lower()) for word in wordlist) and any(i.isdigit() for i in s.lower()):
        if ',' in s:
            out = re.findall(r"(.*),", s)  # Getting everything before comma
            return out[0]
        else:
            out = re.findall(r"([^\s]*\s[^\s]*)", s)  # Getting everything before second space.
            return out[0]

My test data and the expected output

Unity 11 Lane. --> None
Unit 11 queen street --> Unit 11
Unit 7, king street --> Unit 7
Lot 12 --> Lot 12
Unit street --> None

My logic here is

  1. Take up to first comma, if there is ',' in the string.
  2. Take up to second space if there is no comma
  3. Don't return anything if the string doesn't start with one of the words in the wordlist.
    1. Bring all if no second space or comma in it.

Everything else is working fine. How do I capture Lot 12 here? That is, if the string matches the wordlist and there is no ',' and no second space, return the whole string.

  • Lot 12 --> Lot 12 and Unit street --> None are mutually exclusive if you want your rules to be "Take up to first comma, if there is ',' in the string" and "Take up to second space if there is no comma": street matches those conditions. Should those matches be only digits? Commented Jun 13, 2017 at 3:58
  • Yup, that's why I added this to the first if condition: any(i.isdigit() for i in s.lower()) Commented Jun 13, 2017 at 3:59

1 Answer

You're overcomplicating this; it's a simple word + whitespace + digits match:

import re

def my_func(s):
    wordlist = ('unit', 'room', 'lot') 
    result = re.match(r"((?:{})\s+\d+)".format("|".join(wordlist)), s, re.IGNORECASE)
    if result:
        return result.group()

Let's test it:

test_data = ["Unity 11 Lane.",
             "Unit 11 queen street",
             "Unit 7, king street",
             "Lot 12",
             "Unit street"]

for entry in test_data:
    print("{} --> {}".format(entry, my_func(entry)))

Which gives:

Unity 11 Lane. --> None
Unit 11 queen street --> Unit 11
Unit 7, king street --> Unit 7
Lot 12 --> Lot 12
Unit street --> None
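To make it easier to see what that format call actually builds, here is a small sketch that constructs the same pattern once and prints it. The re.escape call and the precompiled pattern are my additions, not something the current wordlist requires; they only matter if a word ever contains a regex metacharacter or the function is called many times.

```python
import re

wordlist = ("unit", "room", "lot")

# Escape each word so it is matched literally, then join into one alternation.
alternation = "|".join(re.escape(word) for word in wordlist)

# Same pattern as in my_func above, compiled once for reuse.
pattern = re.compile(r"((?:{})\s+\d+)".format(alternation), re.IGNORECASE)

print(pattern.pattern)  # ((?:unit|room|lot)\s+\d+)
```

Since re.match anchors at the start of the string, "Unity 11 Lane." fails because the character after "unit" is "y" rather than whitespace.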

If you really want to match everything before a whitespace, a comma or EOL, you can do it by replacing the regex with:

result = re.match(r"((?:{})\s+.+?(?=\s|,|$))".format("|".join(wordlist)), s, re.IGNORECASE)

But this will match one of your undesired strings, because the pattern cannot know that you want a number after the word but don't want street:

Unity 11 Lane. --> None
Unit 11 queen street --> Unit 11
Unit 7, king street --> Unit 7
Lot 12 --> Lot 12
Unit street --> Unit street
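One possible middle ground, as a rough sketch rather than a complete solution: keep the "stop at whitespace or comma" idea, but require the token after the word to contain at least one digit. The name extract_unit and the lookahead (?=[^\s,]*\d) are my own illustration. It assumes that "contains a digit" is the rule separating wanted from unwanted strings, which may not hold for all of your inputs.

```python
import re

WORDLIST = ("unit", "room", "lot")

# Keyword, whitespace, then one token (no whitespace or commas)
# that must contain at least one digit somewhere in it.
PATTERN = re.compile(
    r"(?:{})\s+(?=[^\s,]*\d)[^\s,]+".format("|".join(map(re.escape, WORDLIST))),
    re.IGNORECASE,
)

def extract_unit(s):
    """Return the leading 'word token' part of s, or None if there is no match."""
    result = PATTERN.match(s)
    return result.group() if result else None

for entry in ["Unit 7-12 queen street", "Unit street", "unit 8 and 10, king street"]:
    print("{} --> {}".format(entry, extract_unit(entry)))
```

This returns Unit 7-12 for the range case and None for Unit street, but for unit 8 and 10, king street it only returns unit 8, so multi-number forms would still need their own rule.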

5 Comments

Hi, thanks. The reason I am going by comma and second space is that I have some more scenarios to handle, like Unit 7-12 queen street and unit 8 and 10, king street. Does this work for those as well?
No, it doesn't, but if you look for a space, comma or end of line/string you will be matching Unit street as well.
Yeah, I need to differentiate and handle them as well. Maybe use any(i.isdigit() for i in s.lower()) to route them to different pathways and handle each with a different regex?
Check the update if you really want that pattern. But I'd urge you to first define absolutely every possible structure that you want to match, and every one you don't. Until you do, you cannot create rules for differentiating between strings. Also, let the regex engine do your bidding instead of attempting to post-process the data yourself: the regex engine works on the 'fast' C side (in CPython at least) and will be much better and faster at pattern matching than anything you can do in post-processing.
Thanks. I agree with your point, but I cannot define every possible structure as I have millions of user inputs. :) Literally not possible to capture every structure, thanks anyways, I will try to use this regex.
