4

How would you match locations (places) with regular expressions in Python? The pattern should match locations in the following format:

  • London, ENG, United Kingdom
  • Melbourne, VIC, Australia
  • Palo Alto, CA USA

I've tried this, but it doesn't work:

re.findall(r'([A-Z][a-z]+ ([A-Z][a-z]+)?,)+',x)

EDIT:

Okay, let me make clear what I want. I have a huge wall of text, and I need to detect locations like the ones above in that text, not validate them.

Example:

text = """
..............................
..............................
London, ENG, United Kingdom...
..............................
"""
re.findall(r'<something>', text)
#['London, ENG, United Kingdom']

It should be able to match any location of the format Xxxx, XXX, Xxxx, with optional commas and optionally multiple words.

0

3 Answers

2

How about using re.split?

>>> import re
>>> x = 'London, ENG, United Kingdom or Melbourne, VIC, Australia or Palo Alto, CA USA'
>>> list(map(str.strip, re.split(',|or', x)))
['London', 'ENG', 'United Kingdom', 'Melbourne', 'VIC', 'Australia', 'Palo Alto', 'CA USA']
>>> list(map(str.strip, re.split('or', x)))
['London, ENG, United Kingdom', 'Melbourne, VIC, Australia', 'Palo Alto, CA USA']

If you want to split the locations on or, you don't need a regular expression. Just use str.split:

>>> list(map(str.strip, x.split('or')))
['London, ENG, United Kingdom', 'Melbourne, VIC, Australia', 'Palo Alto, CA USA']
  • list is not needed if you use Python 2.x, where map already returns a list.

UPDATE

>>> x = 'London, ENG, United Kingdom / Melbourne, VIC, Australia / Palo Alto, CA USA'
>>> re.findall(r'(?:\w+(?:\s+\w+)*,\s)+(?:\w+(?:\s\w+)*)', x)
['London, ENG, United Kingdom', 'Melbourne, VIC, Australia', 'Palo Alto, CA USA']
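For readers who find the UPDATE pattern hard to parse, here is the same expression spelled out with re.VERBOSE so each part can carry a comment. This is only a sketch; the name LOCATION is made up for illustration. Note that \w matches any word character, so the pattern depends on something like the slash used here to separate one location from the next.

```python
import re

# Same pattern as in the UPDATE, split over several lines.
# LOCATION is a hypothetical name, not part of the original answer.
LOCATION = re.compile(r"""
    (?:                 # one or more comma-terminated parts:
        \w+(?:\s+\w+)*  #   a word, optionally followed by more words
        ,\s             #   then a comma and one whitespace character
    )+
    \w+(?:\s\w+)*       # final part: one or more words, no trailing comma
""", re.VERBOSE)

x = 'London, ENG, United Kingdom / Melbourne, VIC, Australia / Palo Alto, CA USA'
matches = LOCATION.findall(x)  # all groups are non-capturing, so whole matches come back
print(matches)
```

Because every group is non-capturing, findall returns the full matched strings rather than tuples of groups.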

9 Comments

the "or" was just to separate the examples..!
those were just examples i gave. i need to match locations with that format in a huge wall of text..
@AnshumanDwibhashi, not perfect... You need to remove unnecessary word like or yourself.
or is just to "separate" the example i gave!!
Hey, there's a problem: with x = "Melbourne, VIC, Australia London, ENG, United Kingdom", your code returns ['Melbourne, VIC, Australia London', 'ENG, United Kingdom'] instead of ['Melbourne, VIC, Australia', 'London, ENG, United Kingdom'].
1

There is no reason to use (expensive) regex when you can do it much more efficiently using a dictionary:

locations = {"London, ENG, United Kingdom":True, "Melbourne, VIC, Australia":True...}

Then it's easy to use locations to check whether x is one of them.

Update (after the edit):
Still, there's no need to use (expensive) regex, since you're not doing any kind of pattern matching. You're performing a simple substring search, so use:

"London, ENG, United Kingdom" in text

or, more generally, create a list of locations:

locations = ["London, ENG, United Kingdom", "Melbourne, VIC, Australia",...]
...
for location in locations:
    if location in text:
        # do what you want here
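As a rough, self-contained illustration of the substring approach, the sketch below collects every known location that occurs in a piece of text. The sample text and the variable name found are invented for the example, and the approach only works when the set of locations is known in advance.

```python
# Known locations to look for (taken from the question's examples).
locations = [
    "London, ENG, United Kingdom",
    "Melbourne, VIC, Australia",
    "Palo Alto, CA USA",
]

# Hypothetical sample text, made up for this sketch.
text = "Offices: London, ENG, United Kingdom and Palo Alto, CA USA."

# Keep each location that appears verbatim somewhere in the text.
found = [location for location in locations if location in text]
print(found)
```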

4 Comments

i dont have to match only london or melbourne or palo alto, any location with the format i've exampled...
@AnshumanDwibhashi right, so create a list of locations and do: for location in locations: ... if location in text: ...
i dont want to match a certain list of locations.. i want to match any location with the format i've specified after the example.. check my update
@AnshumanDwibhashi see update. Point is still valid - you don't do any kind of pattern matching hence - no need for regex!
1

Okay, I found the answer myself; it's fairly simple:

r'\w+, \w+, \w+'

But to respect @falsetru's efforts, I'll accept his answer. Thank you, @falsetru.
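It may help to see what this simple pattern does on the question's own examples: each \w+ matches a single word, so multi-word parts are cut short and the comma-less "CA USA" form is not matched at all. The second pattern below is one possible tightening, not something proposed in the thread. It assumes capitalised city and country words and a two- or three-letter uppercase region code, and the name LOCATION and the sample text are made up.

```python
import re

simple = r'\w+, \w+, \w+'
first = re.findall(simple, 'London, ENG, United Kingdom')
second = re.findall(simple, 'Palo Alto, CA USA')
print(first)   # ['London, ENG, United']  (the second country word is lost)
print(second)  # []  (only one comma, so nothing matches)

# Hypothetical stricter variant, assuming the capitalisation seen in the examples.
LOCATION = re.compile(
    r'[A-Z][a-z]+(?: [A-Z][a-z]+)*'                # city: capitalised word(s)
    r', [A-Z]{2,3}'                                # region code such as ENG, VIC, CA
    r',? '                                         # the second comma is optional
    r'(?:[A-Z]{2,}|[A-Z][a-z]+(?: [A-Z][a-z]+)*)'  # country: USA or capitalised word(s)
)

text = ('We have offices in London, ENG, United Kingdom and in '
        'Palo Alto, CA USA, plus Melbourne, VIC, Australia.')
found = LOCATION.findall(text)
print(found)
```

This variant is still a heuristic: a capitalised word directly before the city (for example at the start of a sentence) would be absorbed into the match, and two locations separated only by a space remain ambiguous.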
