I am working on a Python program that searches through received emails and returns coordinates. I am trying to create a regular expression to extract the latitude/longitude values from a string (I am new to regex).

Here is a small example of one of the strings I have been using for testing:

content = """

WorkLocationBoundingBox
Latitude:30.556555Longitude:-97.659824
SecondLatitude:30.569138SecondLongitude:-97.650855

"""

I came up with Latitude:(\d+).(\d+)Longitude:(.*), which I believe is close to what I need, but it separates 30 and 556555 into separate groups, while -97.659824 is correctly captured in a single group.
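A quick way to see what goes wrong is to run the question's pattern next to one that keeps the whole number in a single group. This is a minimal sketch; `original` and `fixed` are illustrative names, and `fixed` is the first pattern suggested in the comments. Each (\d+) is its own capture group, and the unescaped . matches any character, not only a decimal point.

```python
import re

content = """
WorkLocationBoundingBox
Latitude:30.556555Longitude:-97.659824
SecondLatitude:30.569138SecondLongitude:-97.650855
"""

# Pattern from the question: two separate (\d+) groups, so the integer and
# fractional parts of the latitude land in different groups.
original = r"Latitude:(\d+).(\d+)Longitude:(.*)"
print(re.findall(original, content))
# [('30', '556555', '-97.659824')]

# One group around the whole number; \. is a literal dot and (?:...) is a
# non-capturing group, so the fractional part does not get its own group.
fixed = r"Latitude:(\d+(?:\.\d+)?)Longitude:(.*)"
print(re.findall(fixed, content))
# [('30.556555', '-97.659824')]
```

Neither pattern matches the second line, because there "Longitude" is preceded by "Second"; handling that prefix is what the more precise pattern in the comments adds.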

My ideal expected result would look something like this:

[(30.556555, -97.659824, 30.569138, -97.650855)]
  • Try it like this: Latitude:(\d+(?:\.\d+)?)Longitude:(.*) or, more precisely, (?:Second)?Latitude:(-?\d+(?:\.\d+)?)(?:Second)?Longitude:(-?\d+(?:\.\d+)?) See regex101.com/r/OZgPXb/1 Commented Jun 4, 2021 at 14:12
  • Worked great, now to spend the time to figure out why! Thanks for your help! Commented Jun 4, 2021 at 14:29
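To help with the "why", here is the comment's more precise pattern rewritten with Python's re.VERBOSE flag, which ignores whitespace in the pattern and treats # as the start of a comment, so each piece can be annotated. It is a sketch of the same regex, not a different one; the annotations describe standard regex syntax.

```python
import re

content = """
WorkLocationBoundingBox
Latitude:30.556555Longitude:-97.659824
SecondLatitude:30.569138SecondLongitude:-97.650855
"""

# Same pattern as in the comment, split over several lines for readability.
pattern = re.compile(r"""
    (?:Second)?            # optional "Second" prefix, not captured
    Latitude:
    (-?\d+(?:\.\d+)?)      # group 1: optional minus, digits, optional .digits
    (?:Second)?            # optional "Second" prefix again, not captured
    Longitude:
    (-?\d+(?:\.\d+)?)      # group 2: same number shape as group 1
""", re.VERBOSE)

print(pattern.findall(content))
# [('30.556555', '-97.659824'), ('30.569138', '-97.650855')]
```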

1 Answer

You can use three capture groups, where the first group captures the optional word (Second) in front of Latitude, so that a backreference can require the same word in front of Longitude.

((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)
  • ((?:Second)?) Capture group 1, optionally match Second
  • Latitude: Match literally
  • (-?\d+(?:\.\d+)?) Capture group 2, match an optional - then 1+ digits with an optional decimal part
  • \1Longitude: A backreference that matches the same text as group 1, followed by Longitude:
  • (-?\d+(?:\.\d+)?) Capture group 3, match an optional - then 1+ digits with an optional decimal part
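The backreference \1 is what ties the two prefixes together. The sketch below runs the pattern on three made-up inputs (hypothetical test strings, not from the question): the prefixes must agree, so "Latitude:...SecondLongitude:..." does not match. Note also that re.findall returns group 1 as well, which is why the code below picks out groups 2 and 3.

```python
import re

regex = r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"

# Matching prefixes: \1 repeats whatever group 1 captured ("" or "Second").
print(re.findall(regex, "Latitude:1.5Longitude:2.5"))
# [('', '1.5', '2.5')]
print(re.findall(regex, "SecondLatitude:1.5SecondLongitude:2.5"))
# [('Second', '1.5', '2.5')]

# Mismatched prefixes: group 1 captured "", so \1Longitude: cannot match
# "SecondLongitude:" and there is no match at all.
print(re.findall(regex, "Latitude:1.5SecondLongitude:2.5"))
# []
```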

import re

regex = r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"
s = ("WorkLocationBoundingBox\n"
     "Latitude:30.556555Longitude:-97.659824\n"
     "SecondLatitude:30.569138SecondLongitude:-97.650855")

lst = []

# Group 1 only holds the optional "Second" prefix, so keep groups 2 and 3.
for match in re.finditer(regex, s):
    lst.append(match.group(2))  # latitude
    lst.append(match.group(3))  # longitude

print(lst)

Output

['30.556555', '-97.659824', '30.569138', '-97.650855']

A less strict pattern matches optional word characters before either Latitude or Longitude:

\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)
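The trade-off is that \w* accepts any word prefix and does not require the two prefixes to agree, whereas the backreference version only accepts an empty prefix or "Second", used consistently. A minimal comparison on a made-up line (the input and the names `strict`, `loose`, and `mixed` are hypothetical):

```python
import re

strict = r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"
loose = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"

# Hypothetical input whose prefixes differ and are not "Second".
mixed = "FirstLatitude:1.5OtherLongitude:2.5"

print(re.findall(strict, mixed))  # []
print(re.findall(loose, mixed))   # [('1.5', '2.5')]
```

Which behavior is preferable depends on how predictable the email format is.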

Because that pattern has only two capture groups, you can also use re.findall to get the values as a list of (latitude, longitude) tuples:

import re

pattern = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"

s = ("WorkLocationBoundingBox\n"
     "Latitude:30.556555Longitude:-97.659824\n"
     "SecondLatitude:30.569138SecondLongitude:-97.650855")
print(re.findall(pattern, s))

Output

[('30.556555', '-97.659824'), ('30.569138', '-97.650855')]
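The matches are still strings. The question's expected result contains numbers, so one possible follow-up step is to convert them with float() and, if wanted, flatten them into the single tuple shown in the question. This is a sketch; the names `pairs` and `flat` are illustrative.

```python
import re

pattern = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"
s = ("WorkLocationBoundingBox\n"
     "Latitude:30.556555Longitude:-97.659824\n"
     "SecondLatitude:30.569138SecondLongitude:-97.650855")

# One (lat, lon) pair of floats per match.
pairs = [(float(lat), float(lon)) for lat, lon in re.findall(pattern, s)]
print(pairs)
# [(30.556555, -97.659824), (30.569138, -97.650855)]

# All values flattened into one tuple inside a list, mirroring the
# "ideal expected result" from the question.
flat = [tuple(value for pair in pairs for value in pair)]
print(flat)
# [(30.556555, -97.659824, 30.569138, -97.650855)]
```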
1 Comment

A very thorough and helpful answer. Thanks again for your help!
