I need to grab specific details parsed from email bodies. In this case, the emails are plain text and formatted like so:

[email protected]
John Doe
+16073948374
2021-04-27T15:38:11+0000
14904

The above is an example of the output of print(body), where body is parsed from an email like so:

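import email

def parseEmail(popServer, msgNum):
    raw_message = popServer.retr(msgNum)[1]  # list of byte lines, line endings stripped
    str_message = email.message_from_bytes(b'\n'.join(raw_message))
    body = str(str_message.get_payload())  # plain-text (non-multipart) body as a str
    return body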
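Without a live POP3 server, the parsing step can be tried on its own. This sketch feeds email.message_from_bytes() a hand-built list of byte lines shaped like the second element that poplib's retr() returns. The headers and the address john.doe@example.com are hypothetical placeholders, not the question's real data.

```python
import email

# Hypothetical stand-in for popServer.retr(msgNum)[1]: a list of byte lines
# with line endings already stripped. All values here are placeholders.
raw_message = [
    b"From: sender@example.com",
    b"Subject: New contact",
    b"Content-Type: text/plain",
    b"",  # blank line separates headers from the body
    b"john.doe@example.com",
    b"John Doe",
    b"+16073948374",
    b"2021-04-27T15:38:11+0000",
    b"14904",
]

message = email.message_from_bytes(b"\n".join(raw_message))
# For a single-part text/plain message, get_payload() returns the body as a str
body = message.get_payload()
print(body)
```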

So, if I only needed to grab the email address and phone number from the body object, how might I do that using regex?

I understand regex is most certainly overkill for this. However, I'm only repurposing an existing in-house utility that's already written to use regex for more complex queries, so it seems the simplest solution here would be to modify the regex to grab the desired text. Attempts to use str.partition() resulted in other, unrelated errors.

Thank you in advance.

  • Are you trying to get an e-mail address from the headers of the e-mail, or from the body of the e-mail (which I assume is what you have shown above)? Commented Apr 29, 2021 at 17:46
  • A simple RE like \S+@\S+\.\S+ is crude, but should capture the e-mail address. Commented Apr 29, 2021 at 17:48
  • Everything is parsed in from the body of the email; the example above is literally the contents of the email message. I need to assign the first and third lines to new variables/objects to be used elsewhere in the program. Commented Apr 29, 2021 at 17:49
  • Is it always on the first and third line, or is the format variable? Commented Apr 29, 2021 at 17:51
  • @joanis always first and third line Commented Apr 29, 2021 at 17:53

4 Answers

You could use the following regex patterns:

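For the email: /.+@.+\n/g

For the phone number: /^[+]\d+\n/gm

If you're using Python's re library, drop the enclosing forward slashes and the trailing flags: Python has no g flag (re.findall() returns every match, re.search() the first one), and m is written as re.MULTILINE.

Note that the email pattern uses only the global flag, while the phone number pattern also uses the multiline flag, so that ^ matches at the start of every line rather than only at the start of the string.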

Simply loop over every body, capturing these details and storing them how you like.
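A minimal Python sketch of those two patterns, looping over a list of bodies as suggested. Two assumptions of mine: both patterns are anchored to a whole line with ^ and $ under re.MULTILINE, and $ replaces the trailing \n so the newline isn't included in the match. The bodies list and the address in it are hypothetical placeholders.

```python
import re

# The answer's patterns in Python form: no slashes, no "g" flag
# (re.search() returns the first match), "m" becomes re.MULTILINE.
# Assumption: "$" is used instead of a trailing "\n".
EMAIL_RE = re.compile(r"^.+@.+$", re.MULTILINE)
PHONE_RE = re.compile(r"^[+]\d+$", re.MULTILINE)

# Hypothetical sample bodies; the address is a placeholder.
bodies = [
    "john.doe@example.com\nJohn Doe\n+16073948374\n2021-04-27T15:38:11+0000\n14904",
]

contacts = []
for body in bodies:
    email_match = EMAIL_RE.search(body)
    phone_match = PHONE_RE.search(body)
    if email_match and phone_match:  # skip bodies missing either field
        contacts.append((email_match.group(), phone_match.group()))

print(contacts)
```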

2 Comments

Thank you, and can this be used with re.search(), or does it have to be re.match()? Excuse my ignorance if that's a dumb question, but I'm not very versed in regex.
Use re.search(): re.match() only tries to match at the start of the string (even with re.MULTILINE), so it can't find the phone number on the third line.
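The difference between the two functions is quick to check. This sketch uses a placeholder body (the address is hypothetical) and a phone pattern anchored with ^ and $:

```python
import re

# Placeholder body with the same layout as the question; the address is hypothetical.
body = "john.doe@example.com\nJohn Doe\n+16073948374\n2021-04-27T15:38:11+0000\n14904"
phone = r"^\+\d+$"

# re.match() only tries position 0, even with re.MULTILINE, so it finds nothing here
print(re.match(phone, body, re.MULTILINE))  # None

# re.search() scans the whole string; with re.MULTILINE, ^ and $ also match at line breaks
print(re.search(phone, body, re.MULTILINE).group())  # +16073948374
```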

In the comments clarifying the question, you indicated that the e-mail address is always on the first line, and the phone number is always on the 3rd line. In that case, I would just split the lines instead of trying to match them with an RE.

lines = body.splitlines()  # also handles "\r\n" line endings
email_address = lines[0]   # not "email", which would shadow the email module
phone = lines[2]
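If bodies might arrive truncated or with stray whitespace, the same split can be wrapped with a length check. A sketch under the question's stated format (address on line 1, phone on line 3); the function name extract_contact and the sample address are hypothetical. The variable is called email_address so it doesn't shadow the email module used by parseEmail.

```python
def extract_contact(body):
    """Hypothetical helper: return (email_address, phone) from a body whose
    first line is the address and third line is the phone number."""
    lines = body.splitlines()
    if len(lines) < 3:
        raise ValueError(f"expected at least 3 lines, got {len(lines)}")
    # strip() drops any stray leading/trailing whitespace on each line
    return lines[0].strip(), lines[2].strip()


email_address, phone = extract_contact(
    "john.doe@example.com\r\nJohn Doe\r\n+16073948374\r\n2021-04-27T15:38:11+0000\r\n14904"
)
print(email_address, phone)
```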

2 Comments

I ended up using your solution because, well, duh. Thank you so much. From a practical standpoint, your answer is the obvious choice... however, I need to accept BrickleRex's answer, as his solution worked for the question I specifically asked using regex (as senseless as using regex for this might be).
All good! Regexes are a powerful tool, one of my favourites, and I'm glad someone else answered the question as asked too.

To match those patterns on the 1st and the 3rd line you can use 2 capture groups using a single regex:

^([^\s@]+@[^\s@]+)\r?\n.*\r?\n(\+\d+)$

The pattern matches:

  • ^ Start of string
  • ([^\s@]+@[^\s@]+) Capture an email-like pattern in group 1 (just a single @ on the first line)
  • \r?\n.*\r?\n Match (do not capture) the second line
  • (\+\d+) Capture a + and 1+ digits in group 2
  • $ End of string

Example

import re

regex = r"^([^\s@]+@[^\s@]+)\r?\n.*\r?\n(\+\d+)$"

s = ("[email protected]\n"
     "John Doe\n"
     "+16073948374\n"
     "2021-04-27T15:38:11+0000\n"
     "14904")

match = re.match(regex, s, re.MULTILINE)

if match:
    print(f"{match.group(1)}, {match.group(2)}")

Output

[email protected], +16073948374
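The same regex can also be written with re.VERBOSE, which ignores unescaped whitespace and #-comments inside the pattern, so each piece can carry the explanation from the list above. A sketch; the address in the sample string is a hypothetical placeholder.

```python
import re

# Same pattern as above, split across lines with re.VERBOSE.
CONTACT_RE = re.compile(
    r"""
    ^([^\s@]+@[^\s@]+)   # group 1: email-like text on the first line
    \r?\n .* \r?\n       # the second line (matched, not captured)
    (\+\d+)$             # group 2: "+" and digits on the third line
    """,
    re.MULTILINE | re.VERBOSE,
)

s = "john.doe@example.com\nJohn Doe\n+16073948374\n2021-04-27T15:38:11+0000\n14904"
match = CONTACT_RE.match(s)

if match:
    print(f"{match.group(1)}, {match.group(2)}")
```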

Using Regex.

Ex:

import re

s = """[email protected]
John Doe
+16073948374
2021-04-27T15:38:11+0000
14904"""

ptrn = re.compile(r"(\w+@\w+\.[a-z]+|\+\d{11}\b)")
print(ptrn.findall(s)) 

Output:

['[email protected]', '+16073948374']
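Two limits of this pattern are worth knowing: \w doesn't match ".", so an address such as the hypothetical john.doe@example.com comes back truncated, and \+\d{11}\b accepts only numbers with exactly 11 digits. The sketch below compares it with looser alternatives. What "looser" should allow is my assumption, and neither pattern is a full address validator; the 15-digit cap follows the E.164 maximum length for international numbers, while the 7-digit floor is arbitrary.

```python
import re

s = "john.doe@example.com\nJohn Doe\n+16073948374\n2021-04-27T15:38:11+0000\n14904"

# The answer's pattern: "\w+" before "@" can't span the ".", so only "doe@..." matches
original = re.compile(r"(\w+@\w+\.[a-z]+|\+\d{11}\b)")
print(original.findall(s))  # ['doe@example.com', '+16073948374']

# Looser alternatives (assumptions): ".", "+", "-" allowed in the local part,
# dotted domains, and phone numbers of 7 to 15 digits after the "+".
looser = re.compile(r"([\w.+-]+@[\w-]+(?:\.[\w-]+)+|\+\d{7,15}\b)")
print(looser.findall(s))  # ['john.doe@example.com', '+16073948374']
```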
