1

I have tried many regular expressions to extract the dates from emails in the following format, but none of them worked:

Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)
Sent: Thursday, November 08, 2001 10:25 AM

This is how the dates appear in every email, and I want to extract both of them.

Thank you in advance

1
  • 5
    What did you try and what doesn't work? Commented Apr 24, 2017 at 3:36

4 Answers

1

You can use a pattern like the following, which captures the weekday, day, month, and year as separate groups:

Using Python 3:

import re
data = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
final = re.findall(r"Date: (\w+), ([0-9]+) (\w+) ([0-9]+)", data)
print("{0}, {1}".format(final[0][0], " ".join(final[0][1:])))
print(" ".join(final[0][1:]))

Using Python 2:

import re
data = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
final = re.findall(r"Date: (\w+), ([0-9]+) (\w+) ([0-9]+)", data)
print "%s, %s" % (final[0][0], " ".join(final[0][1:]))
print " ".join(final[0][1:])

Output:

Tue, 13 Nov 2001
13 Nov 2001

Edit:

As a quick answer to your updated question, you can do something like this:

import re 

email = '''Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)
Sent: Thursday, November 08, 2001 10:25 AM'''
data = email.split("\n")

pattern = r"(\w+: \w+, [0-9]+ \w+ [0-9]+)|(\w+: \w+, \w+ [0-9]+, [0-9]+)"

final = []
for k in data:
    final += re.findall(pattern, k)

final = [j.split(":") for k in final for j in k if j != '']
# Python3 
print(final)
# Python2
# print final

Output:

[['Date', ' Tue, 13 Nov 2001'], ['Sent', ' Thursday, November 08, 2001']]
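The same two formats can also be matched in one pass over the whole message instead of line by line. The following is only a sketch under some assumptions: the group names `date` and `sent` and the helper `extract_dates` are made up for illustration, and the pattern assumes each header starts at the beginning of a line exactly as in the question.

```python
import re

# Sample text: the two lines quoted in the question.
email = """Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)
Sent: Thursday, November 08, 2001 10:25 AM"""

# One alternative per header. re.MULTILINE lets ^ match at the start of every line.
HEADER_DATE = re.compile(
    r"^Date: (?P<date>\w{3}, \d{1,2} \w{3} \d{4})"
    r"|^Sent: (?P<sent>\w+, \w+ \d{1,2}, \d{4})",
    re.MULTILINE,
)


def extract_dates(text):
    """Return a dict mapping each matched group name to the date text it captured."""
    found = {}
    for m in HEADER_DATE.finditer(text):
        for name, value in m.groupdict().items():
            if value is not None:  # only one alternative matches per hit
                found[name] = value
    return found


print(extract_dates(email))
```

Named groups avoid the empty strings that `re.findall` returns for the alternative that did not match, so no filtering on `''` is needed afterwards.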

3 Comments

Also, I want to extract this date from the body of the email: Sent: Thursday, November 08, 2001 10:25 AM. How can I extract them both?
Please edit your question and post an example of your email data, and I'll edit my answer to fit your needs.
I've updated my answer. Take a look and let me know if you have any feedback.
0
import re

my_email = 'Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)'

# \d{1,2} also accepts single-digit days such as "2 Nov 2001"
match = re.search(r': (\w{3}, \d{1,2} \w{3} \d{4})', my_email)
print(match.group(1))
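To spell out what a pattern of this shape matches, here is a commented variant written with `re.VERBOSE`. It is a sketch rather than the answerer's own code: the names `DATE_HEADER` and `first_date` are hypothetical, `[A-Za-z]` is used in place of `\w` to be stricter, and the day is assumed to have one or two digits.

```python
import re

# Each piece of the pattern sits on its own line with a comment.
# In re.VERBOSE mode plain whitespace is ignored, so literal spaces are written as [ ].
DATE_HEADER = re.compile(
    r"""
    :[ ]             # the colon and space that follow the header name
    (                # group 1: the date text to keep
        [A-Za-z]{3}  # abbreviated weekday, e.g. Tue
        ,[ ]         # comma and space
        \d{1,2}      # day of the month, one or two digits
        [ ]
        [A-Za-z]{3}  # abbreviated month, e.g. Nov
        [ ]
        \d{4}        # four-digit year
    )
    """,
    re.VERBOSE,
)


def first_date(line):
    """Return the captured date text, or None when the line does not match."""
    m = DATE_HEADER.search(line)
    return m.group(1) if m else None


print(first_date("Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"))
print(first_date("Sent: Thursday, November 08, 2001 10:25 AM"))
```

One limitation worth noting: this pattern only covers the abbreviated `Date:` format. The `Sent:` line uses full weekday and month names, so it does not match and the helper returns `None` for it.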

1 Comment

Whilst this code snippet is welcome, and may provide some help, it would be greatly improved if it included an explanation of how it addresses the question. Without that, your answer has much less educational value - remember that you are answering the question for readers in the future, not just the person asking now! Please edit your answer to add explanation, and give an indication of what limitations and assumptions apply.
0

I am not a regex expert, but here is a solution. You can write some tests for it:

import re

d = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
# Capture only the "day month year" part that follows "Date: <weekday>, "
dates = re.findall(r'Date: \w+, (\d+ \w+ \d+)', d)

['13 Nov 2001']

0

Instead of using regex, you can use split() if every string you need to extract follows the same format:

email_date = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
# Fields 1 to 4 are the weekday, day, month, and year
email_date = ' '.join(email_date.split(' ')[1:5])
print(email_date)

Output:

Tue, 13 Nov 2001
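Another regex-free option, if actual date objects are wanted rather than substrings, is to let the standard library parse the values. This is a sketch under assumptions: the `Date:` value is taken to be a standard email date (which `email.utils.parsedate_to_datetime` handles), the `Sent:` value is taken to always follow the exact layout in the question, and `strptime` is assumed to run in an English locale because `%A`, `%B`, and `%p` are locale-dependent.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

date_line = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
sent_line = "Sent: Thursday, November 08, 2001 10:25 AM"

# partition() splits on the first ": " only, so the colons in the time are left alone.
date_value = date_line.partition(": ")[2]
sent_value = sent_line.partition(": ")[2]

# The Date header is a standard email date; the trailing "(PST)" comment is ignored.
header_dt = parsedate_to_datetime(date_value)

# The Sent line is not a standard header format, so it needs an explicit layout.
sent_dt = datetime.strptime(sent_value, "%A, %B %d, %Y %I:%M %p")

print(header_dt.date())
print(sent_dt.date())
```

Compared with split(), this fails loudly on a line that does not match the expected layout instead of silently returning the wrong fields.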
