4

How can I find as many date patterns as possible in a text file using Python? The date pattern is defined as:

dd mmm yyyy
  ^   ^
  |   |
  +---+--- spaces

where:

  • dd is a two-digit number
  • mmm is a three-character English month abbreviation (e.g. Jan, Mar, Dec)
  • yyyy is a four-digit year
  • there are two spaces as separators

Thanks!

2
  • I'm not following you. Are you looking to grep for patterns of dates or for the dates according to a fixed single pattern? Commented May 5, 2010 at 1:30
  • I want to extract the actual dates. Commented May 5, 2010 at 2:17

5 Answers

10

Here's a way to find all dates matching your pattern

re.findall(r'\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}', text)

But after WilhelmTell's comment on your question, I'm also wondering whether this is what you were really asking for...
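
For reference, here is a minimal self-contained sketch of the same idea, with word boundaries (\b) added so that the pattern does not match inside longer runs of digits. The sample text and the name DATE_RE are made up for illustration; in practice the text would come from reading the file.

```python
import re

# Same month alternation as above, wrapped in \b so that something like
# "123 Jan 20101" is not picked up as a date.
DATE_RE = re.compile(
    r'\b\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}\b'
)

# Hypothetical sample text standing in for the contents of the file.
text = "Released 05 May 2010, patched 17 Jun 2010, ignore 123 Jan 20101."

dates = DATE_RE.findall(text)
print(dates)
```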

9

Use the calendar module to give you a little global awareness:

import calendar
import re

date_expr = r"\d{2} (?:%s) \d{4}" % '|'.join(calendar.month_abbr[1:])
print(date_expr)
print(re.findall(date_expr, source_text))

For me, this creates a date_expr like:

"\d{2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}"

But if I change my locale using the locale module:

locale.setlocale(locale.LC_ALL, "fr")

I now search for months in French:

"\d{2} (?:janv.|févr.|mars|avr.|mai|juin|juil.|août|sept.|oct.|nov.|déc.) \d{4}"

Hmm, this is the first time I have ever tried French month abbreviations; I may need to do some cleanup:

date_expr = r"\d{2} (?:%s) \d{4}" % '|'.join(
    m.title().rstrip('.') for m in calendar.month_abbr[1:])

Now I get:

"\d{2} (?:Janv|Févr|Mars|Avr|Mai|Juin|Juil|Août|Sept|Oct|Nov|Déc) \d{4}"

And now my script will run for my Gallic friends as well, with really very little trouble.

(You may wonder why I had to slice the month_abbr list from [1:]. The list begins with an empty string in position 0, so that each abbreviation's position is its month number, 1-12 rather than 0-11. For example, list(calendar.month_abbr).index('Mar') gives 3 in an English locale.)
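
A slightly more defensive variant of the cleanup step, as a sketch only: instead of stripping the trailing '.', re.escape() quotes any regex metacharacters in the abbreviations, so a literal dot in the text is still required where the locale uses one. The helper name build_date_expr and the sample string are hypothetical, and the output depends on the active locale.

```python
import calendar
import re

def build_date_expr():
    """Build a 'dd mmm yyyy' regex from the current locale's month abbreviations."""
    # month_abbr[0] is the empty string, so skip it with [1:].
    # re.escape() keeps characters such as '.' from acting as wildcards.
    months = [re.escape(m) for m in calendar.month_abbr[1:] if m]
    return r"\d{2} (?:%s) \d{4}" % "|".join(months)

date_expr = build_date_expr()
print(date_expr)
print(re.findall(date_expr, "Seen on 05 May 2010 and 31 Dec 1999."))
```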

-- Paul

1 Comment

This is why I prefer to use the RE to validate the basic format (day month-abbrev year) and then let strptime take care of the localization of the month. If you are really interested, you can use some of the locale-aware options to account for differences in M-D-Y ordering as well.

5

Here's a slightly more complete example. The regexp will match more than just valid date values, so each match is passed to datetime.strptime, which raises a ValueError for anything that is not a valid date. If the date is parsed, then you have a full datetime object that gives you access to a lot of functionality.

>>> from datetime import datetime
>>> import re
>>> dates = []
>>> patn = re.compile(r'\d{2} \w{3} \d{4}')
>>> fh = open('inputfile')
>>> for line in fh:
...   for match in patn.findall(line):
...     try:
...       val = datetime.strptime(match, '%d %b %Y')
...       dates.append(val)
...     except ValueError:
...       pass # ignore, this isn't a date
...

I imagine that this can be collapsed into nice tight code with comprehensions if you are so inclined.
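
One possible way to tighten it, as a sketch: since try/except cannot appear inside a comprehension, the parsing moves into a small helper and the comprehension filters out the failures. The names parse_date and find_dates are hypothetical, and the sample lines stand in for the lines of the input file.

```python
import re
from datetime import datetime

PATN = re.compile(r'\d{2} \w{3} \d{4}')

def parse_date(s):
    """Return a datetime for a 'dd mmm yyyy' string, or None if it is not a valid date."""
    try:
        return datetime.strptime(s, '%d %b %Y')
    except ValueError:
        return None

def find_dates(lines):
    """Collect every valid date found in an iterable of lines."""
    candidates = (parse_date(m) for line in lines for m in PATN.findall(line))
    return [d for d in candidates if d is not None]

# Hypothetical sample input; a file object opened for reading works the same way.
sample = ["start 05 May 2010 end", "bogus 99 Foo 2010", "also 31 Feb 2010 and 01 Jan 2000"]
dates = find_dates(sample)
print(dates)
```

Passing an open file object to find_dates works because iterating over a file yields its lines.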

1 Comment

Appreciated! How can I collect the 'val's into an array in Python?

0

Try this:

import re

allmatches = re.findall(r'\d\d \w\w\w \d\d\d\d', "string to match")
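
Bear in mind that \w matches any letter, digit, or underscore, so this pattern also accepts strings that are not dates. A quick illustration with a made-up input string:

```python
import re

# Hypothetical input containing one non-date that still fits the pattern.
sample = "ref 12 abc 3456, dated 05 May 2010"

allmatches = re.findall(r'\d\d \w\w\w \d\d\d\d', sample)
print(allmatches)
```

If that matters, the matches can be validated afterwards, or the month part can be restricted to the known abbreviations.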

2 Comments

Seriously? -1? Any reason other than '\w\w\w' probably isn't that great of a way to match a month? It IS what the guy asked for in his 'dd mmm yyyy' syntax. While it's not ideal, I don't understand the downvote.
Hi, I know this is very late, but wouldn't \w\w\w match any three alphanumeric characters, which could be completely arbitrary? Correct me if I am wrong.

0

Or, for a more complete match that also captures the time, you can use this:

import re

# [a-z]* lets the abbreviation extend to the full month name (e.g. November)
date = re.findall(r'\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d{4}\s\d{2}:\d{2}', text)
print(date)
['30 November 2010 14:20', '30 November 2010 14:24']
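
To turn matches like these into datetime objects, strptime can parse the time as well. A small sketch, assuming full month names as in the output shown (the %B directive); %b would be the directive for three-letter abbreviations. The names matches and stamps are illustrative only.

```python
from datetime import datetime

# The two strings from the output above, used as sample input.
matches = ['30 November 2010 14:20', '30 November 2010 14:24']

# %d day, %B full month name, %Y four-digit year, %H:%M 24-hour time.
stamps = [datetime.strptime(m, '%d %B %Y %H:%M') for m in matches]
print(stamps)
```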
