The date in my data is stored in two different formats:

Dienstag 31. Dezember 2013 and 30. Juni 2007

I wrote scripts to extract Year/Month/Day from both formats and store them in a list:

for row in reader:
    line_count = line_count + 1
    if row[1] == "DATE":
        continue  # skip the header row
    date = row[1]  # e.g. "Dienstag 31. Dezember 2013"
    year = date.split('.')[1].split(" ")[2]
    day = date.split(" ")[1]  # index 0 is the weekday, so the day is index 1
    day = day.replace('.', '')
    month = date.split('.')[1].split(' ')[1]

for the first format

and

date = row[1]  # e.g. "30. Juni 2007"
year = date.split('.')[1].split(" ")[2]
day = date.split(" ")[0]  # the day number comes first, e.g. "30."
day = day.replace('.', '')
month = date.split('.')[1].split(' ')[1]

for the second format

However, the two formats occur randomly throughout the dataset (row[1]). Is there a way to have Python detect which format it has encountered and apply the matching script (e.g. with an if statement)? Thanks.

  • Is the word Dienstag (or any other day in your language) always expected before the date? Commented Jun 11, 2015 at 11:48
  • Any other weekday in German (Montag, Dienstag, Mittwoch, Donnerstag, Freitag, Samstag, Sonntag). Commented Jun 11, 2015 at 11:50
  • Done. Give me 2 minutes to test a couple of cases. Commented Jun 11, 2015 at 11:51
  • Well, @Zlo, do you need these German weekday names or not? Commented Jun 11, 2015 at 11:52
  • Does your second pattern always start with a number? Commented Jun 11, 2015 at 11:52

4 Answers

If and only if the second pattern always starts with a number:

if date[0].isdigit():
    pass  # method for pattern 2, e.g. "30. Juni 2007"
else:
    pass  # method for pattern 1, e.g. "Dienstag 31. Dezember 2013"
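A minimal runnable sketch of this dispatch, assuming the date string is non-empty and laid out exactly like the two samples in the question. The function names parse_with_weekday, parse_without_weekday and parse_date are hypothetical, and the parsing uses str.split() on whitespace rather than the question's split('.') chain:

```python
def parse_with_weekday(date):
    # Pattern 1: "Dienstag 31. Dezember 2013" -> ("31", "Dezember", "2013")
    _weekday, day, month, year = date.split()
    return day.rstrip("."), month, year


def parse_without_weekday(date):
    # Pattern 2: "30. Juni 2007" -> ("30", "Juni", "2007")
    day, month, year = date.split()
    return day.rstrip("."), month, year


def parse_date(date):
    # Pattern 2 starts with the day number; pattern 1 starts with the weekday
    if date[0].isdigit():
        return parse_without_weekday(date)
    return parse_with_weekday(date)


dates = []  # collected (day, month, year) tuples
for raw in ["Dienstag 31. Dezember 2013", "30. Juni 2007"]:
    dates.append(parse_date(raw))

print(dates)  # [('31', 'Dezember', '2013'), ('30', 'Juni', '2007')]
```

Each branch returns the same (day, month, year) shape, so the caller does not need to know which format it saw.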

1 Comment

@alexisdevarennes I don't think so; our answers were posted only 5 seconds apart.

I don't know whether you are constrained to a particular approach, but regular expressions are better suited to this kind of problem. The best part is that they are robust yet flexible: you can easily extend the pattern if you expect more formats (for example, American style like January 31, 2004). Five lines of code instead of the original 15 ;)

Here's the code:

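import re

# Optional German weekday, then the day, a dot, the month name and the year
reg_date = r"(Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonntag)?\s*(\d{1,2})\.\s+(\w{3,12})\s+(\d{2,4})"

def extract_date(string):
    results = re.search(reg_date, string)
    if results:
        date = results.groups()
        return date[1], date[2], date[3]  # (day, month, year)
    return None  # no date found in the string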

And to use this, simply write a line like:

day, month, year = extract_date("Dienstag 31. Dezember 2013 and ")
print(day, month, year)

or try it with the second format:

day, month, year = extract_date("31. May 2013 ")
print(day, month, year)

Simple, Elegant, Reusable.
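The same idea can be written with named groups, which avoids counting tuple indices and makes the "no match" case explicit. This is a sketch assuming Python 3, where \w in a str pattern also matches umlauts such as the ä in März; DATE_RE is a hypothetical name, and the weekday is made optional with a non-capturing group:

```python
import re

# Optional weekday followed by whitespace, then "DD. Monat YYYY"
DATE_RE = re.compile(
    r"(?:(?P<weekday>Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonntag)\s+)?"
    r"(?P<day>\d{1,2})\.\s+(?P<month>\w{3,12})\s+(?P<year>\d{2,4})"
)


def extract_date(text):
    match = DATE_RE.search(text)
    if match is None:
        return None  # neither format found
    # group() with several names returns a tuple in that order
    return match.group("day", "month", "year")


print(extract_date("Dienstag 31. Dezember 2013 and "))  # ('31', 'Dezember', '2013')
print(extract_date("no date here"))  # None
```

Callers should check for None before unpacking into day, month, year.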


You can check whether the first character in the string is alphabetic:

if date[0].isalpha():
    pass  # call your function for dates with a leading weekday here
else:
    pass  # call the other function
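Because the first format is just the second one with a weekday in front, the same isalpha() check can also be used to drop that first word so that a single parser handles both. A sketch, assuming the weekday is separated from the day by a space; normalize and parse are hypothetical helper names:

```python
def normalize(date):
    date = date.strip()
    # A leading letter means the first format: drop the weekday word
    if date[0].isalpha():
        date = date.split(" ", 1)[1]
    return date  # now always "DD. Monat YYYY"


def parse(date):
    day, month, year = normalize(date).split()
    return day.rstrip("."), month, year


print(parse("Dienstag 31. Dezember 2013"))  # ('31', 'Dezember', '2013')
print(parse("30. Juni 2007"))  # ('30', 'Juni', '2007')
```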


Another approach with regex, just to give you more options:

import re

if re.search(r'^[a-zA-Z]', date):
    pass  # method for the first format (starts with the weekday)
else:
    pass  # method for the second format (starts with the day number)
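To connect this back to the loop in the question, here is a sketch that reads rows with csv.reader, skips the DATE header and collects (day, month, year) tuples in a list. The sample CSV (an ID column plus a DATE column) is made up for illustration; only the column index 1 and the "DATE" header cell come from the question:

```python
import csv
import io
import re

# Hypothetical sample data: dates in column 1, header cell "DATE"
SAMPLE = "ID,DATE\n1,Dienstag 31. Dezember 2013\n2,30. Juni 2007\n"

dates = []
for row in csv.reader(io.StringIO(SAMPLE)):
    date = row[1]
    if date == "DATE":
        continue  # skip the header row
    if re.search(r"^[a-zA-Z]", date):
        # First format: "Dienstag 31. Dezember 2013"
        _weekday, day, month, year = date.split()
    else:
        # Second format: "30. Juni 2007"
        day, month, year = date.split()
    dates.append((day.rstrip("."), month, year))

print(dates)  # [('31', 'Dezember', '2013'), ('30', 'Juni', '2007')]
```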
