21

I have a string with variable length and I want to give a format to strptime in order for the rest of the string to be ignored. Let me exemplify. I have something like

9/4/2013,00:00:00,7.8,7.4,9.53
10/4/2013,00:00:00,8.64,7.4,9.53

and I want a format that makes the command strptime(line,format) work to read those lines. Something like format='%d/%m/%Y,%H:%M:%S*', although I know that doesn't work. I guess my question is kind of similar to this one, but no answer there could help me and my problem is a little worse because the full length of my string can vary. I have a feeling that dateutil could solve my problem, but I can't find something there that does the trick.

I can probably do something like strptime(','.join(line.split(',')[:2]),format), but I wouldn't want to resort to that for user-related issues.

1
  • 5
    This boils down to an enhancement request on strptime to allow arbitrary regexes, at least in the trailing part of the string: format='%d/%m/%Y,%H:%M:%S.*'. This is a common request and well worth considering. In fact, people have been asking for it for 13+ years. Commented Nov 16, 2017 at 19:46

4 Answers

25

You cannot have datetime.strptime() ignore part of the input; your only real option is to split off the extra text first.

So yes, you do have to split and rejoin your string:

from datetime import datetime
format = '%d/%m/%Y,%H:%M:%S'
# Keep only the first two comma-separated fields (date and time)
datetime.strptime(','.join(line.split(',', 2)[:2]), format)

or find some other means to extract the information. You could use a regular expression, for example:

import re
from datetime import datetime
datetime_pattern = re.compile(r'(\d{1,2}/\d{1,2}/\d{4},\d{2}:\d{2}:\d{2})')
format = '%d/%m/%Y,%H:%M:%S'
datetime.strptime(datetime_pattern.search(line).group(), format)
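Since the sample lines look like CSV, one way to apply the split-first idea is to let the csv module do the splitting and pass only the first two fields to strptime. This is a minimal sketch, assuming every row starts with a date field and a time field and that the remaining fields are numbers; parse_rows and FMT are illustrative names, not from the question.

```python
import csv
from datetime import datetime
from io import StringIO

FMT = '%d/%m/%Y,%H:%M:%S'  # date and time fields, joined by a comma

def parse_rows(lines):
    """Yield (timestamp, values) per CSV row (hypothetical helper)."""
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or too-short rows
        # Rejoin only the date and time fields for strptime
        stamp = datetime.strptime(','.join(row[:2]), FMT)
        # Assumption: everything after the timestamp is numeric
        yield stamp, [float(v) for v in row[2:]]

sample = StringIO('9/4/2013,00:00:00,7.8,7.4,9.53\n'
                  '10/4/2013,00:00:00,8.64,7.4,9.53\n')
rows = list(parse_rows(sample))
```

The number of trailing fields can vary from row to row, because only the first two fields ever reach strptime.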

7 Comments

Yes, one way or another, you have to modify each input line, or modify the format for each input line. Too bad.
This is pretty terrible (and is what I am doing) because you have to define two equivalent but differently specified date patterns.
@JeffreyBlattman: Why do you need to define different date patterns? Extract just the date portion and pass that to datetime.strptime().
In your answer, see datetime_pattern and format. That's two different patterns.
@JeffreyBlattman: The regex depends on the source line and is an example specific to this case. Given that the OP's data looks like CSV, splitting on commas looks more applicable here.
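One possible workaround for the two-patterns concern raised in these comments, offered only as a sketch: when strptime matches the start of the string but finds trailing text, CPython raises ValueError with a message that begins "unconverted data remains: ". The helper below (parse_prefix is a made-up name) reads the leftover text from that message and retries on the prefix, so only one format string is needed. The message text is an implementation detail rather than a documented interface, so this should be treated as brittle.

```python
from datetime import datetime

_MARKER = 'unconverted data remains: '

def parse_prefix(text, fmt):
    """Parse the leading part of text that matches fmt (hypothetical helper)."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as exc:
        msg = str(exc)
        if not msg.startswith(_MARKER):
            raise  # a genuine mismatch, not just trailing text
        leftover = msg[len(_MARKER):]
        # Retry on the text with the unconverted tail removed
        return datetime.strptime(text[:len(text) - len(leftover)], fmt)

first = parse_prefix('9/4/2013,00:00:00,7.8,7.4,9.53', '%d/%m/%Y,%H:%M:%S')
second = parse_prefix('10/4/2013,00:00:00,8.64,7.4,9.53', '%d/%m/%Y,%H:%M:%S')
```

This parses each line twice, so splitting first is likely the cheaper and more robust choice when the delimiter is known.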
2

To build a format string without splitting the input and discarding the extra text, append the extra text to the format string itself. Here t[t.index(',', t.index(',') + 1):] is the extra text: everything from the second comma onward.

from datetime import datetime
l = ['9/4/2013,00:00:00,7.8,7.4,9.53', '10/4/2013,00:00:00,8.64,7.4,9.53']
for t in l:
    print(datetime.strptime(t, '%d/%m/%Y,%H:%M:%S' + t[t.index(',', t.index(',') + 1):]))

If the string contains '%' characters, remove them first so that they are not read as format directives:

l = ['9/4/2013,00:00:00,7.8,7.4,9.53', '10/4/2013,00:00:00,8.64,7.4,9.53']
for t in l:
    t = t.replace('%','')
    fmt = '%d/%m/%Y,%H:%M:%S' + t[t.index(',',t.index(',')+1):]
    print(datetime.strptime(t, fmt))

Or, with string slicing and a static format string:

for t in l:
    print(datetime.strptime(t[:t.find(',', t.find(',') + 1)], '%d/%m/%Y,%H:%M:%S'))

2013-04-09 00:00:00
2013-04-10 00:00:00
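If the approach of appending the remainder to the format is kept, an alternative to deleting '%' characters is to escape them as '%%', which strptime reads as a literal percent sign, so the input string itself stays untouched. This is a minimal sketch; parse_with_tail and BASE_FMT are hypothetical names, and it assumes the line contains at least two commas.

```python
from datetime import datetime

BASE_FMT = '%d/%m/%Y,%H:%M:%S'

def parse_with_tail(t):
    """Parse t by appending its escaped remainder to the format (hypothetical helper)."""
    # Position of the second comma; raises ValueError if there are fewer than two
    second_comma = t.index(',', t.index(',') + 1)
    # Escape '%' so the remainder is matched literally by strptime
    tail = t[second_comma:].replace('%', '%%')
    return datetime.strptime(t, BASE_FMT + tail)

parsed = parse_with_tail('9/4/2013,00:00:00,50%,7.4')
```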

5 Comments

So what happens if the extra string contains % characters? Note that you are essentially doing the split in reverse; you are splitting off the remainder and adding it to the format string.
Chances of occurrence of % in the date and time field hold true. I answered the OP.
Right, but so did I, yet it doesn't have the problems your "solution" has. Sometimes the answer really is you cannot do that, but here is how you solve the problem.
What problems? The OP needed a format string. He already knew about splitting the original string.
@NizamMohamed: it will be less performant (and also more error-prone and brittle) than simply doing the split into <timestamp><trailing_part>. That might even be just simple subscripting, for fixed-format lines and when zero-padded %d is used. No I didn't downvote. Yes I believe people who downvote should explain their thinking, it usually generates constructive improvement, or at least highlights misunderstandings.
2

Have a look at datetime-glob, a module we developed to parse date/times from a list of files. You can use datetime_glob.PatternSegment to parse arbitrary strings:

>>> import datetime_glob
>>> patseg = datetime_glob.parse_pattern_segment('%-d/%-m/%Y,%H:%M:%S*')
>>> match = datetime_glob.match_segment('9/4/2013,01:02:03,7.8,7.4,9.53',
                                        patseg)
>>> match.as_datetime()
datetime.datetime(2013, 4, 9, 1, 2, 3)


0

This also uses a regexp, because Python's datetime cannot ignore characters. This version uses a non-capturing group (sorry, the example is not related to your question):

import datetime, re

date_re = re.compile(r'([^.]+)(?:\.[0-9]+) (\+[0-9]+)')
date_str = "2018-09-06 04:15:18.334232115 +0000"

date_str = " ".join(date_re.search(date_str).groups())

date_obj = datetime.datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S %z")

It's much better to use regexp like @marjin suggests, so your code is more comprehensible and easy to update.
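A variation on the same idea, as a sketch only: rather than dropping the fractional seconds, a substitution can trim them to the six digits that %f accepts, so the microseconds survive. The helper name parse_nanos is made up for illustration, and the timestamp layout is assumed to match the example above.

```python
import re
from datetime import datetime

def parse_nanos(date_str):
    """Parse a timestamp whose fraction may exceed six digits (hypothetical helper)."""
    # Keep the first six fractional digits and discard any beyond them
    trimmed = re.sub(r'(\.\d{6})\d+', r'\1', date_str)
    return datetime.strptime(trimmed, '%Y-%m-%d %H:%M:%S.%f %z')

date_obj = parse_nanos("2018-09-06 04:15:18.334232115 +0000")
```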

