I have a string as follows: 2020-01-01T16:30.00 - 1.00. I want to select the part between T and -, i.e. pick 16:30.00 out of the whole string, and then convert it to a float. Any help is appreciated.
1 Answer
If you have a pandas Series s like this
import pandas as pd
s = pd.Series(["2020-01-01T16:30.00 - 1.00", "2020-12-04T00:25.00 - 14.00"])
you can use
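# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
s.str.replace(".+T", "", regex=True).str.replace(" -.+", "", regex=True)
# 0    16:30.00
# 1    00:25.00
# dtype: object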
Basically, you first replace everything up to and including the T with an empty string. Then you replace the part that starts with " -" (a space followed by a hyphen) and runs to the end of the string with an empty string as well.
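If you only have a single string rather than a Series, the same two substitutions can be sketched with the standard re module. The function name time_part_by_substitution is just an illustrative name, not something from pandas or from the question.

```python
import re

def time_part_by_substitution(text: str) -> str:
    # Step 1: drop everything up to and including the "T".
    without_date = re.sub(r".+T", "", text)
    # Step 2: drop the tail that starts with " -" (space + hyphen).
    return re.sub(r" -.+", "", without_date)

print(time_part_by_substitution("2020-01-01T16:30.00 - 1.00"))  # 16:30.00
```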
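Another option is to use capturing groups in a regular expression to match the three parts of the string and keep only one of them (in this case the second group, .+). Note that re.match returns None when the pattern does not match, so this assumes every value has the same layout.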
import re
s.apply(lambda x: re.match("(.+T)(.+)( -.+)", x).group(2))
# 0 16:30.00
# 1 00:25.00
# dtype: object
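Both snippets return strings, and 16:30.00 cannot be passed to float() directly because of the colon. Here is a rough sketch of the conversion step, assuming the value means hours:minutes.seconds and that the float you want is fractional hours. Both of those are my assumptions, since the question does not say, and time_to_hours is a hypothetical helper name.

```python
import re

def time_to_hours(text: str) -> float:
    """Extract the time between 'T' and ' -' and return it as fractional hours."""
    # Assumed layout: <date>T<HH>:<MM>.<SS> - <rest>
    match = re.search(r"T(\d+):(\d+)\.(\d+) -", text)
    if match is None:
        raise ValueError(f"no time found in {text!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours + minutes / 60 + seconds / 3600

print(time_to_hours("2020-01-01T16:30.00 - 1.00"))  # 16.5
```

With the Series from above, s.apply(time_to_hours) would apply it to every element. If you need minutes or seconds instead, only the return line changes.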
4 Comments
S_Scouse
Thank you. Using the datetime library is another way to do it; I found it in one of the Stack Overflow answers.
Ric S
Seen, very useful link!
Ric S
@S_Scouse Just added another solution if you want to check it out
S_Scouse
Thank you, very useful. I might use it for some other string selection need.
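For reference, the datetime route mentioned in the comments could look roughly like this. The format string is a guess based on the single example in the question (date, T, then hours:minutes.seconds), and parse_time is a hypothetical name, so adjust both if your data differs.

```python
from datetime import datetime, time

def parse_time(text: str) -> time:
    # Keep only the timestamp that precedes " - ".
    stamp = text.split(" - ", 1)[0]
    # Assumed format: YYYY-MM-DD, "T", then HH:MM.SS
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M.%S").time()

print(parse_time("2020-01-01T16:30.00 - 1.00"))  # 16:30:00
```

Unlike the regex versions, this raises a ValueError on values that are not valid dates or times, which may or may not be what you want.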