I have the following dataframe:

                  Data
0           12/25/2020
1           10/25/2020
2  2020-09-12 00:00:00
3  2020-12-09 00:00:00

I'm using the following Python code to try to extract the first two-digit number that could represent a month:

df['Data'].apply(lambda x: re.match('.*([1-2][0-9]{3})', x).group(1))

However, it returns a dataframe full of NaN. When I test it in regex101, it works (link: https://regex101.com/r/QpacQ0/1). So, I have two questions:

  • Is there a better way to work with dates from user input? I'm building a script that recognizes the date parts by position and then converts them to a datetime object.
  • Second, why can't this code recognize the months?

1 Answer

You need to use

df['Month'] = df['Data'].str.extract(r'\b(0[1-9]|1[0-2])\b')

When using re.match('.*([1-2][0-9]{3})', x), you match zero or more characters other than line breaks, as many as possible, from the start of the string (re.match only looks for a match at the start of the string), and then capture a 1 or 2 followed by any three digits. So you actually capture the last occurrence of a four-digit sequence starting with 1 or 2 (in this data, the year), not a month-like number.
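
To see this concretely, here is a small check that uses only the standard re module on the sample values from the question. The samples list is just an illustrative stand-in for the Data column, not part of the original code:

```python
import re

# Sample values copied from the question's dataframe (illustrative stand-in for df['Data']).
samples = [
    "12/25/2020",
    "10/25/2020",
    "2020-09-12 00:00:00",
    "2020-12-09 00:00:00",
]

pattern = re.compile(r".*([1-2][0-9]{3})")

# The greedy .* first consumes the whole string, then backtracks until the
# group can match, so the group lands on the last four-digit run that
# starts with 1 or 2.
captured = [pattern.match(s).group(1) for s in samples]
print(captured)  # ['2020', '2020', '2020', '2020']
```

Every row yields the year, which is why no month ever comes out of that pattern.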

With .str.extract(r'\b(0[1-9]|1[0-2])\b'), you extract the first occurrence of 0 followed by a non-zero digit, or 1 followed by 0, 1 or 2, as a whole word, due to the \b word boundaries.
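
The effect of the word boundaries can be checked with plain re, outside pandas. This is only a sketch on one sample value from the question; the variable names are made up for illustration:

```python
import re

# Same pattern as in the answer, with and without the \b word boundaries.
month_re = re.compile(r"\b(0[1-9]|1[0-2])\b")
no_boundary_re = re.compile(r"(0[1-9]|1[0-2])")

value = "2020-09-12 00:00:00"

with_boundary = month_re.search(value).group(1)
without_boundary = no_boundary_re.search(value).group(1)

print(with_boundary)     # 09
print(without_boundary)  # 02 (picked up from inside "2020")
```

Keep in mind that the pattern only finds the first month-like token. In a day-first string such as 05/12/2020 it would return 05, so it cannot tell by itself which part is really the month.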

Here is the regex demo.

If the Data is not a string column, convert it into one:

df['Month'] = df['Data'].astype(str).str.extract(r'\b(0[1-9]|1[0-2])\b')
#                       ^^^^^^^^^^^^

3 Comments

Thanks a lot for your answer, man! Do you know if there is a way to get the position?
I tried df['Data'].apply(lambda x: re.match('\b(0[1-9]|1[0-2])\b', x)). However, it keeps returning None. Do you know why?
@bellotto I suggest something like df['Data'].apply(lambda x: re.search(r'\b(0[1-9]|1[0-2])\b', x).start() if re.search(r'\b(0[1-9]|1[0-2])\b', x) else -1), or use a function where you would run re.search only once.
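
For reference, the attempt with re.match returns None for two reasons: without the r prefix, '\b' in a Python string literal is a backspace character rather than a word boundary, and re.match only tries to match at the start of the string. A minimal sketch of the "run re.search only once" idea could look like this, where month_position is a hypothetical helper name and the samples list stands in for the column:

```python
import re

# Compile once; the raw string keeps \b as a regex word boundary.
MONTH_RE = re.compile(r"\b(0[1-9]|1[0-2])\b")


def month_position(text):
    """Return the start index of the first month-like token, or -1 if none."""
    match = MONTH_RE.search(str(text))  # str() guards against non-string cells
    return match.start() if match else -1


# Sample values copied from the question (illustrative stand-in for df['Data']).
samples = [
    "12/25/2020",
    "10/25/2020",
    "2020-09-12 00:00:00",
    "2020-12-09 00:00:00",
]

positions = [month_position(s) for s in samples]
print(positions)  # [0, 0, 5, 5]
```

With a dataframe, the same helper should work as df['Data'].apply(month_position).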
