1

I have a column in my pandas DataFrame called last_pymnt that holds dates in the format 17-Mar, 13-Dec, etc. A plain string replace would be too tedious because there are so many unique dates, so I tried to build a dictionary that replaces each month name with an integer. However, it does not seem to work. This is what I have:

integers = {'-Jan': 1, '-Feb': 2, '-Mar': 3, '-Apr': 4, '-May': 5, '-Jun': 6, '-Jul': 7, '-Aug': 8, 
'-Sep': 9, '-Oct': 10, '-Nov': 11, '-Dec': 12,}

data.replace({'-Jan': integers, '-Feb': integers, '-Mar': integers, '-Apr': integers, '-May': 
integers, '-Jun': integers, '-Jul': integers, '-Aug': integers, '-Sep': integers, '-Oct': integers, 
'-Nov': integers, '-Dec': integers})

The code was supposed to go through the entire DataFrame and replace the partial matches with an integer, so that a date of 17-Mar becomes 173. Instead, I still get 17-Mar.
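A likely reason the call has no effect: without regex=True, replace() compares whole cell values, so '-Mar' never equals '17-Mar', and in the nested form {'-Jan': integers, ...} the outer keys are read as column names. The sketch below is one possible fix under those assumptions. The sample frame, the months list, and the names month_numbers, unchanged, and replaced are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the same format as the question.
df = pd.DataFrame({'last_pymnt': ['17-Mar', '13-Dec', '16-Aug']})

# Without regex=True, replace() only matches entire cell values,
# so a partial key such as '-Mar' leaves '17-Mar' untouched.
unchanged = df['last_pymnt'].replace({'-Mar': '3'})

# Build '-Jan' -> '1', ..., '-Dec' -> '12' with string replacement values.
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
month_numbers = {f'-{name}': str(i) for i, name in enumerate(months, start=1)}

# With regex=True each key is searched for inside the string,
# and only the matching part is substituted.
replaced = df['last_pymnt'].replace(month_numbers, regex=True)
print(replaced.tolist())
```

Note that the unpadded result is ambiguous: both 1-Nov and 11-Jan would become 111.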

2 Answers

1

IIUC, I would parse these values as dates with pandas' datetime tools and avoid handling them any other way.

For instance:

Data

df=pd.DataFrame({'last_pymnt':['17-Mar', '12-Dec']})
df

I would go:

df['last_pymnt'] = pd.to_datetime(df['last_pymnt'], format='%d-%b').dt.strftime('%m-%d')
df
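If the goal is the exact unpadded form from the question (173 for 17-Mar), the parsed dates can also be reassembled from their day and month parts. This is only a sketch: the names parsed and last_pymnt_num are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'last_pymnt': ['17-Mar', '12-Dec']})

# Parse day and abbreviated month name; the year is not in the data.
parsed = pd.to_datetime(df['last_pymnt'], format='%d-%b')

# Concatenate day and month as unpadded strings, e.g. 17 and 3 -> '173'.
df['last_pymnt_num'] = parsed.dt.day.astype(str) + parsed.dt.month.astype(str)
print(df)
```

Using the padded strftime('%d%m') form instead would give 1703, which is longer but unambiguous.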

If that is not what you want, try:

df=pd.DataFrame({'last_pymnt':['17-Mar', '12-Dec']})
df.last_pymnt=df.last_pymnt.str.replace('-','')
df['last_pymnt'] = pd.to_datetime(df['last_pymnt'], format='%d%b').dt.strftime('%d%m')


0

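You can do this with regular expressions.
The parentheses around \d+ make it a capture group, which you then reference with \1 in the substitution string. The space after \1 keeps the month number from being read as part of the group reference (\13 would mean group 13); it is stripped again in the last step.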

import re

import pandas as pd

df = pd.DataFrame({'last_pymnt':['17-Mar','13-Dec']})
# One pattern per month; \1 is the captured day number.
repl_dict = {re.compile(r'^(\d+)[-]Jan$'):r'\1 1', 
             re.compile(r'^(\d+)[-]Feb$'):r'\1 2', 
             re.compile(r'^(\d+)[-]Mar$'):r'\1 3', 
             re.compile(r'^(\d+)[-]Apr$'):r'\1 4', 
             re.compile(r'^(\d+)[-]May$'):r'\1 5', 
             re.compile(r'^(\d+)[-]Jun$'):r'\1 6', 
             re.compile(r'^(\d+)[-]Jul$'):r'\1 7', 
             re.compile(r'^(\d+)[-]Aug$'):r'\1 8', 
             re.compile(r'^(\d+)[-]Sep$'):r'\1 9', 
             re.compile(r'^(\d+)[-]Oct$'):r'\1 10', 
             re.compile(r'^(\d+)[-]Nov$'):r'\1 11', 
             re.compile(r'^(\d+)[-]Dec$'):r'\1 12',}  
# regex=True is needed on str.replace too, otherwise '\s+' is treated as a literal string.
df['last_pymnt_repl'] = df['last_pymnt'].replace(repl_dict,regex=True).str.replace(r'\s+','',regex=True)

Result:

In [149]: df                                                                                        
Out[149]: 
  last_pymnt last_pymnt_repl
0     17-Mar             173
1     13-Dec            1312
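The twelve patterns can also be generated rather than typed out. The sketch below uses a single pattern and a replacement function with Series.str.replace. It assumes the default (English) month abbreviations from calendar.month_abbr; the names month_to_num, pattern, and day_month_digits are made up for illustration.

```python
import calendar
import re

import pandas as pd

df = pd.DataFrame({'last_pymnt': ['17-Mar', '13-Dec', '16-Aug', '18-May']})

# Map 'Jan' -> '1', ..., 'Dec' -> '12'.
# Index 0 of calendar.month_abbr is an empty string, so start at 1.
# Assumption: the locale yields English abbreviations.
month_to_num = {calendar.month_abbr[i]: str(i) for i in range(1, 13)}

# One pattern for all months: group 1 is the day, group 2 the month name.
pattern = re.compile(r'^(\d+)-(' + '|'.join(month_to_num) + r')$')

def day_month_digits(match):
    """Return the day followed by the month number, e.g. '17-Mar' -> '173'."""
    return match.group(1) + month_to_num[match.group(2)]

df['last_pymnt_repl'] = df['last_pymnt'].str.replace(pattern, day_month_digits, regex=True)
print(df)
```

Because the function builds the replacement string itself, the space workaround for \1 is not needed here. Values that do not match the pattern are left as they are.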

1 Comment

Thank you, but this code only worked for 17-Mar; for the other dates I get NaN. For example, I also have dates such as 16-Aug, 16-Jun, 17-Oct, 18-May, etc., but instead of replacing 16-Aug with 168, it now shows NaN for every value except 17-Mar and 13-Dec.
