2

I am using pandas version 0.16.2. I want to extract Year and Month of the date columns.

I read the data

df = pd.read_csv(raw_data.csv,
        parse_dates=['EOM_DEFAULT_DATE','RESOLUTION_DATE'], low_memory=False)

'EOM_DEFAULT_DATE' looks like:

    0    31-JAN-07 12.00.00.000000000 AM
    1    31-JAN-07 12.00.00.000000000 AM
    Name: EOM_DEFAULT_DATE, dtype: object

'RESOLUTION DATE' looks like:

    0   2008-03-31
    1   2008-03-31
    Name: RESOLUTION_DATE, dtype: datetime64[ns]

Specifically, I want to extract Year this way, but get this error:

      df['YEAR']=pd.DatetimeIndex(df['RESOLUTION_DATE']).year

      --- 
      A value is trying to be set on a copy of a slice from a DataFrame.

      Try using .loc[row_indexer,col_indexer] = value instead

Also, I get an error when trying to extract month:

      df['MNTH']=pd.DatetimeIndex(df['EOM_DEFAULT_DATE']).month

      ---
      File "<ipython-input-61-d7aec9a17a8f>", line 1, in <module>

      File "C:\Continuum\Anaconda\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
return func(*args, **kwargs)

      File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 292, in __new__
yearfirst=yearfirst)

      File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 1936, in _str_to_dt_array
data = _algos.arrmap_object(arr, parser)

      File "pandas\src\generated.pyx", line 2295, in pandas.algos.arrmap_object (pandas\algos.c:77984)

      File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 1932, in parser
yearfirst=yearfirst)

      File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\tools.py", line 494, in parse_time_string
raise DateParseError(e)

      DateParseError: unknown string format

Using this exact code, I know others can run the code fine, and extract year and month. What am I missing?

1 Answer 1

2

You can use a .dt accessor to get the year and month on a pd.Series whose values are datetime64.

df['YEAR'] = df['RESOLUTION_DATE'].dt.year 

To parse the date, you need to supply your datetime format.

dt_str = '31-JAN-07 12.00.00.000000000 AM'

fmt = '%d-%b-%y %H.%M.%S.%f %p'
pd.to_datetime(dt_str, format=fmt)

#output: Timestamp('2007-01-31 12:00:00')

Maybe try not parsing the date when reading csv because you have two date columns and they have different formats. Just read-in the raw string. and then convert string to datetime object in pandas.

df['EOM_DEFAULT_DATE'] = pd.to_datetime(df['EOM_DEFAULT_DATE'], format='%d-%b-%y %H.%M.%S.%f %p')
df['RESOLUTION_DATE'] = pd.to_datetime(df['RESOLUTION_DATE'])
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.