I have the dataframe below, and I want to replace each string in Maturity with just the number it contains. For example, FZCY0D should become 0, and so on.

            Date   Maturity  Yield_pct Currency
0     2009-01-02     FZCY0D       4.25      AUS
1     2009-01-05     FZCY0D       4.25      AUS
2     2009-01-06     FZCY0D       4.25      AUS

My code is below. I tried replacing these strings with the numbers, but that led to the error AttributeError: 'Series' object has no attribute 'split' on the line result.Maturity.replace(result['Maturity'], [int(s) for s in result['Maturity'].split() if s.isdigit()]). I am struggling to understand how to do this.

from pandas.io.excel import read_excel
import pandas as pd
import numpy as np
import xlrd

url = 'http://www.rba.gov.au/statistics/tables/xls/f17hist.xls'
xls = pd.ExcelFile(url)

# Gets rid of the information that I don't need in my dataframe
df = xls.parse('Yields', skiprows=10, index_col=None, na_values=['NA'])


df.rename(columns={'Series ID': 'Date'}, inplace=True)

# This line assumes you want datetime, ignore if you don't
#combined_data['Date'] = pd.to_datetime(combined_data['Date'])

result = pd.melt(df, id_vars=['Date'])

result['Currency'] = 'AUS'
result.rename(columns={'value': 'Yield_pct'}, inplace=True)
result.rename(columns={'variable': 'Maturity'}, inplace=True)

result.Maturity.replace(result['Maturity'], [int(s) for s in result['Maturity'].split() if s.isdigit()])


print result
  • The split() method is for an individual string; it returns a list of strings broken by white space. Commented Jun 19, 2015 at 18:08
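
To illustrate the point in this comment, here is a minimal sketch using a small hand-made Series (the values are copied from the question's sample rows, not loaded from the spreadsheet). A Series has no split attribute of its own, whereas each element is a plain string that does, and the .str accessor applies string methods element by element:

```python
import pandas as pd

# Two labels copied from the question's sample output.
s = pd.Series(["FZCY0D", "FZCY0D"])

# A Series itself has no split attribute, hence the AttributeError.
series_has_split = hasattr(s, "split")

# An individual element is a plain Python string, so split works on it.
first_label_parts = s.iloc[0].split("Y")

# The .str accessor applies the same string method to every element.
elementwise_parts = s.str.split("Y")
```

Splitting on "Y" is only a demonstration of where split lives; it is not a good way to pull the number out of the label.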

1 Answer

You can use the vectorised str methods and pass a regex to extract the number:

In [15]:

df['Maturity'] = df['Maturity'].str.extract(r'(\d+)', expand=False)
df
Out[15]:
         Date Maturity  Yield_pct Currency
0  2009-01-02        0       4.25      AUS
1  2009-01-05        0       4.25      AUS
2  2009-01-06        0       4.25      AUS

You can call astype(int) to cast the series to int:

In [17]:
df['Maturity'] = df['Maturity'].str.extract(r'(\d+)', expand=False).astype(int)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
Date         3 non-null object
Maturity     3 non-null int32
Yield_pct    3 non-null float64
Currency     3 non-null object
dtypes: float64(1), int32(1), object(2)
memory usage: 108.0+ bytes
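
For anyone who wants to try this without downloading the spreadsheet, here is a self-contained sketch of the same approach. The frame is built by hand; the labels FZCY25D and FZCY1000D are hypothetical, added only to show multi-digit numbers. It assumes a reasonably recent pandas, where expand=False makes str.extract return a Series, and it uses pd.to_numeric in place of astype(int) so that a label with no digits would become NaN instead of raising:

```python
import pandas as pd

# Hand-made frame shaped like the question's data.
# "FZCY25D" and "FZCY1000D" are hypothetical labels for illustration.
df = pd.DataFrame({
    "Date": ["2009-01-02", "2009-01-05", "2009-01-06"],
    "Maturity": ["FZCY0D", "FZCY25D", "FZCY1000D"],
    "Yield_pct": [4.25, 4.25, 4.25],
    "Currency": ["AUS", "AUS", "AUS"],
})

# Pull out the first run of digits from each label.
# expand=False returns a Series rather than a one-column DataFrame.
digits = df["Maturity"].str.extract(r"(\d+)", expand=False)

# Convert the digit strings to numbers; errors="coerce" maps
# anything unparseable (such as a non-match) to NaN.
df["Maturity"] = pd.to_numeric(digits, errors="coerce")
```

If every label contains a number, the column ends up as an integer dtype; if any label has no digits, the column will be float because of the NaN values.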