Replacing Strings in Column of Dataframe with the number in the string

Question

I currently have a dataframe like the one below, and I want to replace each string in the Maturity column with the number it contains. For example, FZCY0D should become 0.

            Date   Maturity  Yield_pct Currency
0     2009-01-02     FZCY0D       4.25      AUS
1     2009-01-05     FZCY0D       4.25      AUS
2     2009-01-06     FZCY0D       4.25      AUS

My code is below. I tried replacing these strings with the numbers, but that led to the error AttributeError: 'Series' object has no attribute 'split' on the line result.Maturity.replace(result['Maturity'], [int(s) for s in result['Maturity'].split() if s.isdigit()]). I'm struggling to understand how to do this.

import pandas as pd
import xlrd  # pandas relies on xlrd to read legacy .xls files

url = 'http://www.rba.gov.au/statistics/tables/xls/f17hist.xls'
xls = pd.ExcelFile(url)

# Skip the header rows I don't need in my dataframe
df = xls.parse('Yields', skiprows=10, index_col=None, na_values=['NA'])

df.rename(columns={'Series ID': 'Date'}, inplace=True)

# Optional: convert Date to datetime; ignore if you don't need it
#df['Date'] = pd.to_datetime(df['Date'])

# Reshape from wide (one column per maturity) to long
result = pd.melt(df, id_vars=['Date'])

result['Currency'] = 'AUS'
result.rename(columns={'value': 'Yield_pct', 'variable': 'Maturity'}, inplace=True)

# This line raises AttributeError: 'Series' object has no attribute 'split'
result.Maturity.replace(result['Maturity'], [int(s) for s in result['Maturity'].split() if s.isdigit()])

print(result)

The split() method is for an individual string, not a Series; it returns a list of substrings split on whitespace. – chrisaycock, Jun 19, 2015 at 18:08
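A minimal sketch of the comment's point, using a small hand-built Series of the FZCY0D code from the question: Series has no split() method, and even on a single string split() only breaks on whitespace, so the digit embedded in the code is never isolated by the question's list comprehension.

```python
import pandas as pd

# Small stand-in for result['Maturity'] from the question
s = pd.Series(["FZCY0D", "FZCY0D"])

# split() is defined on str, not on Series, hence the AttributeError
err = None
try:
    s.split()
except AttributeError as exc:
    err = str(exc)
print(err)

# On a single string, split() breaks on whitespace only,
# so the whole code comes back as one piece and isdigit() is False
parts = "FZCY0D".split()
print(parts)                              # ['FZCY0D']
print([p for p in parts if p.isdigit()])  # []

# The .str accessor applies split() element-wise, but still
# yields whole codes rather than the digits inside them
print(s.str.split().tolist())             # [['FZCY0D'], ['FZCY0D']]
```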

EdChum · Accepted Answer · 2015-06-19 18:06:00Z

2

You can use the vectorised str methods and pass a regex with a capture group to str.extract to pull out the number:

In [15]:

df['Maturity'] = df['Maturity'].str.extract(r'(\d+)', expand=False)
df
Out[15]:
         Date Maturity  Yield_pct Currency
0  2009-01-02        0       4.25      AUS
1  2009-01-05        0       4.25      AUS
2  2009-01-06        0       4.25      AUS
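A short sketch of why the expand argument matters when assigning back to a single column: in recent pandas versions str.extract defaults to expand=True and returns a one-column DataFrame, while expand=False with a single capture group returns a plain Series. The sample Series here simply repeats the FZCY0D code from the question.

```python
import pandas as pd

s = pd.Series(["FZCY0D", "FZCY0D", "FZCY0D"])

# Default behaviour: a DataFrame with one column per capture group
as_frame = s.str.extract(r"(\d+)")

# expand=False with one capture group: a Series of matched strings
as_series = s.str.extract(r"(\d+)", expand=False)

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(as_series.tolist())        # ['0', '0', '0']
```

The raw string r"(\d+)" keeps Python from interpreting \d as a string escape before the regex engine sees it.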

str.extract returns the matched digits as strings, so call astype(int) to cast the result to int:

In [17]:
df['Maturity'] = df['Maturity'].str.extract(r'(\d+)', expand=False).astype(int)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
Date         3 non-null object
Maturity     3 non-null int32
Yield_pct    3 non-null float64
Currency     3 non-null object
dtypes: float64(1), int32(1), object(2)
memory usage: 108.0+ bytes
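Since the RBA spreadsheet is fetched over the network, here is a hedged end-to-end sketch on a toy wide frame that mimics the question's melt-then-extract workflow. "FZCY0D" comes from the question; "FZCY30D" and "NODIGITS" are hypothetical values made up for illustration. The last part assumes some codes might lack digits: extract then yields NaN and astype(int) would raise, so pd.to_numeric followed by the nullable "Int64" dtype is one alternative.

```python
import pandas as pd

# Toy wide frame; "FZCY30D" is a hypothetical column name
df = pd.DataFrame({
    "Date": ["2009-01-02", "2009-01-05"],
    "FZCY0D": [4.25, 4.25],
    "FZCY30D": [4.10, 4.05],
})

# Same reshaping as the question: wide -> long, then rename
result = pd.melt(df, id_vars=["Date"])
result = result.rename(columns={"variable": "Maturity", "value": "Yield_pct"})
result["Currency"] = "AUS"

# Keep only the first run of digits in each code and cast to int
result["Maturity"] = result["Maturity"].str.extract(r"(\d+)", expand=False).astype(int)
print(result)

# If a code has no digits, extract gives NaN and astype(int) would fail.
# "NODIGITS" is a hypothetical code used to show that case.
codes = pd.Series(["FZCY0D", "NODIGITS"])
robust = pd.to_numeric(codes.str.extract(r"(\d+)", expand=False)).astype("Int64")
print(robust)
```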

