I am trying to calculate a new column based on conditions of three other columns using string methods.
Sample data:
import numpy as np
import pandas as pd

d = pd.DataFrame({'street1': ['1000 foo dr', '1001 bar dr', '1002 foo dr suite101', '1003 bar dr'],
                  'street2': ['city_a', np.nan, 'suite 101', 'suite 102'],
                  'city': ['city_a', 'city_b', np.nan, 'city_c']})
street1               street2    city
1000 foo dr           city_a     city_a
1001 bar dr           NaN        city_b
1002 foo dr suite101  suite 101  NaN
1003 bar dr           suite 102  city_c
Ideal output:
Address
1000 foo dr
1001 bar dr
1002 foo dr suite 101
1003 bar dr suite 102
The idea here is:
- if street2 matches city, ignore
- if street2 matches the end of street1, ignore
- otherwise, concatenate street1 and street2
What I tried:
def address_clean(row):
    if not row['street2']:
        return row['street1']
    if row['street2'] == row['city']:
        return row['street1']
    elif row['street1'].str.replace(' ', '').find(row['street2'].str.replace(' ', '')) != -1:
        return row['street1']
    else:
        return row['street1'] + row['street2']

d.apply(lambda row: address_clean(row), axis=1).head()
This one throws me an error:
AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 1')
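For reference, here is a quick check of what apply(..., axis=1) passes to the function, using the sample frame from above. It is only a sketch to illustrate the error: each row arrives as a Series, and indexing it by column name yields a plain Python scalar, which has no .str accessor.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'street1': ['1000 foo dr', '1001 bar dr', '1002 foo dr suite101', '1003 bar dr'],
                  'street2': ['city_a', np.nan, 'suite 101', 'suite 102'],
                  'city': ['city_a', 'city_b', np.nan, 'city_c']})

row = d.iloc[0]                     # same kind of object that apply(axis=1) passes in
row_type = type(row)                # a pandas Series holding one row
cell_type = type(row['street1'])    # a plain Python str

# The .str accessor lives on a whole column, not on a single cell.
cell_has_accessor = hasattr(row['street1'], 'str')
column_has_accessor = hasattr(d['street1'], 'str')

print(row_type.__name__, cell_type.__name__, cell_has_accessor, column_has_accessor)
```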
It seems that row['street1'] is a string rather than a pd.Series. However, even if I remove the .str part from the original function, so that it becomes:
def address_clean(row):
    if not row['street2']:
        return row['street1']
    if row['street2'] == row['city']:
        return row['street1']
    elif row['street1'].replace(' ', '').find(row['street2'].replace(' ', '')) != -1:
        return row['street1']
    else:
        return row['street1'] + row['street2']

d.apply(lambda row: address_clean(row), axis=1).head()
The code throws me the following error:
AttributeError: ("'float' object has no attribute 'replace'", 'occurred at index 1')
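A small check of how a missing value behaves may help explain this message. The sketch below only inspects np.nan itself. It shows that NaN is a float and that it is truthy, so a test like `not row['street2']` does not catch it and the later .replace call is reached.

```python
import numpy as np
import pandas as pd

missing = np.nan

nan_type = type(missing)          # NaN is an ordinary Python float
nan_is_truthy = bool(missing)     # non-zero float, so it counts as True
caught_by_not = not missing       # therefore `not missing` is False
caught_by_isna = pd.isna(missing) # explicit missing-value check

print(nan_type.__name__, nan_is_truthy, caught_by_not, caught_by_isna)
```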
I am wondering which part of the function I was using incorrectly, and how to fix this error.
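One possible way to express the three rules listed earlier is sketched below. It is a sketch under assumptions, not a definitive fix. It assumes street1 is never missing and checks for a missing street2 with pd.isna. It uses endswith rather than find, since the rule refers to the end of street1. The two parts are joined with a space, which is an assumption based on the ideal output. Under the second rule as written, the third row comes back as street1 unchanged ("suite101"), which differs from the spacing shown in the ideal output.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'street1': ['1000 foo dr', '1001 bar dr', '1002 foo dr suite101', '1003 bar dr'],
                  'street2': ['city_a', np.nan, 'suite 101', 'suite 102'],
                  'city': ['city_a', 'city_b', np.nan, 'city_c']})

def address_clean(row):
    street1, street2, city = row['street1'], row['street2'], row['city']
    # NaN is truthy, so test for a missing street2 explicitly.
    if pd.isna(street2):
        return street1
    # Rule 1: street2 just repeats the city.
    if street2 == city:
        return street1
    # Rule 2: street2 already sits at the end of street1 (spaces ignored).
    if street1.replace(' ', '').endswith(street2.replace(' ', '')):
        return street1
    # Rule 3: otherwise join the two parts (the space separator is an assumption).
    return street1 + ' ' + street2

d['Address'] = d.apply(address_clean, axis=1)
print(d['Address'].tolist())
```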
type(np.nan) gives float
[...] suite 123 come from? And why was row 3 concatenated with row 4?
[...] if not row['street2']: return row['street1'] part of the function handle that properly? Why would it be evaluated in the following if statements?