Pandas substring

Question

I have the following dataframe:

     contract
 0   WTX1518X22
 1   WTX1518X20.5
 2   WTX1518X19
 3   WTX1518X15.5

I need to add a new column containing everything following the last 'X' from the first column. So the result would be:

     contract        result
 0   WTX1518X22      22
 1   WTX1518X20.5    20.5
 2   WTX1518X19      19
 3   WTX1518X15.5    15.5

I figure I first need to find the index position of the last 'X' in each string (there may be more than one 'X'), and then take the substring containing everything after that position for each row.

EDIT:

I have managed to get the index position of 'X' as required:

df['index_pos'] = df['contract'].str.rfind('X', start=0, end=None)

But I still can't seem to get a new column containing all characters following the 'X'. I am trying:

df['index_pos'] = df['index_pos'].convert_objects(convert_numeric=True)
df['result'] = df['contract'].str[df['index_pos']:]

But this just gives me an empty column called 'result'. This is strange because if I do the following then it works correctly:

df['result'] = df['contract'].str[8:]

So I just need a way to not hardcode '8' but to instead use the column 'index_pos'. Any suggestions?
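One way to slice each row at its own position is a plain Python loop over the two columns. The following is a minimal sketch, not taken from the answers; it rebuilds the sample frame from the question and assumes every contract contains at least one 'X'. Note that rfind returns the position of the 'X' itself (7 for these rows), so the slice has to start one character later to match the hardcoded 8.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"contract": ["WTX1518X22", "WTX1518X20.5", "WTX1518X19", "WTX1518X15.5"]}
)

# Position of the last 'X' in each string (-1 if there is none)
df["index_pos"] = df["contract"].str.rfind("X")

# Slice every string at its own position; +1 skips the 'X' itself
df["result"] = [s[i + 1:] for s, i in zip(df["contract"], df["index_pos"])]

print(df)
```

If a string has no 'X', rfind gives -1 and the slice s[0:] returns the whole string, so such rows may need separate handling.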

I don't want to sound like a Perl fanboy, but have you tried regex? Something simple like df.contract.str.extract(".*X(.*)") probably already works.
– cel, Commented Nov 9, 2015 at 8:47
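The regex suggested in the comment can be sketched roughly as follows. The sample frame is rebuilt from the question, and expand=False is an addition here so that a Series comes back rather than a one-column DataFrame. The leading greedy .* consumes as much as it can, so the capture group starts after the last 'X'.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"contract": ["WTX1518X22", "WTX1518X20.5", "WTX1518X19", "WTX1518X15.5"]}
)

# Greedy ".*X" runs up to the last 'X'; the group captures the rest
df["result"] = df["contract"].str.extract(r".*X(.*)", expand=False)

print(df)
```

The extracted values are still strings at this point; append .astype(float) if numbers are needed.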

EdChum · Accepted Answer · 2015-11-09 09:32:57Z

4

Use vectorised str.split to split the string and cast the last split to float:

In [10]:
df['result'] = df['contract'].str.split('X').str[-1].astype(float)
df

Out[10]:
       contract  result
0    WTX1518X22    22.0
1  WTX1518X20.5    20.5
2    WTX1518X19    19.0
3  WTX1518X15.5    15.5
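For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the question's frame. It uses str.rsplit with n=1 instead of str.split, which is a small variation on the line above rather than the answer's exact code: it splits only once, from the right, so the last piece is still everything after the final 'X'.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"contract": ["WTX1518X22", "WTX1518X20.5", "WTX1518X19", "WTX1518X15.5"]}
)

# Split once from the right on 'X', keep the last piece, convert to float
df["result"] = df["contract"].str.rsplit("X", n=1).str[-1].astype(float)

print(df)
```

Both forms should give the same column for this data; the astype(float) step will raise if any tail is not a valid number.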

answered Nov 9, 2015 at 9:32

EdChum


luismf · Accepted Answer · 2015-11-09 09:43:18Z

0

import re
import pandas as pd

# Capture the trailing run of digits and dots, then convert it to float
df['result'] = df['contract'].map(lambda x: float(re.findall(r'([0-9.]+)$', x)[0]))

Out[34]: 
       contract  result
0    WTX1518X22    22.0
1  WTX1518X20.5    20.5
2    WTX1518X19    19.0
3  WTX1518X15.5    15.5

This is similar to EdChum's approach but uses a regular expression. It only assumes that the number is at the end of the string.
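One caveat with the map/findall version: indexing [0] raises an IndexError for any row that does not end in a number. A possible vectorised variant, sketched here with the same trailing-number pattern, leaves NaN in those rows instead. The third row below ("NOVALUE") is a made-up value added only to show that case.

```python
import pandas as pd

# Two rows from the question plus one hypothetical row with no trailing number
df = pd.DataFrame({"contract": ["WTX1518X22", "WTX1518X20.5", "NOVALUE"]})

# Capture the trailing run of digits and dots; non-matching rows become NaN
tail = df["contract"].str.extract(r"([0-9.]+)$", expand=False)

# Convert the captured strings to numbers, keeping NaN where nothing matched
df["result"] = pd.to_numeric(tail)

print(df)
```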

answered Nov 9, 2015 at 9:43

luismf
