Get last string element in DataFrame column conditional

Question

Suppose I have the following simple dataframe:

import pandas as pd

df_data = pd.DataFrame({'name': ['ABC', 'ABC XYZ']})

To get the last word of each name, I use:

df_end= pd.DataFrame(df_data.name.str.split().str.get(-1), columns=['name'])

The result is ABC for the first row and XYZ for the second. I'd like to get None instead when the name contains fewer than two words. I have tried the following, but it does not work:

df_end['name'] = df_data.name.str.split().apply(lambda x: x[-1] if len(x)>1)

For ABC I should not get ABC as the last element, but for ABC XYZ I should get XYZ.
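For reference, the attempt above cannot run as written: a Python conditional expression must have an `else` branch, so `x[-1] if len(x)>1` on its own is a SyntaxError. A minimal sketch of two working variants on the same sample data follows; the variable names `words`, `via_apply` and `via_mask` are illustrative only.

```python
import pandas as pd

df_data = pd.DataFrame({'name': ['ABC', 'ABC XYZ']})
words = df_data['name'].str.split()  # each cell becomes a list of words

# Variant 1: the original lambda, completed with an else branch.
via_apply = words.apply(lambda x: x[-1] if len(x) > 1 else None)

# Variant 2: stay vectorized. Take the last word of every list, then
# mask rows with fewer than two words (masked rows become NaN).
via_mask = words.str.get(-1).where(words.str.len() > 1)

df_end = pd.DataFrame({'name': via_mask})
print(df_end)
```

Variant 2 avoids a Python-level lambda per row; both give a missing value for the single-word row and XYZ for the other.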

Quang Hoang · Accepted Answer · 2019-12-17 20:14:03Z

3

I think you can try:

df_data['name'].str.extract(r'\s(\S+)$')

Output:

     0
0  NaN
1  XYZ
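If you want the result to come back already labeled `name`, one option (a sketch, not part of the original answer) is a named capture group: `str.extract` uses the group name as the column label. The raw string keeps Python from interpreting `\s` and `\S` as string escapes.

```python
import pandas as pd

df_data = pd.DataFrame({'name': ['ABC', 'ABC XYZ']})

# (?P<name>...) is a named group; its name becomes the column label.
# The pattern needs whitespace before the final word, so single-word
# names do not match and yield NaN.
df_end = df_data['name'].str.extract(r'\s(?P<name>\S+)$')
print(df_end)
```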


2 Comments

JarochoEngineer Over a year ago

Why does df_end = pd.DataFrame(df_data.name.str.extract(r'\s(\S+)$'), columns=['name']) not return the last word for the same data? I get NaN in both rows.

Quang Hoang Over a year ago

Because df_data.name.str.extract(r'\s(\S+)$') returns a DataFrame whose only column is labeled 0, so columns=['name'] asks for a column that does not exist. You can do: pd.DataFrame(df_data['name'].str.extract(r'\s(\S+)$').values, columns=['name'])
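A short sketch of the pitfall described in this comment, plus two alternatives to going through `.values` (renaming the column, or using `expand=False` so a single group comes back as a Series). The variable names are illustrative.

```python
import pandas as pd

df_data = pd.DataFrame({'name': ['ABC', 'ABC XYZ']})
extracted = df_data['name'].str.extract(r'\s(\S+)$')  # DataFrame, column label 0

# columns=['name'] selects a label that is not in `extracted`,
# so the resulting column is all NaN.
bad = pd.DataFrame(extracted, columns=['name'])

# Alternative 1: relabel the existing column.
fix_rename = extracted.rename(columns={0: 'name'})

# Alternative 2: with expand=False a single capture group is returned
# as a Series, which to_frame can name directly.
fix_series = df_data['name'].str.extract(r'\s(\S+)$', expand=False).to_frame('name')
```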

ALollz · Accepted Answer · 2019-12-17 20:25:54Z

1

If you expect many splits, rpartition can be faster, since you only want the last word. Then mask any single-word strings.

u = df_data.name.str.rpartition()
u[2].where(u[0].ne(''))

#0    NaN
#1    XYZ
#Name: 2, dtype: object
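A self-contained version of this approach, with comments on what each rpartition column holds and the result named `name` (the `to_frame('name')` step is an addition for convenience, not part of the answer):

```python
import pandas as pd

df_data = pd.DataFrame({'name': ['ABC', 'ABC XYZ']})

# rpartition splits once, at the last space: column 0 is the text before
# it, column 1 the separator, column 2 the last word. When there is no
# space, columns 0 and 1 are empty and the whole string is in column 2.
u = df_data['name'].str.rpartition()

# Keep the last word only where something preceded it.
df_end = u[2].where(u[0].ne('')).to_frame('name')
print(df_end)
```

One caveat: unlike `str.split()` with no arguments, `str.rpartition()` splits on a literal single space by default, so words separated by tabs or other whitespace would be treated as a single word.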

