Suppose I have the following simple dataframe:

df_data=pd.DataFrame({'name':['ABC','ABC XYZ']})

To get the last element I apply:

df_end= pd.DataFrame(df_data.name.str.split().str.get(-1), columns=['name'])

For the first row the result is ABC. I'd like to get None when the name contains fewer than two words. I have tried the following, but it does not work:

df_end['name'] = df_data.name.str.split().apply(lambda x: x[-1] if len(x)>1)

For ABC I should not get ABC as the last element, but for ABC XYZ I should get XYZ.

2 Answers

I think you can try:

df_data['name'].str.extract(r'\s(\S+)$')

Output:

     0
0  NaN
1  XYZ
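As a minimal sketch of how this could be turned into the df_end frame from the question: with a single capture group, passing expand=False makes str.extract return a Series instead of a DataFrame, which can then be given the column name explicitly. The variable name last_word is only illustrative.

```python
import pandas as pd

df_data = pd.DataFrame({'name': ['ABC', 'ABC XYZ']})

# One capture group + expand=False gives a Series rather than a DataFrame.
# The pattern only matches when some whitespace precedes the last word,
# so single-word names do not match and become NaN.
last_word = df_data['name'].str.extract(r'\s(\S+)$', expand=False)

# Name the column explicitly instead of relying on the Series name.
df_end = last_word.to_frame(name='name')
print(df_end)
```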
2 Comments

Why does df_end = pd.DataFrame(df_data.name.str.extract(r'\s(\S+)$'), columns=['name']) not give the last word with the same data? I get NaN for both rows.
Because df_data.name.str.extract(r'\s(\S+)$') returns a DataFrame whose only column is named 0, so asking for columns=['name'] finds no matching column. You can do: pd.DataFrame(df_data['name'].str.extract(r'\s(\S+)$').values, columns=['name'])
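The column mismatch described in this comment can be reproduced with a short sketch. The names extracted and mismatched are illustrative, and renaming the column is shown here as one possible alternative to going through .values.

```python
import pandas as pd

df_data = pd.DataFrame({'name': ['ABC', 'ABC XYZ']})

# By default (expand=True) str.extract returns a DataFrame with one
# column per capture group; an unnamed group gets the integer label 0.
extracted = df_data['name'].str.extract(r'\s(\S+)$')
print(list(extracted.columns))

# Requesting a column called 'name' from a frame that only has column 0
# selects a column that does not exist, so every value is NaN.
mismatched = pd.DataFrame(extracted, columns=['name'])
print(mismatched)

# Renaming the existing column keeps the extracted values.
df_end = extracted.rename(columns={0: 'name'})
print(df_end)
```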

If you expect many splits, it can be faster to use rpartition, since you only want the last word. Then mask any single-word strings.

u = df_data.name.str.rpartition()
u[2].where(u[0].ne(''))

#0    NaN
#1    XYZ
#Name: 2, dtype: object
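For comparison, here is a sketch that runs the rpartition approach next to the apply attempt from the question. The lambda in the question is a syntax error because a conditional expression needs an else branch; adding else None is an assumption about what was intended. The third row is added only for illustration. Note that rpartition() splits on a single space by default, whereas split() with no argument splits on any run of whitespace, so the two can differ on irregularly spaced input.

```python
import pandas as pd

# The third row is not from the question; it is added to show a longer name.
df_data = pd.DataFrame({'name': ['ABC', 'ABC XYZ', 'A B C']})

# rpartition splits once from the right into columns 0 (head), 1 (separator)
# and 2 (tail). For a single word the head is the empty string.
parts = df_data['name'].str.rpartition()
via_rpartition = parts[2].where(parts[0].ne(''))

# The question's apply approach, with the missing else branch supplied.
via_apply = df_data['name'].str.split().apply(
    lambda words: words[-1] if len(words) > 1 else None
)

print(via_rpartition)
print(via_apply)
```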
