Get last "column" after .str.split() operation on column in pandas DataFrame

Question

I have a column in a pandas DataFrame that I would like to split on a single space. The splitting is simple enough with Series.str.split(' '), but I can't make a new column from the last entry. When I .str.split() the column I get a Series of lists, and I don't know how to manipulate this to get a new column for my DataFrame.

Here is an example. Each entry in the column contains 'symbol date price' and I would like to split off the price (and eventually remove the "p"... or "c" in half the cases).

import pandas as pd
temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
temp2 = temp.ticker.str.split(' ')

which yields

0    ['spx', '5/25/2001', 'p500']
1    ['spx', '5/25/2001', 'p600']
2    ['spx', '5/25/2001', 'p700']

But temp2[0] just gives the list for the first row, and temp2[:][-1] fails. How can I convert the last entry of each list into a new column? Thanks!

Erfan · Accepted Answer · 2020-01-13 12:05:02Z

242

Do this:

In [43]: temp2.str[-1]
Out[43]: 
0    p500
1    p600
2    p700
Name: ticker

So all together it would be:

>>> temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
>>> temp['ticker'].str.split(' ').str[-1]
0    p500
1    p600
2    p700
Name: ticker, dtype: object
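To carry this through to the question's end goal (dropping the leading "p" or "c"), here is a minimal sketch. The sample data is altered to include one "c" row, the column names kind and strike are hypothetical, and it assumes every last token is a single letter followed by digits.

```python
import pandas as pd

temp = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                                'spx 5/25/2001 c600',
                                'spx 5/25/2001 p700']})

# Last space-separated token of each row, e.g. 'p500'.
last = temp['ticker'].str.split(' ').str[-1]

# Hypothetical column names: 'kind' holds the leading letter,
# 'strike' holds the remaining digits converted to float.
temp['kind'] = last.str[0]
temp['strike'] = last.str[1:].astype(float)

print(temp)
```

If some tokens might not follow the letter-plus-digits pattern, astype(float) will raise, so the input would need validating first.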

edited Jan 13, 2020 at 12:05

Erfan

answered Oct 24, 2012 at 16:13

Wes McKinney

7 Comments

ericmjl Over a year ago

Love the clean solution!

kmonsoor Over a year ago

from the author of "Pandas" :)

Kevin Markham Over a year ago

I love this solution, but how does it work? Meaning, what is happening "behind the scenes" that allows str followed by brackets to select a specific element from the list?

citynorman Over a year ago

I got slightly confused by this, the one-liner is d1.ticker.str.split().str[-1]. Not what you'd expect...

John Zwinck Over a year ago

@KevinMarkham: Here's how it works: str works not only for strings but also for lists to some extent. So if you had a string Series foo then foo.str[0] would take the first character of each string, and foo.str[-1] would take the last. But since str also works (partially) on lists too, temp2.str[-1] takes the last element of each list in the Series. A string, after all, is a sequence of characters, similar to a list.
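The explanation above can be tried out directly. This is only an illustrative sketch with made-up Series names (strings, lists); it shows the same .str indexing applied to a Series of strings and to a Series of lists.

```python
import pandas as pd

strings = pd.Series(['abc', 'xyz'])
lists = pd.Series([['spx', '5/25/2001', 'p500'],
                   ['spx', '5/25/2001', 'p600']])

first_chars = strings.str[0]   # first character of each string
last_chars = strings.str[-1]   # last character of each string
last_items = lists.str[-1]     # last element of each list

# An index that is out of range for a row gives a missing value
# for that row instead of raising IndexError.
out_of_range = lists.str[5]

print(first_chars.tolist(), last_chars.tolist(), last_items.tolist())
```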

DSM · Accepted Answer · 2012-09-20 01:43:43Z

47

You could use the tolist method as an intermediary:

In [99]: import pandas as pd

In [100]: d1 = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})

In [101]: d1.ticker.str.split().tolist()
Out[101]: 
[['spx', '5/25/2001', 'p500'],
 ['spx', '5/25/2001', 'p600'],
 ['spx', '5/25/2001', 'p700']]

From which you could make a new DataFrame:

In [102]: d2 = pd.DataFrame(d1.ticker.str.split().tolist(), 
   .....:                   columns="symbol date price".split())

In [103]: d2
Out[103]: 
  symbol       date price
0    spx  5/25/2001  p500
1    spx  5/25/2001  p600
2    spx  5/25/2001  p700

For good measure, you could fix the price:

In [104]: d2["price"] = d2["price"].str.replace("p","").astype(float)

In [105]: d2
Out[105]: 
  symbol       date  price
0    spx  5/25/2001    500
1    spx  5/25/2001    600
2    spx  5/25/2001    700

PS: but if you really just want the last column, apply would suffice:

In [113]: temp2.apply(lambda x: x[2])
Out[113]: 
0    p500
1    p600
2    p700
Name: ticker

answered Sep 20, 2012 at 1:43

DSM

4 Comments

trench Over a year ago

This just helped me add a log file in pandas which was too scary and messy to even touch before (single column of data with a lot of information per row).

John Zwinck Over a year ago

All of these approaches have disastrous performance compared with Wes McKinney's answer.

DSM Over a year ago

@JohnZwinck: wow, a performance-only related downvote on a five-year old answer about functionality which had only been introduced about two months before? That's.. rigorous, I'll give you that!

FooBar Over a year ago

But that's the point of SE: outdated answers should be shown less prominently. Since that's not possible here unless the OP changes the accepted solution, the only warning to future users is the difference in votes.
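For anyone who wants to check the performance claim in the comments above on their own setup, here is a rough timing sketch. The helper names (via_str, via_apply, via_tolist) and the row count are made up for illustration, and no particular ranking is implied: the numbers depend on the pandas version and the data.

```python
import timeit

import pandas as pd

tickers = pd.Series(['spx 5/25/2001 p500'] * 10_000)
split = tickers.str.split(' ')

def via_str():
    # Element-wise indexing through the .str accessor.
    return split.str[-1]

def via_apply():
    # Plain Python function applied to every list.
    return split.apply(lambda parts: parts[-1])

def via_tolist():
    # Build an intermediate DataFrame, then take its last column.
    return pd.DataFrame(split.tolist()).iloc[:, -1]

for fn in (via_str, via_apply, via_tolist):
    seconds = timeit.timeit(fn, number=5)
    print(f'{fn.__name__}: {seconds:.4f} s for 5 runs')
```

All three return the same values here, so the choice comes down to speed and readability.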

James Holland · Accepted Answer · 2017-07-07 17:52:33Z

29

https://pandas.pydata.org/pandas-docs/stable/text.html

import numpy as np
import pandas as pd

s2 = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
s2.str.split('_').str.get(1)

or

s2.str.split('_').str[1]

answered Jul 7, 2017 at 17:52

James Holland

1 Comment

chanduthedev Over a year ago

You can use -1 to get the last element, just as when indexing a list: s2.str.split('_').str.get(-1)
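A small sketch of that suggestion on the same sample Series, including the missing value. The variable name last is just for illustration.

```python
import numpy as np
import pandas as pd

s2 = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

# The missing entry stays missing after the split, and .str.get(-1)
# passes it through instead of raising.
last = s2.str.split('_').str.get(-1)

print(last)
```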

AllanLRH · Accepted Answer · 2017-11-13 17:12:15Z

6

Using Pandas 0.20.3:

In [10]: import pandas as pd
    ...: temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
    ...:

In [11]: temp2 = temp.ticker.str.split(' ', expand=True)  # expand=True returns a DataFrame

In [12]: temp2
Out[12]:
     0          1     2
0  spx  5/25/2001  p500
1  spx  5/25/2001  p600
2  spx  5/25/2001  p700

In [13]: temp3 = temp.join(temp2[2])

In [14]: temp3
Out[14]:
               ticker     2
0  spx 5/25/2001 p500  p500
1  spx 5/25/2001 p600  p600
2  spx 5/25/2001 p700  p700
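The joined column above ends up labelled 2. A possible variation, assuming every row splits into exactly three tokens, is to name the expanded columns before joining. The names symbol, date and price are hypothetical, borrowed from the question's description of the data.

```python
import pandas as pd

temp = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                                'spx 5/25/2001 p600',
                                'spx 5/25/2001 p700']})

parts = temp['ticker'].str.split(' ', expand=True)

# Assumes exactly three tokens per row; assigning three labels would
# raise a length-mismatch error if the split produced another count.
parts.columns = ['symbol', 'date', 'price']

temp3 = temp.join(parts)
print(temp3)
```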

answered Nov 13, 2017 at 17:12

AllanLRH

sfortney · Accepted Answer · 2019-09-25 14:22:22Z

5

If you are looking for a one-liner (like I came here for), this should do nicely:

temp2 = temp.ticker.str.split(' ', expand=True).iloc[:, -1]

You can also trivially modify this answer to assign this column back to the original DataFrame as follows:

temp['last_split'] = temp.ticker.str.split(' ', expand=True).iloc[:, -1]

Which I imagine is a popular use case here.
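One caveat worth checking: taking the last column of an expanded split assumes every row has the same number of tokens. A sketch with deliberately ragged sample data (not from the question) shows the difference, and also shows Series.str.rsplit with n=1 as an alternative that only splits once, from the right. The variable names are hypothetical.

```python
import pandas as pd

temp = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                                'spx p600',            # only two tokens
                                'spx 5/25/2001 p700']})

# With expand=True, shorter rows are padded with missing values,
# so the last column is empty for the two-token row.
expanded_last = temp['ticker'].str.split(' ', expand=True).iloc[:, -1]

# Splitting once from the right and taking the last piece
# returns the final token regardless of how many tokens a row has.
rsplit_last = temp['ticker'].str.rsplit(' ', n=1).str[-1]

print(expanded_last.tolist())
print(rsplit_last.tolist())
```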

answered Sep 25, 2019 at 14:22

sfortney

Bera · Accepted Answer · 2023-11-14 08:01:11Z

1

import pandas as pd
temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
temp["last"] = temp.ticker.str.split(" ").apply(lambda x: x[-1])

#                ticker  last
# 0  spx 5/25/2001 p500  p500
# 1  spx 5/25/2001 p600  p600
# 2  spx 5/25/2001 p700  p700

answered Nov 14, 2023 at 8:01

Bera
