130

I have a column in a pandas DataFrame that I would like to split on a single space. The splitting is simple enough with Series.str.split(' '), but I can't make a new column from the last entry. When I call .str.split() on the column I get a Series of lists, and I don't know how to manipulate this to get a new column for my DataFrame.

Here is an example. Each entry in the column contains 'symbol date price', and I would like to split off the price (and eventually remove the leading "p", or "c" in half the cases).

import pandas as pd
temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
temp2 = temp.ticker.str.split(' ')

which yields

0    ['spx', '5/25/2001', 'p500']
1    ['spx', '5/25/2001', 'p600']
2    ['spx', '5/25/2001', 'p700']

But temp2[0] just gives the list for the first row, and temp2[:][-1] fails. How can I turn the last element of each list into a new column? Thanks!

6 Answers

242

Do this:

In [43]: temp2.str[-1]
Out[43]: 
0    p500
1    p600
2    p700
Name: ticker

So all together it would be:

>>> temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
>>> temp['ticker'].str.split(' ').str[-1]
0    p500
1    p600
2    p700
Name: ticker, dtype: object

7 Comments

Love the clean solution!
from the author of "Pandas" :)
I love this solution, but how does it work? Meaning, what is happening "behind the scenes" that allows str followed by brackets to select a specific element from the list?
I got slightly confused by this, the one-liner is d1.ticker.str.split().str[-1]. Not what you'd expect...
@KevinMarkham: Here's how it works: str works not only for strings but also for lists to some extent. So if you had a string Series foo then foo.str[0] would take the first character of each string, and foo.str[-1] would take the last. But since str also works (partially) on lists too, temp2.str[-1] takes the last element of each list in the Series. A string, after all, is a sequence of characters, similar to a list.
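
To make the explanation above concrete, here is a minimal sketch comparing .str[-1] on a Series of strings with .str[-1] on a Series of lists. The sample data is made up to mirror the question (the second row uses a "c" prefix), so treat it as an illustration rather than anything from the original answer:

```python
import pandas as pd

# Hypothetical sample data in the question's 'symbol date price' format.
s = pd.Series(["spx 5/25/2001 p500", "spx 5/25/2001 c600"])

# On a Series of strings, .str[-1] indexes into each string: last character.
last_char = s.str[-1]

# After .str.split, each row holds a list, so .str[-1] indexes into each list: last token.
last_token = s.str.split(" ").str[-1]

print(last_char.tolist())   # ['0', '0']
print(last_token.tolist())  # ['p500', 'c600']
```

The same indexing syntax is applied element-wise in both cases; only the type of the elements differs.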
47

You could use the tolist method as an intermediary:

In [99]: import pandas as pd

In [100]: d1 = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})

In [101]: d1.ticker.str.split().tolist()
Out[101]: 
[['spx', '5/25/2001', 'p500'],
 ['spx', '5/25/2001', 'p600'],
 ['spx', '5/25/2001', 'p700']]

From which you could make a new DataFrame:

In [102]: d2 = pd.DataFrame(d1.ticker.str.split().tolist(), 
   .....:                   columns="symbol date price".split())

In [103]: d2
Out[103]: 
  symbol       date price
0    spx  5/25/2001  p500
1    spx  5/25/2001  p600
2    spx  5/25/2001  p700

For good measure, you could fix the price:

In [104]: d2["price"] = d2["price"].str.replace("p","").astype(float)

In [105]: d2
Out[105]: 
  symbol       date  price
0    spx  5/25/2001    500
1    spx  5/25/2001    600
2    spx  5/25/2001    700
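
The question mentions that about half the entries start with "c" rather than "p", which replacing "p" alone would not handle. One possible sketch, assuming the last token is always a single-letter prefix followed by digits, is to slice the prefix off by position. The column names kind and price are hypothetical, chosen only for this example:

```python
import pandas as pd

# Hypothetical data: one put ("p") and one call ("c") entry.
temp = pd.DataFrame({"ticker": ["spx 5/25/2001 p500", "spx 5/25/2001 c600"]})

# Last whitespace-separated token of each row, e.g. 'p500'.
last = temp["ticker"].str.split(" ").str[-1]

# Assumption: the first character is the prefix and the rest is numeric.
temp["kind"] = last.str[0]
temp["price"] = last.str[1:].astype(float)

print(temp[["kind", "price"]])
```

If the prefix can be longer than one character, this slicing would need to be replaced with something pattern-based.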

PS: but if you really just want the last column, apply would suffice:

In [113]: temp2.apply(lambda x: x[2])
Out[113]: 
0    p500
1    p600
2    p700
Name: ticker

4 Comments

This just helped me add a log file in pandas which was too scary and messy to even touch before (single column of data with a lot of information per row).
All of these approaches have disastrous performance compared with Wes McKinney's answer.
@JohnZwinck: wow, a performance-only related downvote on a five-year old answer about functionality which had only been introduced about two months before? That's.. rigorous, I'll give you that!
But that's the point of SE: outdated answers should be shown less prominently. Since that's not possible here unless the OP changes the accepted answer, the only warning to future users is the difference in votes.
29

https://pandas.pydata.org/pandas-docs/stable/text.html

import numpy as np
import pandas as pd
s2 = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
s2.str.split('_').str.get(1)

or

s2.str.split('_').str[1]

1 Comment

You can use -1 to get the last element, just as when indexing a list: s2.str.split('_').str.get(-1)
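
A small sketch of what happens to the missing value in that example: as far as I know, the .str methods pass NaN through instead of raising, so the row that was NaN stays NaN after splitting and indexing. The variable name last is just for illustration:

```python
import numpy as np
import pandas as pd

s2 = pd.Series(["a_b_c", "c_d_e", np.nan, "f_g_h"])

# Split on '_' and take the last piece of each row; the NaN row is left as NaN.
last = s2.str.split("_").str.get(-1)

print(last.isna().tolist())    # [False, False, True, False]
print(last.dropna().tolist())  # ['c', 'e', 'h']
```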
6

Using Pandas 0.20.3:

In [10]: import pandas as pd
    ...: temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
    ...:

In [11]: temp2 = temp.ticker.str.split(' ', expand=True)  # expand=True returns a DataFrame

In [12]: temp2
Out[12]:
     0          1     2
0  spx  5/25/2001  p500
1  spx  5/25/2001  p600
2  spx  5/25/2001  p700

In [13]: temp3 = temp.join(temp2[2])

In [14]: temp3
Out[14]:
               ticker     2
0  spx 5/25/2001 p500  p500
1  spx 5/25/2001 p600  p600
2  spx 5/25/2001 p700  p700

5

If you are looking for a one-liner (like I came here for), this should do nicely:

temp2 = temp.ticker.str.split(' ', expand=True).iloc[:, -1]

You can also trivially modify this answer to assign this column back to the original DataFrame as follows:

temp['last_split'] = temp.ticker.str.split(' ', expand=True).iloc[:, -1]

Which I imagine is a popular use case here.
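
If only the last piece is needed, another option worth sketching is Series.str.rsplit with n=1, which splits once from the right so the earlier fields are never separated. This is an alternative I am suggesting, not part of the answer above, and it reuses the last_split column name purely for consistency:

```python
import pandas as pd

temp = pd.DataFrame({"ticker": ["spx 5/25/2001 p500",
                                "spx 5/25/2001 p600",
                                "spx 5/25/2001 p700"]})

# Split once from the right: ['spx 5/25/2001', 'p500'], then keep the last element.
temp["last_split"] = temp["ticker"].str.rsplit(" ", n=1).str[-1]

print(temp["last_split"].tolist())  # ['p500', 'p600', 'p700']
```

This also works when rows have different numbers of spaces, since it only looks at the final separator.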

1
import pandas as pd
temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
temp["last"] = temp.ticker.str.split(" ").apply(lambda x: x[-1])

#                ticker  last
# 0  spx 5/25/2001 p500  p500
# 1  spx 5/25/2001 p600  p600
# 2  spx 5/25/2001 p700  p700
