
I've been working a lot with pandas in Python to extract information. I have the following titles in one column of my DataFrame:

   0
In & Out (1997)
Simple Plan, A (1998)
Retro Puppetmaster (1999)
Paralyzing Fear: The Story of Polio in America, A (1998)
Old Man and the Sea, The (1958)
Body Shots (1999)
Coogan's Bluff (1968)
Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)
Search for One-eye Jimmy, The (1996)
Funhouse, The (1981)

I'd like to extract the year from each title and put it into a new column. The problem is that if I split on '(' as the delimiter, titles with more than one parenthesis get split in the wrong place, as you can see in row 8 (Seven Samurai). How do I split at the trailing (yyyy) so that the year goes into a new column, like this?

     0                 1
In & Out              1997
Simple Plan, A        1998
Retro Puppetmaster    1999 
Paralyzing Fear:...   1998
Old Man and the S...  1958
Body Shots            1999
Coogan's Bluff        1968 
Seven Samurai (T...   1954
Search for One-ey...  1996
Funhouse, The         1981
  • stackoverflow.com/a/35376466/42346 Commented Jun 9, 2017 at 18:04
  • [''.join(c for c in x if all(c in '0123456789' and len(x) == 4)) for x in row.split() for row in df[1]] Commented Jun 9, 2017 at 18:22

2 Answers

You can use str.extract with expand=False, so that the single capture group comes back as a Series:

df['year'] = df.iloc[:, 0].str.extract(r'\((\d{4})\)', expand=False)

df
Out[381]: 
                                                   0  year
0                                    In & Out (1997)  1997
1                              Simple Plan, A (1998)  1998
2                          Retro Puppetmaster (1999)  1999
3  Paralyzing Fear: The Story of Polio in America...  1998
4                    Old Man and the Sea, The (1958)  1958
5                                  Body Shots (1999)  1999
6                              Coogan's Bluff (1968)  1968
7  Seven Samurai (The Magnificent Seven) (Shichin...  1954
8               Search for One-eye Jimmy, The (1996)  1996
9                               Funhouse, The (1981)  1981
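If you also want the title without the trailing year, a variation on the same idea is to capture both parts in one pass. This is only a sketch: the named groups title and year and the end-of-string anchor are my own additions, not part of the answer above, and it assumes every row ends in a four-digit year in parentheses (otherwise the astype(int) step will fail on missing values).

```python
import pandas as pd

# A small sample of the titles from the question, in a column named 0.
df = pd.DataFrame({0: [
    "In & Out (1997)",
    "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)",
    "Funhouse, The (1981)",
]})

# Anchor the pattern at the end of the string so that only the trailing
# "(yyyy)" is treated as the year; earlier parentheses stay in the title.
# With two named groups, str.extract returns a DataFrame with those names.
parts = df[0].str.extract(r'^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$')

df['title'] = parts['title']
df['year'] = parts['year'].astype(int)  # assumes no row failed to match

print(df[['title', 'year']])
```

Rows that do not match the pattern come back as NaN in both columns, so check for those before converting the year to an integer on real data.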

You can try string slicing. The string method rindex() returns the highest index at which the given substring (in this case '(') occurs, that is, the position of its last occurrence. With that index you can slice the string into the title and the year.

For example :

>>> a = "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)"
>>> print(a[:a.rindex('(')], a[a.rindex('(')+1:-1])
Seven Samurai (The Magnificent Seven) (Shichinin no samurai)  1954
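To apply the same slicing to many titles, the rindex() step could be wrapped in a small helper. The function name split_title_year and the fallback of returning None for the year are my own assumptions, added so that a title without a trailing (yyyy) does not raise an error.

```python
def split_title_year(text):
    """Split 'Title (yyyy)' at the last '(' and return (title, year).

    Hypothetical helper: returns (text, None) when the text does not
    end in a four-digit year in parentheses.
    """
    try:
        cut = text.rindex('(')  # position of the last '('
    except ValueError:          # no '(' at all
        return text, None
    year = text[cut + 1:].strip().rstrip(')')
    if len(year) != 4 or not year.isdigit():
        return text, None
    return text[:cut].rstrip(), int(year)


titles = [
    "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)",
    "Body Shots (1999)",
    "No year here",
]
pairs = [split_title_year(t) for t in titles]
for title, year in pairs:
    print(title, year)
```

With pandas, something like df[0].apply(split_title_year) would give one (title, year) tuple per row, which you could then unpack into two columns.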

1 Comment

This doesn't answer the question fully. It would be better served as a comment.
