pandas get "single" keywords from DataFrame string search

This is a follow-up to a previous question. I have a Series of regex strings, w, called "British", like this:

British
\bSkilful\b
\bWilful\b
\bfulfil\b
\b.*favour.*\b
\bappal\b
\bappall.*\b
\barbour.*\b
\barmor.*\b
\bstrange\b
\brumor.*\b
\b.*color.*\b
\b.*centre's\b

and a DataFrame df like this:

 User_ID     Tweet
 01          hi all
 02          see you something
 03          that's my favourite spot
 04          the strangest rumors
 05          my appal is nice
 06          check my rumor
 07          #brborboncheckruMoreThanever
 08          look @mycentre's

I would like to add a new column containing the single keyword found in each string. So far I have done:

 import re
 import pandas as pd
 List = pd.read_csv('w.txt')
 r = re.compile(r'.*({}).*'.format('|'.join(List['British'].values)), re.IGNORECASE)

and then mask the DataFrame:

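  # list(...) because map() returns a lazy iterator on Python 3
  masked = list(map(bool, map(r.search, df['Tweet'])))
  # .copy() avoids SettingWithCopyWarning when the keyword column is added
  df2 = df[masked].copy()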
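Side note: the same mask can be built without map by using the vectorized Series.str.contains. A minimal sketch, assuming the keyword list is typed inline instead of read from w.txt and that User_ID holds integers (as in the printed output below); the alternation is wrapped in a non-capturing group (?:...) because str.contains warns when the pattern has capturing groups:

```python
import re

import pandas as pd

# Keyword patterns from the "British" list, inline instead of w.txt.
british = [
    r"\bSkilful\b", r"\bWilful\b", r"\bfulfil\b", r"\b.*favour.*\b",
    r"\bappal\b", r"\bappall.*\b", r"\barbour.*\b", r"\barmor.*\b",
    r"\bstrange\b", r"\brumor.*\b", r"\b.*color.*\b", r"\b.*centre's\b",
]
df = pd.DataFrame({
    "User_ID": range(1, 9),
    "Tweet": [
        "hi all", "see you something", "that's my favourite spot",
        "the strangest rumors", "my appal is nice", "check my rumor",
        "#brborboncheckruMoreThanever", "look @mycentre's",
    ],
})

# One alternation; (?:...) keeps str.contains from warning about match groups.
pattern = "(?:{})".format("|".join(british))
masked = df["Tweet"].str.contains(pattern, flags=re.IGNORECASE, regex=True)
df2 = df[masked].copy()
print(df2)
```

With the list exactly as shown, this also keeps User_ID 4, because \brumor.*\b matches "rumors".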

Then I masked it again to add the 'keyword' column:

 mask = [m.group(1) if m else None for m in map(r.search, df2['Tweet'])]
 df2['keyword'] = mask

which returns:

   User_ID                     Tweet         keyword
2        3  that's my favourite spot  favourite spot
4        5          my appal is nice           appal
5        6            check my rumor           rumor
7        8          look @mycentre's      mycentre's
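The extra words come from the .* wildcards inside the keyword patterns themselves, not only from the outer wrapper. A quick check with two of the patterns in isolation, using the same .*(...).* wrapper as above (a sketch, not a full reproduction of the DataFrame):

```python
import re

# Two patterns from the "British" list, wrapped the same way as in the question.
keywords = [r"\b.*favour.*\b", r"\b.*centre's\b"]
r = re.compile(r".*({}).*".format("|".join(keywords)), re.IGNORECASE)

# The greedy outer .* backtracks from the end of the tweet, so the group is
# tried at the rightmost positions first; it can only start where \b holds.
# Once it starts at "favourite", the pattern's own trailing .* runs to the end.
print(r.search("that's my favourite spot").group(1))  # favourite spot
# The centre's pattern starts at the \b after "@", and its leading .* takes "my".
print(r.search("look @mycentre's").group(1))  # mycentre's
```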

So the boolean mask works fine and detects only the tweets containing at least one keyword. But what if I want to extract only the single keyword that was found? The final DataFrame should look like this:

   User_ID                     Tweet         keyword
2        3  that's my favourite spot       favourite
4        5          my appal is nice           appal
5        6            check my rumor           rumor
7        8          look @mycentre's        centre's
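One possible approach, sketched under an assumption about what the patterns mean: a leading \b.* is read as "the keyword may sit inside a longer token", and any other .* as "the rest of this word". Rewriting the patterns that way (the narrow helper below is hypothetical, not part of pandas or re) and calling Series.str.extract without the outer .*(...).* wrapper returns one word per tweet:

```python
import re

import pandas as pd

british = [
    r"\bSkilful\b", r"\bWilful\b", r"\bfulfil\b", r"\b.*favour.*\b",
    r"\bappal\b", r"\bappall.*\b", r"\barbour.*\b", r"\barmor.*\b",
    r"\bstrange\b", r"\brumor.*\b", r"\b.*color.*\b", r"\b.*centre's\b",
]


def narrow(pat: str) -> str:
    """Hypothetical helper: confine a keyword pattern to a single word."""
    # Drop a leading \b.* so the match starts at the keyword itself,
    # e.g. "centre's" inside "mycentre's".
    if pat.startswith(r"\b.*"):
        pat = pat[len(r"\b.*"):]
    # Remaining .* may only cover word characters, so the match stops at
    # the end of the current word instead of the end of the tweet.
    return pat.replace(".*", r"\w*")


df = pd.DataFrame({
    "User_ID": range(1, 9),
    "Tweet": [
        "hi all", "see you something", "that's my favourite spot",
        "the strangest rumors", "my appal is nice", "check my rumor",
        "#brborboncheckruMoreThanever", "look @mycentre's",
    ],
})

# One capturing group around the alternation; no outer .* wrapper.
pattern = "({})".format("|".join(narrow(p) for p in british))
df["keyword"] = df["Tweet"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
df2 = df[df["keyword"].notna()]
print(df2)
```

The rewrite is a plain string substitution tailored to this list, not a general regex rewriter; it would mangle patterns where .* plays another role, such as an escaped \.*. With the list as shown, User_ID 4 also gets "rumors".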

Thanks so much for your kind help.

edited May 23, 2017 at 11:48 by CommunityBot

asked Jan 30, 2015 at 12:08 by Fabio Lamanna

In the case of index 2 in your returned DataFrame, are you looking for a separate keyword column when there are multiple keywords? For example, keyword1 would hold "favourite" and keyword2 would hold "spot"?

– boot-scootin, commented Nov 8, 2016 at 14:47
It is impossible: you cannot differentiate between centre's and mycentre's, since both are chunks of non-whitespace characters. The logic that you describe is [m.group(1).split()[0] if m else None for m in map(r.search, df2['Tweet'])]

– Wiktor Stribiżew, commented Mar 25, 2019 at 18:19
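The comment's suggestion can be checked directly. A minimal sketch with the tweets in a plain list instead of df2['Tweet']: keeping the first whitespace-separated token of the captured group turns "favourite spot" into "favourite", but "mycentre's" stays whole, which is the limitation the comment describes:

```python
import re

british = [
    r"\bSkilful\b", r"\bWilful\b", r"\bfulfil\b", r"\b.*favour.*\b",
    r"\bappal\b", r"\bappall.*\b", r"\barbour.*\b", r"\barmor.*\b",
    r"\bstrange\b", r"\brumor.*\b", r"\b.*color.*\b", r"\b.*centre's\b",
]
tweets = [
    "hi all", "see you something", "that's my favourite spot",
    "the strangest rumors", "my appal is nice", "check my rumor",
    "#brborboncheckruMoreThanever", "look @mycentre's",
]

# Same wrapped pattern as in the question.
r = re.compile(r".*({}).*".format("|".join(british)), re.IGNORECASE)

# First whitespace-separated token of the captured group, or None if no match.
keywords = [m.group(1).split()[0] if m else None for m in map(r.search, tweets)]
print(keywords)
```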
