I have a dataframe which has lines as below in a single column:

__label__JCB_Spare_Part __label__Differential_Housings jcb casting assy differential housing
__label__Vibrating_Roller __label__Road_Roller double drum mini roller seat drive model fyl engine nbsp hp aircolled diesel engine wheel size walk speed km climbing capacity drive hydrostatic drive nbsp nbsp
__label__Vibrating_Roller __label__Road_Roller double drum mini roller seat drive model fyl engine nbsp hp aircolled diesel engine wheel size walk speed km climbing capacity drive hydrostatic drive nbsp nbsp
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer

I wish to extract all the words with the prefix __label__ into a separate column, as below:

__label__JCB_Spare_Part __label__Differential_Housings
__label__Vibrating_Roller __label__Road_Roller
__label__Vibrating_Roller __label__Road_Roller
__label__Crawler_Dozer __label__Bulldozer
__label__Crawler_Dozer __label__Bulldozer

What I have tried:

labels = input[0].str.extract(r'(__label__[\w]+)')

but it only pulls out the first label.

2 Answers

Your code is mostly correct; you just need findall instead of extract. extract returns only the first match in each row, while findall returns every match:

labels = input[0].str.findall(r'(__label__[\w]+)')
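
Note that findall gives a list of matches per row rather than a single string. If the goal is the space-separated column shown in the question, one possible follow-up is to join each list. This is only a sketch: the two-row frame `df` and the new column name `labels` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: column 0 holds the raw text.
df = pd.DataFrame({0: [
    "__label__JCB_Spare_Part __label__Differential_Housings jcb casting assy differential housing",
    "__label__Crawler_Dozer __label__Bulldozer dozer bulldozer",
]})

pattern = r"__label__\w+"

# extract keeps only the first match in each row (needs a capture group).
first_only = df[0].str.extract(r"(__label__\w+)")[0]

# findall returns a list with every match in each row.
label_lists = df[0].str.findall(pattern)

# Join each list back into one space-separated string per row.
df["labels"] = label_lists.str.join(" ")
```

Rows without any label end up as an empty string in the joined column.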

2 Comments

Thanks @gmds, didn't know about findall() in str objects. btw \w contains underscore, right??
@hacker315 my mistake; you are right. \w does contain underscores, so it's just about findall vs extract. I have edited that part of the answer out.

You can try this:

import re

# Named "text" rather than "str" so the built-in str type is not shadowed.
text = """
__label__JCB_Spare_Part  __label__Differential_Housings jcb  casting  assy  differential  housing
__label__Vibrating_Roller  __label__Road_Roller double  drum  mini  roller  seat  drive  model  fyl  engine  nbsp  hp  aircolled  diesel  engine  wheel  size  walk  speed  km  climbing  capacity  drive  hydrostatic  drive  nbsp  nbsp
__label__Vibrating_Roller  __label__Road_Roller double  drum  mini  roller  seat  drive  model  fyl  engine  nbsp  hp  aircolled  diesel  engine  wheel  size  walk  speed  km  climbing  capacity  drive  hydrostatic  drive  nbsp  nbsp
__label__Crawler_Dozer  __label__Bulldozer dozer  bulldozer
__label__Crawler_Dozer  __label__Bulldozer dozer  bulldozer
"""

# Raw string so \w reaches the regex engine unescaped; returns a flat list of all labels.
result = re.findall(r'__label__\w+', text)
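
Calling re.findall on the whole string gives one flat list, so the grouping by line is lost. If the labels should stay grouped per line, as in the question's expected output, a rough sketch could split the text first. The shortened sample text and the names `LABEL`, `flat` and `per_line` are made up for illustration.

```python
import re

# \w matches letters, digits and the underscore, so each label is captured whole.
LABEL = re.compile(r"__label__\w+")

# Shortened sample in the same shape as the data above.
text = """__label__JCB_Spare_Part  __label__Differential_Housings jcb  casting
__label__Crawler_Dozer  __label__Bulldozer dozer  bulldozer"""

# One flat list across the whole text.
flat = LABEL.findall(text)

# One space-separated string of labels per input line.
per_line = [" ".join(LABEL.findall(line)) for line in text.splitlines()]
```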
