I am new to regex and want to extract specific words from a Python string. This is the string:

'1. feature name: occupation_Transport-moving<br>coefficient: 0.1776<br>2. feature name: education<br>coefficient: 0.0726<br>3. feature name: occupation_Machine-op-inspct<br>coefficient: 0.0661<br>4. feature name: occupation_Armed-Forces<br>coefficient: 0.0006<br>5. feature name: workclass_Without-pay<br>coefficient: -0.0194<br>6. feature name: occupation_Handlers-cleaners<br>coefficient: -0.1256<br>7. feature name: occupation_Farming-fishing<br>coefficient: -0.3938<br>8. feature name: GDP Group<br>coefficient: -0.4138<br>9. feature name: occupation_Other-service<br>coefficient: -0.4294<br>10. feature name: occupation_Priv-house-serv<br>coefficient: -0.6560<br>'

The result I am looking for:

[occupation_Transport-moving,education,occupation_Machine-op-inspct,occupation_Armed-Forces,workclass_Without-pay,occupation_Handlers-cleaners,occupation_Farming-fishing,GDP Group,occupation_Other-service,occupation_Priv-house-serv]

I have tried this, but it returns everything after the first colon up to the last '<' as a single match: re.findall(':\s(.*)<',txt)

Thank you in advance for your assistance.

1 Answer

Use

:\s*([^:.<]+)<

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^:.<]+                  any character except: ':', '.', '<' (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  <                        '<'
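
As a minimal sketch of how the pattern above could be applied, the snippet below runs it with re.findall on the string from the question. The variable names (txt, pattern, features, names) are only illustrative. The original attempt failed because .* is greedy and runs to the last '<' in the string, whereas the negated character class stops at the first '<' and rejects the coefficient values because they contain a '.'.

```python
import re

# The input string from the question, split across lines for readability.
txt = (
    '1. feature name: occupation_Transport-moving<br>coefficient: 0.1776<br>'
    '2. feature name: education<br>coefficient: 0.0726<br>'
    '3. feature name: occupation_Machine-op-inspct<br>coefficient: 0.0661<br>'
    '4. feature name: occupation_Armed-Forces<br>coefficient: 0.0006<br>'
    '5. feature name: workclass_Without-pay<br>coefficient: -0.0194<br>'
    '6. feature name: occupation_Handlers-cleaners<br>coefficient: -0.1256<br>'
    '7. feature name: occupation_Farming-fishing<br>coefficient: -0.3938<br>'
    '8. feature name: GDP Group<br>coefficient: -0.4138<br>'
    '9. feature name: occupation_Other-service<br>coefficient: -0.4294<br>'
    '10. feature name: occupation_Priv-house-serv<br>coefficient: -0.6560<br>'
)

# Pattern from the answer: a colon, optional whitespace, then a captured run
# of characters that are not ':', '.' or '<', followed by a literal '<'.
pattern = re.compile(r':\s*([^:.<]+)<')

# findall returns only the captured group for each match.
features = pattern.findall(txt)
print(features)

# Alternative (an assumption, not part of the answer): anchor on the literal
# label so the result does not depend on coefficients containing a '.'.
names = re.findall(r'feature name:\s*([^<]+)<', txt)
print(names)
```

Both calls should give the ten feature names in order. Note that the answer's pattern skips the coefficients only because each one contains a '.'; a value such as "coefficient: 1" would also be captured, which is the case the label-anchored alternative is meant to avoid.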