How to get a specific string between two substrings with a Python regex?

Question

Here is the example:

review: I love you very much... reviewer:jackson review: I hate you very much... reviewer:madden review: sky is pink and i ... reviewer: tom

I want to extract the text between review: and ... in each case.

For the example above, the expected results are:

I love you very much

I hate you very much

sky is pink and i

I tried this regex, but it fails:

re.findall("review(.*)...",string)

It produces this output instead:

I love you very much... reviewer:jackson review: I hate you very much... reviewer:madden review: sky is pink and i

safiqul islam · Accepted Answer · 2020-07-19 03:51:38Z

2

This also works, and it is simple:

import re

text = "review: I love you very much... reviewer:jackson review: I hate you very much... reviewer:madden review: sky is pink and i ... reviewer: tom"

matches = re.findall(r'review:(.+?)\.\.\.', text)
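As a quick check of what this pattern captures, the sketch below runs it on the sample input. Each capture keeps the whitespace that follows review:, so a strip() pass may be wanted; that pass and the variable names are illustrative additions, not part of the answer above.

```python
import re

text = ("review: I love you very much... reviewer:jackson "
        "review: I hate you very much... reviewer:madden "
        "review: sky is pink and i ... reviewer: tom")

# Lazy (.+?) stops at the first literal "..." after each "review:".
raw_matches = re.findall(r'review:(.+?)\.\.\.', text)

# The captures still carry surrounding whitespace; strip it if unwanted.
clean_matches = [m.strip() for m in raw_matches]

print(raw_matches)
print(clean_matches)
```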

Tim Biegeleisen · Accepted Answer · 2020-07-19 03:15:27Z

1

Use re.findall with the pattern \breview:\s*(.*?)\.\.\.\s*(?=\breviewer:|$):

import re

inp = "review: I love you very much... reviewer:jackson review: I hate you very much... reviewer:madden review: sky is pink and i ... reviewer: tom"
matches = re.findall(r'\breview:\s*(.*?)\.\.\.\s*(?=\breviewer:|$)', inp)
print(matches)

This prints:

['I love you very much', 'I hate you very much', 'sky is pink and i ']
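The trailing lookahead matters when a review contains an ellipsis of its own. The sketch below uses a made-up input (not from the question) to compare this pattern with a plain lazy match that stops at the first "...".

```python
import re

# Hypothetical input: the first review contains an ellipsis of its own.
text = "review: wait... what... reviewer:bob review: fine... reviewer:al"

# Plain lazy match: cut at the first "..." in each review.
simple = re.findall(r'\breview:\s*(.*?)\.\.\.', text)

# With the lookahead: cut at the "..." that is followed by "reviewer:" (or the end).
anchored = re.findall(r'\breview:\s*(.*?)\.\.\.\s*(?=\breviewer:|$)', text)

print(simple)
print(anchored)
```

On this input the plain pattern truncates the first review to "wait", while the lookahead version keeps "wait... what".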

Austin · Accepted Answer · 2020-07-19 03:20:53Z

1

You can use the following pattern, which uses a lookbehind and a lookahead:

(?<=review:\s).*?(?=\.\.\.)

import re

inp = "review: I love you very much... reviewer:jackson review: I hate you very much... reviewer:madden review: sky is pink and i ... reviewer: tom"
matches = re.findall(r'(?<=review:\s).*?(?=\.\.\.)', inp)
print(matches)
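One caveat with this approach: the standard re module only accepts fixed-width lookbehinds, which is why the pattern uses a single \s rather than \s*. A minimal sketch, assuming CPython's built-in re (the flag name variable_width_ok is illustrative):

```python
import re

inp = ("review: I love you very much... reviewer:jackson "
       "review: I hate you very much... reviewer:madden "
       "review: sky is pink and i ... reviewer: tom")

# Fixed-width lookbehind (exactly one whitespace character) is accepted.
matches = re.findall(r'(?<=review:\s).*?(?=\.\.\.)', inp)
print(matches)

# A variable-width lookbehind such as \s* is rejected at compile time.
try:
    re.compile(r'(?<=review:\s*).*?(?=\.\.\.)')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("re.error:", exc)
```

Because nothing after the capture consumes whitespace, the last result keeps its trailing space.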

Ryszard Czech · Accepted Answer · 2020-07-19 20:38:15Z

Use

re.findall(r'\breview:\s*(.*?)\s*\.\.\.', string)

Python test:

import re
regex = r"\breview:\s*(.*?)\s*\.\.\."
string = "review: I love you very much... reviewer:jackson review: I hate you very much... reviewer:madden review: sky is pink and i ... reviewer: tom"
print(re.findall(regex, string))

Output: ['I love you very much', 'I hate you very much', 'sky is pink and i']

Note the r"..." prefix, which marks a raw string literal: in a normal string literal "\b" is the backspace character, whereas r"\b" reaches the regex engine as the word-boundary token.
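The difference is easy to check directly. The short sketch below uses a made-up sample string and compares the two literals, then runs each as a pattern.

```python
import re

text = "a review: ok..."

# In a normal string literal, "\b" is the single backspace character.
assert "\b" == "\x08" and len("\b") == 1
# In a raw string literal, r"\b" is a backslash plus "b": two characters.
assert len(r"\b") == 2

no_raw = re.findall("\breview:", text)     # looks for backspace + "review:"
with_raw = re.findall(r"\breview:", text)  # looks for "review:" at a word boundary

print(no_raw)
print(with_raw)
```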

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  review:                  'review:'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount possible))
--------------------------------------------------------------------------------
  \.\.\.                   '...'
--------------------------------------------------------------------------------
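The same breakdown can live in the code itself. As a sketch, the pattern is written here with re.VERBOSE, which ignores whitespace in the pattern and allows # comments; the comments paraphrase the table above.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each node has a comment.
regex = re.compile(r"""
    \b          # word boundary, so the match starts at a whole word
    review:     # the literal text 'review:'
    \s*         # optional whitespace after the colon
    (.*?)       # group 1: as few characters as possible (no newlines)
    \s*         # optional whitespace before the ellipsis
    \.\.\.      # the literal text '...'
""", re.VERBOSE)

string = ("review: I love you very much... reviewer:jackson "
          "review: I hate you very much... reviewer:madden "
          "review: sky is pink and i ... reviewer: tom")

result = regex.findall(string)
print(result)
```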

Super-ilad · Accepted Answer · 2020-07-19 03:21:41Z

-1

Sorry, I forgot to add \ in front of each .

The corrected call is: re.findall("review:\b?(.*)\.\.\.",string)

And this time it works.

3 Comments

Austin Over a year ago

This will not work. You want a non-greedy match. Please have a look at ? in the capture in my answer.

Tim Biegeleisen Over a year ago

Also, please look at my answer :-)

Super-ilad Over a year ago

Good, great help!
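To illustrate the point made in the comments, here is a small sketch comparing a greedy and a non-greedy capture on the question's input. The patterns are simplified (the \b? is left out), so treat this as an illustration rather than the exact code from the answer.

```python
import re

string = ("review: I love you very much... reviewer:jackson "
          "review: I hate you very much... reviewer:madden "
          "review: sky is pink and i ... reviewer: tom")

# Greedy (.*) grabs as much as possible, so it runs to the last "...".
greedy = re.findall(r"review:(.*)\.\.\.", string)

# Lazy (.*?) grabs as little as possible, so it stops at the first "...".
lazy = re.findall(r"review:(.*?)\.\.\.", string)

print(len(greedy), greedy)
print(len(lazy), lazy)
```

The greedy version returns a single long match spanning all three reviews, which is the behaviour described in the question.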
