I have a string:

a="12cdanfaw3i8hanjwofaef56ghah398hafadsf12cds;dkh38hfasdf56ghaldkshf12cdasdiuhf93f2asdf56gh"

I'm trying to extract every substring that appears between 12cd and 56gh. The expected values are anfaw3i8hanjwofaef, s;dkh38hfasdf, and asdiuhf93f2asdf.

The regex that I have is re.findall(r'12cd.*56gh', a).

But the delimiters 12cd and 56gh are included in the output.

How do I write the regex so that they are not included in the output?

Thanks

  • How does Python deal with regex captures? r'12cd(.*)56gh' Commented Mar 23, 2018 at 20:53

1 Answer

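You need a non-greedy (lazy) quantifier to get all 3 matches, and a capturing group so that the delimiters are left out of the output: when the pattern contains exactly one group, re.findall returns only the text matched by that group. So use 12cd(.*?)56gh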

import re
print(re.findall(r'12cd(.*?)56gh', '12cdanfaw3i8hanjwofaef56ghah398hafadsf12cds;dkh38hfasdf56ghaldkshf12cdasdiuhf93f2asdf56gh'))

Output:

['anfaw3i8hanjwofaef', 's;dkh38hfasdf', 'asdiuhf93f2asdf']

Explanation

12cd              // matches the literal 12cd
    (             // start of capturing group 1
      .*?         // matches any character except a newline, between 0 and unlimited times, lazily
    )             // end of capturing group 1
56gh              // matches the literal 56gh
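
To see what the capturing group changes, here is a small sketch that uses re.finditer to look at both the whole match and group 1 for each hit. The variable names (pattern, whole, inner) are only illustrative and not part of the original question.

```python
import re

a = "12cdanfaw3i8hanjwofaef56ghah398hafadsf12cds;dkh38hfasdf56ghaldkshf12cdasdiuhf93f2asdf56gh"

pattern = re.compile(r'12cd(.*?)56gh')

# group(0) is the whole match, delimiters included;
# group(1) is only the text captured by the parentheses.
whole = [m.group(0) for m in pattern.finditer(a)]
inner = [m.group(1) for m in pattern.finditer(a)]

print(whole)  # ['12cdanfaw3i8hanjwofaef56gh', '12cds;dkh38hfasdf56gh', '12cdasdiuhf93f2asdf56gh']
print(inner)  # ['anfaw3i8hanjwofaef', 's;dkh38hfasdf', 'asdiuhf93f2asdf']
```

re.findall with this pattern returns the same list as inner, because the pattern contains exactly one group.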

2 Comments

What does ? do?
The ? after * makes the quantifier lazy: *? still matches between 0 and unlimited times, but as few times as possible, expanding only as necessary.
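
As a rough illustration of the difference, the sketch below runs the greedy and the lazy version of the pattern on the string from the question. The names greedy and lazy are just for this example.

```python
import re

a = "12cdanfaw3i8hanjwofaef56ghah398hafadsf12cds;dkh38hfasdf56ghaldkshf12cdasdiuhf93f2asdf56gh"

# Greedy: .* grabs as much as it can, so there is a single match that
# runs from the first 12cd all the way to the last 56gh.
greedy = re.findall(r'12cd(.*)56gh', a)

# Lazy: .*? stops at the nearest 56gh, so each 12cd...56gh pair
# produces its own match.
lazy = re.findall(r'12cd(.*?)56gh', a)

print(len(greedy))  # 1
print(lazy)         # ['anfaw3i8hanjwofaef', 's;dkh38hfasdf', 'asdiuhf93f2asdf']
```

Note that . does not match a newline by default, so for multi-line input the re.DOTALL flag would be needed for either version.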
