Even though you might manage to write a very complex regex to parse the Behat language, this is a typical case of "I had one problem, I used a regex, now I have two problems":

Instead of losing your mind trying to solve this with a regex, you'd better use a library that can read and parse the Behat language.
The reason is that regexes are great for simple string-parsing problems (working with the tokens of a language). Parsing a complex language is more abstract, even if extended regex engines can technically do it: you need to look not only at the tokens (the words) but also at the grammar (the syntax and its meaning).
A typical issue (the one you're facing) is a word whose meaning depends on its context, and that is exactly what a grammar helps with. And even if you manage the first step of splitting out the scenarios, you're likely to hit a similar issue once you look inside each scenario.
That's why you need a full-blown parser. But writing a parser is not easy (the hardest part being writing the grammar).
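To make the context problem concrete, here is a toy illustration. It is not a Gherkin parser: it only knows about "Scenario:" lines and doc strings, and the function names are made up for this example. A regex that matches "Scenario:" lines also picks one up inside a doc string, where it is just data. Tracking a single piece of context (are we inside a doc string, and which delimiter opened it?) fixes that one case; a real grammar does the same for every keyword and construct.

```python
import re

FEATURE = '''\
Feature: Notes
  Scenario: First
    Given a note containing
      """
      Scenario: this line is data, not a scenario
      """
  Scenario: Second
    Given an empty note
'''


def naive_scenario_names(text: str) -> list[str]:
    # Token-level view: any line starting with "Scenario:" counts, whatever its context.
    return re.findall(r"^[ \t]*Scenario:[ \t]*(.*)$", text, re.MULTILINE)


# Gherkin doc strings are delimited by three double quotes or three backticks.
DOCSTRING_DELIMITERS = ('"""', "`" * 3)


def scenario_names(text: str) -> list[str]:
    # Context-aware view: "Scenario:" is only a keyword outside doc strings.
    names = []
    open_delimiter = None  # delimiter of the doc string we are inside, if any
    for line in text.splitlines():
        stripped = line.strip()
        if open_delimiter is not None:
            if stripped.startswith(open_delimiter):
                open_delimiter = None  # closing delimiter: back to normal context
            continue  # everything inside a doc string is plain data
        if stripped.startswith(DOCSTRING_DELIMITERS):
            open_delimiter = stripped[:3]  # remember which delimiter must close it
        elif stripped.startswith("Scenario:"):
            names.append(stripped[len("Scenario:"):].strip())
    return names


print(naive_scenario_names(FEATURE))  # ['First', 'this line is data, not a scenario', 'Second']
print(scenario_names(FEATURE))        # ['First', 'Second']
```

Even this tiny fix needs state that a single regex doesn't carry, and it still ignores Scenario Outline, Background, Rule, tags, comments and other languages, which is why a proper parser is the better route.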
So if you're lucky, someone else has already done it for you!
And you are lucky! Looking at some documentation on Behat, the language it uses is called Gherkin. With some googling, I found at least one Python package that understands that language: cucumber/gherkin-python (published on PyPI as gherkin-official), which has now moved to the cucumber/cucumber repository.
The snippet to use the parser is the following:
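```python
# pip install gherkin-official
from gherkin.parser import Parser
from gherkin.pickles.compiler import Compiler  # older releases exposed a plain compile() function instead

parser = Parser()
gherkin_document = parser.parse("Feature: ...")  # the raw .feature text as a string
gherkin_document["uri"] = "example.feature"  # recent versions expect a uri on the document
pickles = Compiler().compile(gherkin_document)
```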
You then get structured data (nested dicts and lists) that you can easily navigate in Python.
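As a minimal sketch of that navigation, the hypothetical helper below turns the pickles into (scenario name, step texts) pairs. It assumes each pickle is a dict with a "name" key and a "steps" list whose items carry a "text" key; that matches the pickle format as far as I know, but check what your installed version actually returns. Pickles are the "compiled" form, so as I understand it each Scenario Outline example row becomes its own pickle and Background steps are already included.

```python
from typing import Any, Iterable


def summarize_pickles(pickles: Iterable[dict[str, Any]]) -> list[tuple[str, list[str]]]:
    """Return (scenario name, [step texts]) for each compiled pickle.

    Assumes the pickle layout described above (a "name" key and a "steps"
    list of dicts with a "text" key); adjust the keys if your version differs.
    """
    summary = []
    for pickle in pickles:
        # Step text is stored without its Given/When/Then keyword.
        step_texts = [step["text"] for step in pickle.get("steps", [])]
        summary.append((pickle["name"], step_texts))
    return summary
```

With the pickles from the snippet above, something like `for name, steps in summarize_pickles(pickles): print(name, steps)` lists every scenario and its steps without any regex.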