Escape consecutive strings in regex matching in Python

Question

I ran into a problem when using a regex to match part of a string in Python.

Example string:

ln[1] --This is a string--

ln[2] Match the line below.

ln[3] --This is a string--

ln[4] Match this line start from here.

ln[5] -This is the end-

I want to extract abc in the string above.

Code:

pattern = re.compile('%s(.*?)%s' % ('--This is a string--', '-This is the end-'))
re.findall(pattern, string)

How can I get only line 4, rather than lines 2 through 4?

Thank you very much.

Regex engines work left-to-right, so your regex starts the match at the first a it encounters, and then keeps matching until the c is reached. If you don't want to allow more than one a, you need to tell the regex engine that.
– Tim Pietzcker, Commented Jul 10, 2013 at 10:13
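
The behavior described in this comment can be reproduced with a minimal sketch. It assumes the sample input `'aaaaaaaaabc'` that the answers use (the question as shown does not include it), and the variable names are illustrative only.

```python
import re

text = 'aaaaaaaaabc'  # assumed sample input, taken from the answers

# The engine tries start positions from the left and reports the first one
# where a match succeeds. The lazy quantifier only shortens the right-hand
# end of the match; it does not move the start.
lazy = re.findall(r'a.*?c', text)

# Forbidding further 'a' characters inside the match forces the start onto
# the last 'a' before the 'c'.
restricted = re.findall(r'a[^a]*c', text)

print(lazy)        # ['aaaaaaaaabc']
print(restricted)  # ['abc']
```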

falsetru · Accepted Answer · 2013-07-10 10:09:58Z

2

>>> re.findall('a[^a]*c', 'aaaaaaaaabc')
['abc']
>>> re.findall('a[^a]*c', 'aaaaaaaaa c')
['a c']
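
The negated class `[^a]` only works for a single-character delimiter. For the multi-character delimiters in the question, one possible analogue is a negative lookahead combined with `re.escape()`. This is a different technique from the one above and is only a sketch: the sample text is an assumed reconstruction of the question's input without the `ln[n]` labels.

```python
import re

start = '--This is a string--'
end = '-This is the end-'

# Assumed reconstruction of the question's sample, without the ln[n] labels.
text = (
    '--This is a string--\n'
    'Match the line below.\n'
    '--This is a string--\n'
    'Match this line start from here.\n'
    '-This is the end-\n'
)

# re.escape() makes the delimiters literal. The negative lookahead
# (?!start) plays the role of [^a]: each consumed character must not
# begin another start delimiter.
pattern = re.compile(
    '{0}((?:(?!{0}).)*?){1}'.format(re.escape(start), re.escape(end)),
    re.DOTALL,  # let '.' cross line breaks
)

matches = [m.strip() for m in pattern.findall(text)]
print(matches)  # ['Match this line start from here.']
```

Because the lookahead is checked before every character, the first start delimiter cannot reach the end delimiter without crossing the second one, so only the last block is returned.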

answered Jul 10, 2013 at 10:09

mishik · Accepted Answer · 2013-07-10 10:13:02Z

2

Probably like this, with a greedy prefix that consumes everything up to the last a that is still followed by a c:

pattern = re.compile('.*(a.*?c)')
re.findall(pattern, string)  # yields ["abc"]
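
As a runnable sketch of the snippet above, assuming the sample input `'aaaaaaaaabc'` (the `string` variable is not defined in the answer). It also shows one caveat: the greedy prefix swallows earlier occurrences, so only the last one in the input is returned.

```python
import re

string = 'aaaaaaaaabc'  # assumed sample input
pattern = re.compile('.*(a.*?c)')

single = re.findall(pattern, string)
print(single)  # ['abc']

# With two occurrences, the greedy '.*' consumes the first one,
# so findall reports only the last.
greedy_prefix = re.findall(pattern, 'aabc aabc')
negated_class = re.findall('a[^a]*c', 'aabc aabc')
print(greedy_prefix)  # ['abc']
print(negated_class)  # ['abc', 'abc']
```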

answered Jul 10, 2013 at 10:07

mata · Accepted Answer · 2013-07-10 10:20:00Z

1

If you want to collapse every run of a repeated character into a single character, you could use a backreference to a numbered or named group.

Example:

With a numbered group:

>>> re.sub('(.)(\\1)+', '\\1', 'abcAAAAabcBBBBabcCCCCabc')
'abcAabcBabcCabc'

With a named group:

>>> re.sub('(?P<n>.)(?P=n)+', '\\1', 'abcAAAAabcBBBBabcCCCCabc')
'abcAabcBabcCabc'
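
The named-group example still refers to the group by number in the replacement. A small variation, offered as a sketch, uses the `\g<name>` syntax there too; the group name `ch` is illustrative.

```python
import re

text = 'abcAAAAabcBBBBabcCCCCabc'

# (?P<ch>.) captures one character, (?P=ch)+ matches one or more repeats of
# it, and \g<ch> in the replacement writes the captured character back once.
collapsed = re.sub(r'(?P<ch>.)(?P=ch)+', r'\g<ch>', text)

print(collapsed)  # abcAabcBabcCabc
```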

answered Jul 10, 2013 at 10:20
