-1

I've seen that the below works for decimal numbers/string: (in this post)

pattern = re.compile(r"([0-9]+?)\1+")

but it doesn't seem to work at binary strings.

EDIT: Adding an example:

For string 1110111011 it should return the pattern 1110

So I would like to get the longest repeated pattern.
Any ideas how to accomplish that?

4
  • what are you expecting the output for the given binary number to be? Commented Nov 27, 2013 at 23:48
  • 1
    I don't see the recurring pattern in that string. Did you mean to add a 10 to the end of it? Commented Nov 27, 2013 at 23:49
  • There are limitations in Python's regex library related to variable length look-aheads. I do not believe you can use ([0-9]+?)\1. I'll see if I can locate the reference. Commented Nov 27, 2013 at 23:52
  • @JoranBeasley the recurring pattern on string 1110111011 is 1110. Commented Nov 28, 2013 at 7:22

1 Answer 1

1

Nobody knows what you mean by "doesn't seem to work". Spell out what you want it to do. This is what it does:

>>> import re
>>> pattern = re.compile(r"([0-9]+?)\1+")
>>> print pattern.search("1110111011").group()
111

And that's exactly what the regexp asked it to do: it found the shortest repeating pattern at the leftmost position where one occurs, which happens to be 3 repetitions of the initial digit 1. If you want the longest repeated pattern at the leftmost position at which one occurs, take ? out of the regexp:

>>> print re.search(r"([0-9]+)\1+", "1110111011").group()
11101110

If you want something else, you'll need to say what that is ;-)

Finding the longest overall

As noted, a regexp search finds the leftmost position at which the pattern matches, and that's all. So consider this:

pattern = re.compile(r"([01]+)\1+")
print(pattern.search("11010").group())

That does not find "1010". The leftmost position at which the pattern matches is at the first character, where "1" is repeated. So that prints "11".

Applying re.findall() can't help in this case, because the true longest match overlaps the "11" initially found. findall() only returns non-overlapping matches.

Overcoming this is pretty excruciating ;-) It's possible to do it with a single regexp, via abusing a "positive lookahead assertion":

pattern = re.compile(r"(?=(([01]+)\2+))")
for match in pattern.finditer("11010"):
    print(match.group(1), "at {}:{}".format(*match.span(1)))

That displays:

11 at 0:2
1010 at 1:5

But I don't know of a way to make a regexp find only the longest global match, and would be surprised if such a way exists.

Sign up to request clarification or add additional context in comments.

4 Comments

+1 for Nobody knows what you mean by "doesn't seem to work". Spell out what you want it to do.
Sorry for being vague guys, I have updated the post hope it is better now. What I basically want is a regex that will recognize recurring patterns in binary strings, i.e. it should return 1110 when string=1110111011
@timpeters This is exactly what I was looking for, the longest repeated pattern. Thank you very much
Hope it helps, but it's probably not what you want then: note the "at the leftmost position at which one occurs" qualification. Play with it ;-)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.