
For example, if the string is "abbacdeffel" and the pattern is "xyyx", every substring matching that pattern should be replaced with "1234".

So "abbacdeffel" would become "1234cd1234l".

I have tried to think this through, but I couldn't come up with anything. At first I thought a dictionary might help, but nothing came to mind.

  • Welcome to SO! Please take the tour, and read both How to Ask and minimal reproducible example. Commented Mar 2, 2018 at 1:29
  • You should have a look at regex Commented Mar 2, 2018 at 1:32
  • For clarity, you are looking to replace 4-character lowercase strings consisting of a pair [xy] followed by a transposed pair [yx] for any [a-z]? If so, you will need to find the inner pair and outer pair substrings in each moving 4-tuple. Consider taking the token as a stream: is x succeeded by yyx? If so, apply the re.compile pattern to x and yyx, else, move on to the next character in the stream. Commented Mar 2, 2018 at 1:57
  • Would 'adda' fit your pattern ('xyyx')? Does case matter? Commented Mar 2, 2018 at 2:08
  • Please don't make more work for people by vandalizing your posts. By posting on the Stack Exchange (SE) network, you've granted a non-revocable right, under the CC BY-SA 3.0 license, for SE to distribute that content (i.e. regardless of your future choices). By SE policy, the non-vandalized version of the post is the one which is distributed. Thus, any vandalism will be reverted. Commented Mar 2, 2018 at 4:05

3 Answers


What you're looking to do can be accomplished with regular expressions (regex). Regular expressions let you extract exactly the part of a string you want and nothing else. In your case, you want to match substrings shaped like abba, which the following regex does:

(\w)(\w)\2\1

https://regex101.com/r/hP8lA3/1

It captures two groups and then uses backreferences to require that the text of the second group appears again, followed by the text of the first group.
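To make the groups and backreferences concrete, here is a minimal sketch that inspects the first match in the question's string. The variable names are only illustrative.

```python
import re

# Two single-character groups, then the second group's text again,
# then the first group's text again.
pattern = re.compile(r"(\w)(\w)\2\1")

match = pattern.search("abbacdeffel")
print(match.group(0))  # whole match: abba
print(match.group(1))  # first group: a
print(match.group(2))  # second group: b
print(match.span())    # (0, 4)
```

Here `\2` must repeat whatever group 2 captured ("b") and `\1` whatever group 1 captured ("a"), which is why "abba" matches and "abcd" does not.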

Implementing this in Python looks like this:

First, import Python's regular expression module, re:

import re

Then, declare your variable

text = "abbacdeffel"

re.finditer returns an iterator of match objects, so you can loop over all the matches:

matches = re.finditer(r"(\w)(\w)\2\1", text)

Go through all the matches that the regexp found and replace the pattern with "1234"

for match in matches:
    text = text.replace(match.group(0), "1234")

For debugging:

print(text)

Complete Code:

import re

text = "abbacdeffel"

matches = re.finditer(r"(\w)(\w)\2\1", text)

for match in matches:
    text = text.replace(match.group(0), "1234")

print(text)
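As a possible simplification, the find-then-replace loop can probably be collapsed into a single `re.sub` call, which replaces every non-overlapping match in one pass. This is a sketch using the same regex and input as above; `result` is just an illustrative name.

```python
import re

text = "abbacdeffel"

# One pass over the string: each non-overlapping match becomes "1234".
result = re.sub(r"(\w)(\w)\2\1", "1234", text)
print(result)  # 1234cd1234l
```

Note that `\w` also matches digits and underscores, and that nothing here stops both groups from capturing the same character, so "aaaa" would be replaced too.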

You can learn more about Regular Expressions here: https://regexone.com/references/python


4 Comments

What if the pattern is entered by the user? For example: look for this type of pattern and replace it with 1234.
Well, you asked for the abba type of pattern, so I'm not sure what you're working on. The fact of the matter is that you're going to have to do some research on your own. Stack Overflow is a great resource for questions about specific problems you're having, but we can't design our code around your needs. Please take a look at the link I provided in the post and you should be able to come up with a solution.
Hi tim, have a look at my answer down here ;)
In the code: matches = re.finditer(r"(\w)(\w)\2\1", st). I think "st" should be "text".

New version of code (there was a bug):

from collections import OrderedDict


def replace_with_pattern(pattern, line, replace):
    """Replace every substring of `line` shaped like `pattern` with `replace`."""
    set_of_chars_in_pattern = set(pattern)

    indice_start_pattern = 0
    output_line = ""
    while indice_start_pattern < len(line):
        potential_end_pattern = indice_start_pattern + len(pattern)
        subline = line[indice_start_pattern:potential_end_pattern]
        set_of_chars_in_subline = set(subline)
        # A match must use exactly as many distinct characters as the pattern.
        if len(set_of_chars_in_subline) != len(set_of_chars_in_pattern):
            output_line += line[indice_start_pattern]
            indice_start_pattern += 1
            continue

        # Distinct pattern characters, in order of first appearance.
        liste_of_chars_in_pattern = []
        for char in pattern:
            if char not in liste_of_chars_in_pattern:
                liste_of_chars_in_pattern.append(char)

        # Map each pattern character to the matching character of the window.
        map_of_chars = OrderedDict()
        for subline_char in subline:
            if subline_char not in map_of_chars.values():
                map_of_chars[liste_of_chars_in_pattern.pop(0)] = subline_char

        # Rebuild the string the pattern would produce under that mapping.
        wanted_subline = ""
        for char_of_pattern in pattern:
            wanted_subline += map_of_chars[char_of_pattern]

        if subline == wanted_subline:
            output_line += replace
            indice_start_pattern += len(pattern)
        else:
            output_line += line[indice_start_pattern]
            indice_start_pattern += 1
    return output_line

Some tests:

test1 = replace_with_pattern("xyyx", "abbacdeffel", "1234")
test2 = replace_with_pattern("abbacdeffel", "abbacdeffel", "1234")
print(test1, test2)

=> 1234cd1234l 1234
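For comparison, the same sliding-window idea can be written more compactly by comparing "shapes", where each character is replaced by the index of its first occurrence. This is only a sketch, not the code from this answer: `shape` and `replace_shape` are hypothetical names, and it assumes an empty pattern should leave the line unchanged.

```python
def shape(s):
    """Map each character to the index of its first occurrence, e.g. 'abba' -> (0, 1, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in s)


def replace_shape(pattern, line, replacement):
    """Replace every window of `line` that has the same shape as `pattern`."""
    if not pattern:
        return line  # assumption: an empty pattern matches nothing
    target = shape(pattern)
    width = len(pattern)
    out = []
    i = 0
    while i < len(line):
        window = line[i:i + width]
        if len(window) == width and shape(window) == target:
            out.append(replacement)
            i += width  # skip past the replaced window
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


print(replace_shape("xyyx", "abbacdeffel", "1234"))         # 1234cd1234l
print(replace_shape("abbacdeffel", "abbacdeffel", "1234"))  # 1234
```

Two strings have the same shape exactly when their characters repeat in the same positions, so different pattern letters always stand for different characters.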

3 Comments

This is great and does exactly what I needed it to do. I just wasn't expecting it to be this long.
My solution is shorter ;-)
@yibs: you can mark it as the correct answer then ;) And the code is not optimized (it was written between 2 and 3 a.m. in the middle of a sleepless night ;)

Here goes my attempt:

([a-zA-Z])(?!\1)([a-zA-Z])\2\1

Assuming you want to match letters only (for other ranges, change both [a-zA-Z] as appropriate), we have:

([a-zA-Z])

Find the first character, and note it so we can later refer to it with \1.

(?!\1)

Check to see if the next character is not the same as the first, but without advancing the search pointer. This is to prevent aaaa being accepted. If aaaa is OK, just remove this subexpression.

([a-zA-Z])

Find the second character, and note it so we can later refer to it with \2.

\2\1

Now find the second again, then the first again, so we match the full abba pattern.
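To see what the `(?!\1)` lookahead changes, here is a small sketch that runs the regex with and without it on an input containing both "abba" and "aaaa". The variable names and the test string are illustrative.

```python
import re

with_lookahead = r"([a-zA-Z])(?!\1)([a-zA-Z])\2\1"
without_lookahead = r"([a-zA-Z])([a-zA-Z])\2\1"

sample = "abba aaaa"

# With the lookahead, the second letter must differ from the first.
print(re.sub(with_lookahead, "1234", sample))     # 1234 aaaa

# Without it, four identical letters are also accepted.
print(re.sub(without_lookahead, "1234", sample))  # 1234 1234
```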

And finally, to do a replace operation, the full command would be:

import re
re.sub(r'([a-zA-Z])(?!\1)([a-zA-Z])\2\1',
       '1234',
       'abbacdeffelzzzz')

The r at the start of the regex pattern makes it a raw string, which stops Python from interpreting the backslashes as escape sequences. Without it, you would need to double each backslash:

import re
re.sub('([a-zA-Z])(?!\\1)([a-zA-Z])\\2\\1',
       '1234',
       'abbacdeffelzzzz')
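A quick way to check the difference between the raw and non-raw spellings is to compare the string literals directly. This is a minimal sketch with illustrative variable names.

```python
raw = r'\1'        # backslash followed by the digit 1
escaped = '\\1'    # the same two characters, written with an escaped backslash
unescaped = '\1'   # an octal escape: the single character with code 1

print(raw == escaped)             # True
print(len(raw), len(unescaped))   # 2 1
print(unescaped == '\x01')        # True
```

So a pattern written as '\1' without the r prefix never reaches the regex engine as a backreference.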

Now, I see the spec has expanded to a user-defined pattern; here is some code that will build that pattern:

import re

def make_re(pattern, charset):
    result = ''
    seen = []
    for c in pattern:
        # Is this a letter we've seen before?
        if c in seen:
            # Yes, so we want to match the captured pattern
            result += '\\' + str(seen.index(c)+1)
        else:
            # No, so match a new character from the charset,
            # but first exclude already matched characters
            for i in range(len(seen)):
                result += '(?!\\' + str(i + 1) + ')'
            result += '(' + charset + ')'
            # Note we have seen this letter
            seen.append(c)
    return result

print(re.sub(make_re('xzzx', '\\d'), 'abba', 'abba1221b99999889'))
print(re.sub(make_re('xyzxyz', '[a-z]'), '123123', 'abcabc zyxzyyx zyzzyz'))

Outputs:

abbaabbab9999abba
123123 zyxzyyx zyzzyz
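It may help to look at the regex text that such a builder produces. The sketch below is a condensed restatement of the same idea under a hypothetical name, `build_shape_regex`, so that it runs on its own. It prints the generated patterns rather than applying them.

```python
import re


def build_shape_regex(pattern, charset):
    """Build a regex for `pattern`, one capture group per distinct letter (hypothetical helper)."""
    seen = []
    parts = []
    for c in pattern:
        if c in seen:
            # Repeated letter: backreference to the group that captured it.
            parts.append("\\" + str(seen.index(c) + 1))
        else:
            # New letter: forbid every earlier capture, then open a new group.
            parts.extend("(?!\\%d)" % (i + 1) for i in range(len(seen)))
            parts.append("(" + charset + ")")
            seen.append(c)
    return "".join(parts)


print(build_shape_regex("xyyx", "[a-zA-Z]"))
# ([a-zA-Z])(?!\1)([a-zA-Z])\2\1
print(build_shape_regex("xyzxyz", "[a-z]"))
# ([a-z])(?!\1)([a-z])(?!\1)(?!\2)([a-z])\1\2\3
```

For "xyyx" the result is exactly the hand-written regex from the start of this answer.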

1 Comment

I can't understand the code :-( I need to work on my regex again :-p
