I have a string that contains numbers, and a list of numbers. I want to replace each number in the string with the corresponding number from the list, matching them by order of appearance.

Using a regex, I extracted the existing numbers from the string into a list, so I now have a one-to-one pairing between each original number and its replacement. However, it is still unclear how I can locate each match and make the replacements in order.

With this line, I extract the numbers from the given string:

    list_of_numbers_in_string = [int(x) for x in re.findall(r'\d+', data)]

Now I wonder how to use this, or whether there is another method to get the desired result. From this input:

    data = 'readingOrder {index:24;} person {offset:0; length:7;} textStyle {offset:0; length:7; underlined:true;} place {offset:52; length:8;} textStyle {offset:52; length:8; underlined:true;}'
    new_numbers = [24, 0, 12, 0, 12, 58, 14, 58, 14]

I want to get this output:

    corrected_data = 'readingOrder {index:24;} person {offset:0; length:12;} textStyle {offset:0; length:12; underlined:true;} place {offset:58; length:14;} textStyle {offset:58; length:14; underlined:true;}'

3 Answers

1

The accepted answer (now deleted) is actually incorrect. data.replace() replaces the first textual occurrence of the number, which is not always the intended one. For example, when you try to replace 8 with 14, it matches the 8 inside 58 and turns 58 into 514.
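To make the failure concrete, here is a minimal sketch. It assumes the deleted answer looped over the extracted numbers and called `replace` with a count of 1; its exact code is not shown here, so treat the loop as a reconstruction.

```python
import re

data = 'place {offset:52; length:8;}'
new_numbers = [58, 14]

# Assumed shape of the naive approach: replace the first textual
# occurrence of each old number with its new value.
naive = data
for old, new in zip(re.findall(r'\d+', data), new_numbers):
    naive = naive.replace(old, str(new), 1)

# '52' becomes '58' first, so the later search for '8' hits the
# '8' inside '58' instead of the one after 'length:'.
print(naive)
```

The second replacement lands on the freshly written 58, so the real `length:8` is left untouched.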

Here is my solution:

    import re

    data = 'readingOrder {index:24;} person {offset:0; length:7;} textStyle {offset:0; length:7; underlined:true;} place {offset:52; length:8;} textStyle {offset:52; length:8; underlined:true;}'
    new_numbers = [24, 0, 12, 0, 12, 58, 14, 58, 14]

    # finditer scans the original string, so match positions refer to it.
    # offset tracks how far the edited string has shifted relative to the original.
    offset = 0
    for index, match in enumerate(re.finditer(r'\d+', data)):
        replacement = str(new_numbers[index])
        data = data[:match.start() + offset] + replacement + data[match.end() + offset:]
        # Shift by the difference between the new and old number lengths.
        offset += len(replacement) - (match.end() - match.start())
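The offset bookkeeping is easier to follow on a short input. The sketch below wraps the same slicing approach in a hypothetical helper, `replace_with_offset` (a name introduced here for illustration only), that also records the running offset after each replacement.

```python
import re

def replace_with_offset(text, replacements):
    """Hypothetical helper: same slicing approach, also returning the offsets."""
    result = text
    offset = 0
    offsets = []
    # Match positions come from the original text; offset corrects them
    # for the length changes made so far in result.
    for match, new in zip(re.finditer(r'\d+', text), replacements):
        new = str(new)
        result = result[:match.start() + offset] + new + result[match.end() + offset:]
        offset += len(new) - (match.end() - match.start())
        offsets.append(offset)
    return result, offsets

result, offsets = replace_with_offset('length:7; offset:52; length:8;', [12, 58, 14])
print(result)   # length:12; offset:58; length:14;
print(offsets)  # [1, 1, 2]
```

Replacing 7 with 12 shifts everything after it by one character, 52 to 58 adds no shift, and 8 to 14 adds one more.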
1 Comment

This is perfect, except that the time complexity is O(len(new_numbers) * len(data)), because every slice-and-concatenate step builds a new string. If time complexity is not an issue, this is the solution to go with.
0

If you operate directly on data as a string, one new number at a time (i.e. a for loop that rebuilds data on every iteration), the time complexity is likely to be O(len(new_numbers) * len(data)), since each rebuild copies the whole string.

One efficient way of doing it in O(len(data)) time is to operate on a list of characters:

    def replace_numbers(data, new_numbers):
        new_numbers_idx = 0
        data_as_char_list = []

        # True while we are inside a run of digits that has already been replaced.
        skip = False

        for data_ch in data:
            if data_ch.isdigit():
                if not skip:
                    # On the first digit of a number, emit the whole new number,
                    # then skip the remaining digits of the old one.
                    # e.g. data = 'hi123hi', new_numbers = [444]: at '1' we add
                    # ['4', '4', '4'] and then skip '2' and '3'.
                    new_number_as_char_list = list(str(new_numbers[new_numbers_idx]))
                    data_as_char_list.extend(new_number_as_char_list)
                    new_numbers_idx += 1
                    skip = True
            else:
                data_as_char_list.append(data_ch)
                skip = False

        return ''.join(data_as_char_list)


    data = 'readingOrder {index:24;} person {offset:0; length:7;} textStyle {offset:0; length:7; underlined:true;} place {offset:52; length:8;} textStyle {offset:52; length:8; underlined:true;}'
    new_numbers = [24, 0, 12, 0, 12, 58, 14, 58, 14]
    data = replace_numbers(data, new_numbers)

    corrected_data = 'readingOrder {index:24;} person {offset:0; length:12;} textStyle {offset:0; length:12; underlined:true;} place {offset:58; length:14;} textStyle {offset:58; length:14; underlined:true;}'
    assert data == corrected_data
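A related single-pass sketch, not part of the answer above, hands the scanning to `re.sub` with a function as the replacement. The helper name `replace_numbers_with_sub` is hypothetical, and the sketch assumes `new_numbers` has at least as many entries as there are numbers in `data`.

```python
import re

def replace_numbers_with_sub(data, new_numbers):
    """Hypothetical helper: swap each run of digits for the next new number."""
    replacements = iter(new_numbers)
    # re.sub calls the function once per match, left to right,
    # so each match consumes the next item from the iterator.
    return re.sub(r'\d+', lambda match: str(next(replacements)), data)

sample = 'place {offset:52; length:8;} textStyle {offset:52; length:8;}'
replaced = replace_numbers_with_sub(sample, [58, 14, 58, 14])
print(replaced)
```

If the list runs out before the matches do, `next` raises `StopIteration`, so a length check beforehand may be worth adding.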

0

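An alternative: escape the literal braces, swap each number for a {} placeholder, then fill the placeholders with str.format: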

    import re

    data = 'readingOrder {index:24;} person {offset:0; length:7;} textStyle {offset:0; length:7; underlined:true;} place {offset:52; length:8;} textStyle {offset:52; length:8; underlined:true;}'
    new_numbers = [24, 0, 12, 0, 12, 58, 14, 58, 14]
    x = re.findall(r"\d+", data)
    # Double the existing braces so str.format treats them as literals.
    data = data.replace("{", "{{").replace("}", "}}")
    # Earlier numbers are already placeholders, so the first remaining
    # occurrence of n is always the next number in order.
    for n in x:
        data = data.replace(n, "{}", 1)
    data = data.format(*new_numbers)
    print(data)

[Out]:

readingOrder {index:24;} person {offset:0; length:12;} textStyle {offset:0; length:12; underlined:true;} place {offset:58; length:14;} textStyle {offset:58; length:14; underlined:true;}
