I have a long text, and I am trying to remove multiple substrings using their start and end indexes. The problem is that once I remove the first substring from the original text, the remaining start and end indexes become invalid. What is the most efficient way to do this?

def remove_substrings(text, indexes):
    '''
        indexes is a list containing start and end indexes.
        indexes = ["3 5", "7 8"]
    '''

    return text

3 Answers

Instead of removing substrings from left to right, remove them from right to left. Deleting a range only shifts the characters after it, so the indices to its left stay valid. This solves your problem, though there may be more efficient ways to do it.
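
A minimal sketch of the right-to-left idea, assuming each entry is a "start end" string with an inclusive end index and that the ranges do not overlap. The function name remove_right_to_left is hypothetical, chosen only for this illustration.

```python
def remove_right_to_left(text, indexes):
    # Parse "start end" strings into integer pairs (inclusive end assumed).
    spans = [tuple(map(int, idx.split())) for idx in indexes]
    # Process the rightmost range first so the earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + text[end + 1:]
    return text


print(remove_right_to_left("abcdefghij", ["3 5", "7 8"]))  # abcgj
```

Each removal builds a new string, so the cost grows with both the text length and the number of ranges.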

Don't remove the substrings right away. Instead, build the result in another variable by appending the pieces of text you want to keep.

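def remove_substrings(text, indexes):
    '''
        indexes is a list containing start and end indexes.
        indexes = ["3 5", "7 8"]
    '''
    newText = ""
    i = 0

    # Assumes the ranges are sorted by start index and do not overlap.
    for index in indexes:
        start, end = map(int, index.split())
        newText += text[i:start]  # keep the text before this range
        i = end + 1               # skip past the inclusive end index
    newText += text[i:]           # keep whatever follows the last range
    return newText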

Example for the above code:
text: 'abcdef...z' (all letters of the English alphabet)
indexes: ["3 5", "7 8"]
expected output: all letters except 'd', 'e', 'f' (indexes 3 to 5) and 'h', 'i' (indexes 7 to 8).

newText receives 'a' through 'c' in the first iteration and 'g' in the second; the text after the last range ('j' through 'z') completes the result.
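
The same build-up idea can be sketched with a list and a single join, which avoids repeated string concatenation. This is only an illustration under stated assumptions: entries are "start end" strings with an inclusive end index, and the name remove_spans_single_pass is hypothetical.

```python
def remove_spans_single_pass(text, indexes):
    # Parse and sort the ranges so they can be walked left to right.
    spans = sorted(tuple(map(int, idx.split())) for idx in indexes)
    kept = []
    pos = 0  # first character not yet copied or skipped
    for start, end in spans:
        kept.append(text[pos:start])  # empty if this range overlaps the previous one
        pos = max(pos, end + 1)       # never move backwards on overlapping ranges
    kept.append(text[pos:])           # text after the last range
    return "".join(kept)


print(remove_spans_single_pass("abcdefghijklmnopqrstuvwxyz", ["3 5", "7 8"]))
# abcgjklmnopqrstuvwxyz
```

Because the ranges are sorted inside the function, the input list does not need to be ordered.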

def remove_substrings(text, indexes):
    '''
        indexes is a list containing start and end indexes.
        indexes = ["3 5", "7 8"]
    '''
    int_indexes = []
    for idx in indexes:
        s1,s2 = idx.split()
        int_indexes.append([int(s1), int(s2)])
    int_indexes.sort()
    int_indexes.reverse()
    for idx in int_indexes:
        text = text[0:idx[0]] + text[idx[1]+1:]
    return text

If your indexes are not sorted, you will have to convert them to integers first so that they sort numerically rather than as strings. Try it with:

text = "0123456789012345"
print(remove_substrings(text, ["3 5", "10 12", "7 8"]))

The first for loop can be replaced by a list comprehension as follows:

int_indexes = [[int(n) for n in idx.split()] for idx in indexes]
