How to extract the text between markers based on start and end lists

Question

I am trying to extract the text between markers, where the start and end markers come from two separate lists.

For example:
start = ['intro', 'Intro', '[intro', 'Introduction', '(intro)']
end = ['P1', 'P2', '[P1', '[P2']

Input:
intro
L1
L2
P1
L3
L4
[intro]
L5
L6

Expected Output:
L1
L2
L5
L6

How can I achieve this? I have tried:

text = 'I want to find a string between two substrings'
start = 'find a '
end = 'between two'

print(text[text.index(start)+len(start):text.index(end)])

I want output like the one in the first example.

Can you explain it properly? It's hard to understand what you want.
– Ashish, Commented Apr 16, 2019 at 16:20
Your example code text[text.index(start)+len(start):text.index(end)] should output "string". Are you expecting a different output? Also, how does that example relate to the list of input and output posted above it?
– benvc, Commented Apr 16, 2019 at 16:24
@benvc, I believe start and end are lists, as described in the first example.
– Austin, Commented Apr 16, 2019 at 16:25

icwebndev · Accepted Answer · 2019-04-16 17:13:12Z

2

Quick and dirty example based on your second example:

text = 'I want to find a string between two substrings'
start = 'find a '
end = 'substrings'

s_idx = text.index(start) + len(start) if start in text else -1

e_idx = text.index(end) if end in text else -1

if s_idx > -1 and e_idx > -1:
    print(text[s_idx:e_idx])

You have to check that the substring is part of the string first; otherwise str.index() raises a ValueError.
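
As an alternative sketch, str.find() returns -1 instead of raising when the substring is missing, so the membership test and the lookup collapse into one call. The helper name `between` is hypothetical, and searching for the end marker only after the start marker is an assumption about the intended behaviour, not something the snippet above does.

```python
def between(text, start, end):
    """Return the text between start and end, or None if either is missing."""
    s_pos = text.find(start)        # -1 when start is absent
    if s_pos == -1:
        return None
    s_idx = s_pos + len(start)
    e_idx = text.find(end, s_idx)   # look for end only after the start marker
    if e_idx == -1:
        return None
    return text[s_idx:e_idx]


text = 'I want to find a string between two substrings'
print(repr(between(text, 'find a ', 'substrings')))
```

Returning None for a missing marker is a design choice for this sketch; raising an exception would work just as well if a missing marker is an error in your data.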

EDIT: Output based on first example:

start_list = ["work", "start", "also"]
end_list = ["of", "end", "substrings"]
text = "This can also work on a list of start and end substrings"

print("* Example with a list of start and end strings, stops on a first match")
print("- Text: {0}".format(text))
print("- Start: {0}".format(start_list))
print("- End: {0}".format(end_list))

s_idx = -1
for string in start_list:
    if string in text:
        s_idx = text.index(string) + len(string)
        # we're breaking on a first find.
        break

e_idx = -1
for string in end_list:
    if string in text:
        e_idx = text.index(string)
        # we're breaking on a first find.
        break

if e_idx > -1 and s_idx > -1:
    print(text[s_idx:e_idx])
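
Note that the loops above take the first marker in list order, not the marker that appears earliest in the text. If the earliest occurrence is what you need, one possible sketch follows. `first_between` is a hypothetical helper name, and starting the end search after the chosen start marker is an assumption.

```python
def first_between(text, start_list, end_list):
    """Slice from the earliest start marker to the earliest end marker after it."""
    # (position, marker) pairs for every start marker present in the text
    starts = [(text.find(s), s) for s in start_list if s in text]
    if not starts:
        return None
    pos, marker = min(starts)       # smallest position wins
    s_idx = pos + len(marker)
    # positions of end markers found at or after s_idx
    ends = [i for i in (text.find(e, s_idx) for e in end_list) if i != -1]
    if not ends:
        return None
    return text[s_idx:min(ends)]


start_list = ["work", "start", "also"]
end_list = ["of", "end", "substrings"]
text = "This can also work on a list of start and end substrings"
print(repr(first_between(text, start_list, end_list)))
```

With this text the earliest start marker is "also", so the slice begins right after it instead of after "work".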

Or, if you want to go further and print the text between every start/end pair (using the first occurrence of each marker):

print("* Example with a list of start and end strings, finds all matches")
print("- Text: {0}".format(text))
print("- Start: {0}".format(start_list))
print("- End: {0}".format(end_list))

s_idxs = []
e_idxs = []

for string in start_list:
    if string in text:
        s_idxs.append(text.index(string) + len(string))

for string in end_list:
    if string in text:
        e_idxs.append(text.index(string))


for s_idx in s_idxs:
    for e_idx in e_idxs:
        if e_idx <= s_idx:
            print("ignoring end index {0}, it's before our start at {1}!".format(e_idx, s_idx))
            # end index is lower than start index, ignoring it.
            continue

        print("{0}:{1} => {2}".format(s_idx, e_idx, text[s_idx:e_idx]))
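
The nested loops above pair every start index with every end index. If each start marker should instead be paired with the nearest end marker that follows it, including repeated occurrences, a regular expression is one option. This is a sketch under that assumption; `all_between` is a hypothetical name.

```python
import re


def all_between(text, start_list, end_list):
    """Return each shortest span lying between a start and an end marker."""
    # re.escape keeps characters such as '[' or '(' literal
    starts = "|".join(re.escape(s) for s in start_list)
    ends = "|".join(re.escape(e) for e in end_list)
    # the non-greedy group stops at the nearest end marker;
    # re.DOTALL lets a span cross line breaks
    pattern = re.compile("(?:{0})(.*?)(?:{1})".format(starts, ends), re.DOTALL)
    return pattern.findall(text)


start_list = ["work", "start", "also"]
end_list = ["of", "end", "substrings"]
text = "This can also work on a list of start and end substrings"
print(all_between(text, start_list, end_list))
```

Matches do not overlap, so a start marker that falls inside an already matched span (here "work") does not begin a new span.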

You can shorten and improve this code further; this is just a quick and dirty write-up.
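
For the line-oriented input in the question, a different sketch may be closer to the expected output: walk the lines, start collecting after a start marker, and stop at an end marker. The function name `lines_between` is hypothetical, and comparing markers with startswith (so that the line "[intro]" matches the marker '[intro') is an assumption about how the markers should be matched.

```python
def lines_between(text, start_list, end_list):
    """Collect the lines that follow a start marker, up to the next end marker."""
    starts = tuple(start_list)   # str.startswith accepts a tuple of prefixes
    ends = tuple(end_list)
    collecting = False
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(starts):
            collecting = True    # the marker line itself is not kept
        elif stripped.startswith(ends):
            collecting = False
        elif collecting and stripped:
            result.append(stripped)
    return result


start = ['intro', 'Intro', '[intro', 'Introduction', '(intro)']
end = ['P1', 'P2', '[P1', '[P2']
text = "intro\nL1\nL2\nP1\nL3\nL4\n[intro]\nL5\nL6"
print(lines_between(text, start, end))
```

Prefix matching treats any line that begins with a marker as a marker line, so content lines starting with "intro" would be swallowed; compare with equality instead if markers always occupy a whole line.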

edited Apr 16, 2019 at 17:13

answered Apr 16, 2019 at 16:26

icwebndev



4 Comments

terry Over a year ago

My question is: if start and end are lists of words, how can I handle that situation?

icwebndev Over a year ago

You'll have to iterate through that list and basically do the same thing I did. If you need an example, I could provide one.

terry Over a year ago

Yes, can you provide an example?

icwebndev Over a year ago

I have edited my answer and added two more examples.
