I am trying to follow an answer given here:

How to only read lines in a text file after a certain string using python?

to read only the lines after a certain phrase. I went the boolean route (the second answer).

I need to get just the numbers between the opening and closing tags of a section in a file:

<type>
1 
2
3
</type>

However, when I used this code:

found_type = False
t_ype = [] 
with open('test.xml', 'r') as f:
    for line in f:
        if '<type>' in line:
            found_type = True
        if found_type:
            if '</type>' in line:
               found_type = False               
            else:    
                t_line = str(line).rstrip('\n')
                t_ype.append(t_line)

I can't skip the first line, and I get:

'<type>', '1','2','3'

Where I just want

'1','2','3'

while also ending the appending to the list when I hit </type>, as I don't need that in my list.

I'm not sure what I'm doing wrong and can't ask on the page because my rep isn't high enough.

4
  • Why not use xml with python? Commented Feb 15, 2016 at 19:13
  • This may look like XML, but the file I am handling is a molecular dynamics simulation script with over 50000 lines that are separated by these headers. I need a quick way to grab certain sections and then append them to new files. Commented Feb 15, 2016 at 19:27
  • stackoverflow.com/questions/34571288/… stackoverflow.com/questions/31507045/… Commented Feb 15, 2016 at 19:49
  • @PadraicCunningham I saw a similar one using that module. I will take a look at that one in more depth later. Thanks for your response. Commented Feb 15, 2016 at 20:00

2 Answers

1

You have to skip the rest of the for loop body after detecting the "header". In your code, you set found_type to True on the <type> line, and then the if found_type: check matches on that same line, so the header itself gets appended.

found_type = False
t_ype = [] 
with open('test.xml', 'r') as f:
    for line in f:
        if '<type>' in line:
            found_type = True
            continue                    # This is the only change to your code.
                                        # When the header is found, immediately go to the next line
        if found_type:
            if '</type>' in line:
               found_type = False               
            else:    
                t_line = str(line).rstrip('\n')
                t_ype.append(t_line)
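
The same logic can also be written as an if/elif chain, so that each line is treated as exactly one of header, footer, or content. This is only a minimal sketch and not part of the code above: the name read_section and its parameters are made up for illustration, and it accepts any iterable of lines so it can be tried without a file.

```python
def read_section(lines, begin='<type>', end='</type>'):
    """Collect the lines found between a begin marker and an end marker.

    Hypothetical helper for illustration: `lines` can be an open file
    or any other iterable of strings.
    """
    inside = False
    collected = []
    for line in lines:
        if begin in line:
            # Header line: start collecting from the next line on.
            inside = True
        elif end in line:
            # Footer line: stop collecting, do not store the marker.
            inside = False
        elif inside:
            collected.append(line.rstrip('\n'))
    return collected


sample = ['preamble\n', '<type>\n', '1\n', '2\n', '3\n', '</type>\n', 'trailer\n']
print(read_section(sample))  # ['1', '2', '3']
```

With a real file this would be called as with open('test.xml') as f: t_ype = read_section(f).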

1 Comment

Thanks for explanation! @Jasper
0

The simplest approach is a double loop with yield:

def section(fle, begin, end):
    with open(fle) as f:
        for line in f:
            # found start of section so start iterating from next line
            if line.startswith(begin):
                for line in f: 
                    # found end so end function
                    if line.startswith(end):
                        return
                    # yield every line in the section
                    yield line.rstrip()     

Then either call list(section('test.xml', '<type>', '</type>')) or iterate with for line in section('test.xml', '<type>', '</type>'): and use the lines. If you have repeating sections, swap the return for a break. You also don't need to call str on the lines, as they are already strings. If you have a large file, the groupby approach in the comments might be a better alternative.
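
For repeating sections, the break variant might look like the sketch below. It is an illustration rather than a tested drop-in: the name sections is hypothetical, and it takes any iterable of lines (an open file works too) instead of a filename, so the example can run on an in-memory list.

```python
def sections(lines, begin, end):
    """Yield the stripped lines of every begin/end section in `lines`."""
    # One shared iterator, so the inner loop resumes where the outer one stopped.
    it = iter(lines)
    for line in it:
        if line.startswith(begin):
            for line in it:
                if line.startswith(end):
                    # Leave this section and keep scanning for the next one.
                    break
                yield line.rstrip()


sample = ['<type>\n', '1\n', '2\n', '</type>\n', 'skip\n',
          '<type>\n', '3\n', '</type>\n']
print(list(sections(sample, '<type>', '</type>')))  # ['1', '2', '3']
```

Because break only exits the inner loop, the outer loop carries on from the line after the end marker and picks up any later section.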
