Reading a file for a certain section in Python

Question

I am trying to follow an answer given here:

How to only read lines in a text file after a certain string using python?

to read only the lines after a certain phrase. I went with the boolean route (the second answer).

I need to get just the numbers between the opening and closing tags of a section in a file:

<type>
1 
2
3
</type>

However when I used this code:

found_type = False
t_ype = [] 
with open('test.xml', 'r') as f:
    for line in f:
        if '<type>' in line:
            found_type = True
        if found_type:
            if '</type>' in line:
               found_type = False               
            else:    
                t_line = str(line).rstrip('\n')
                t_ype.append(t_line)

I can't skip the first line, and I get:

'<type>', '1','2','3'

Where I just want

'1','2','3'

and I want to stop appending to the list when I hit </type>, as I don't need that in my list.

I'm not sure what I'm doing wrong and can't ask on the page because my rep isn't high enough.

This may look like an XML script, but it is a molecular dynamics simulation script with over 50000 lines that are separated by these headers. I need a quick way to grab certain sections and then append them to new files.
– l33tHax0r, Commented Feb 15, 2016 at 19:27
stackoverflow.com/questions/34571288/… stackoverflow.com/questions/31507045/…
– Padraic Cunningham, Commented Feb 15, 2016 at 19:49
@PadraicCunningham I saw a similar one using that module. I will take a look at that one in more depth later. Thanks for your response.
– l33tHax0r, Commented Feb 15, 2016 at 20:00

Jasper · Accepted Answer · 2016-02-15 19:44:38Z

You have to skip the rest of the for loop body after detecting the "header". In your code, you set found_type to True, and then the if found_type: check matches on that same iteration, so the header line itself gets appended.

found_type = False
t_ype = [] 
with open('test.xml', 'r') as f:
    for line in f:
        if '<type>' in line:
            found_type = True
            continue                    # This is the only change to your code.
                                        # When the header is found, immediately go to the next line
        if found_type:
            if '</type>' in line:
                found_type = False
            else:    
                t_line = str(line).rstrip('\n')
                t_ype.append(t_line)
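
To see the effect of the continue without needing a file on disk, here is a minimal sketch that runs the same loop over an in-memory sample. The function name read_section, its parameters, and the sample text are assumptions made for illustration; they are not part of the original code.

```python
import io


def read_section(lines, begin='<type>', end='</type>'):
    """Collect the lines strictly between the begin and end markers."""
    inside = False
    collected = []
    for line in lines:
        if begin in line:
            inside = True
            continue  # skip the header line itself
        if inside:
            if end in line:
                inside = False  # stop collecting at the closing marker
            else:
                collected.append(line.rstrip('\n'))
    return collected


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("header\n<type>\n1\n2\n3\n</type>\nfooter\n")
result = read_section(sample)
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way as the StringIO sample.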


1 Comment

l33tHax0r Over a year ago

Thanks for the explanation, @Jasper!

Padraic Cunningham · Accepted Answer · 2016-02-15 20:12:23Z

The simplest approach is a double loop with yield:

def section(fle, begin, end):
    with open(fle) as f:
        for line in f:
            # found start of section so start iterating from next line
            if line.startswith(begin):
                for line in f: 
                    # found end so end function
                    if line.startswith(end):
                        return
                    # yield every line in the section
                    yield line.rstrip()

Then either call list(section('test.xml', '<type>', '</type>')) or iterate with for line in section('test.xml', '<type>', '</type>'): and use each line. If you have repeating sections, swap the return for a break. You also don't need to call str on the lines, as they are already strings. If you have a large file, the groupby approach in the comments might be a better alternative.
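
As a rough sketch of the repeating-sections variant, the version below swaps the return for a break, so the outer loop keeps scanning for the next begin marker after a section ends. The name sections, the temporary file, and its contents are assumptions made up for the demonstration.

```python
import os
import tempfile


def sections(path, begin, end):
    """Yield the lines of every begin/end delimited section in the file."""
    with open(path) as f:
        for line in f:
            if line.startswith(begin):
                for line in f:
                    if line.startswith(end):
                        break  # leave this section, keep scanning the file
                    yield line.rstrip()


# Hypothetical sample file with two sections.
text = "<type>\n1\n2\n3\n</type>\nother\n<type>\n4\n5\n</type>\n"
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as tmp:
    tmp.write(text)
    path = tmp.name

found = list(sections(path, '<type>', '</type>'))
os.remove(path)
print(found)
```

Both loops share the same file iterator, so the inner loop picks up exactly where the outer one stopped, and no line is read twice.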
