
I'm new to Python and trying to do a nested loop. I have a very large file (1.1 million rows), and I'd like to use it to create a file that has each line along with the next N lines, for example with the next 3 lines:

1    2
1    3
1    4
2    3
2    4
2    5

Right now I'm just trying to get the loops working with row numbers instead of the strings, since that's easier to visualize. I came up with this code, but it's not behaving how I want:

with open('C:/working_file.txt', mode='r', encoding='utf8') as f:
    for i, line in enumerate(f):
        line_a = i
        lower_bound = i + 1
        upper_bound = i + 4
        with open('C:/working_file.txt', mode='r', encoding='utf8') as g:
            for j, line in enumerate(g):
                while j >= lower_bound and j <= upper_bound:
                    line_b = j
                    j = j + 1
                    print(line_a, line_b)

Instead of the output I want (like the above), it's giving me this:

990     991
990     992
990     993
990     994
990     992
990     993
990     994
990     993
990     994
990     994

As you can see the inner loop is iterating multiple times for each line in the outer loop. It seems like there should only be one iteration per line in the outer loop. What am I missing?

EDIT: My question was answered below, here is the exact code I ended up using:

from collections import deque
from itertools import cycle
log = open('C:/example.txt', mode='w', encoding = 'utf8') 
try:
    xrange 
except NameError: # python3
    xrange = range

def pack(d):
    tup = tuple(d)
    return zip(cycle(tup[0:1]), tup[1:])

def window(seq, n=2):
    it = iter(seq)
    d = deque((next(it, None) for _ in range(n)), maxlen=n)
    yield pack(d)
    for e in it:
        d.append(e)
        yield pack(d)

for l in window(open('c:/working_file.txt', mode='r', encoding='utf8'),100):
    for a, b in l:
        print(a.strip() + '\t' + b.strip(), file=log)
    for j, line in enumerate(g) and j = j+1 should never ever go together... Commented Dec 10, 2013 at 0:08
  • I don't see how else it can work - you are having a loop within a loop. Of course line_a stays the same for all your iterations through file g. Commented Dec 10, 2013 at 0:10
  • @sashkello Why should that not ever be done? What is the alternative? I just started learning python. Commented Dec 10, 2013 at 0:20
  • for i in mylist iterates over all objects within mylist. Modifying i at the same time makes the program confusing because i is not necessarily within the list any more. In your case you can do for n in range(lower_bound, upper_bound+1). Commented Dec 10, 2013 at 0:25
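A minimal sketch of the fix the comments point at, testing j against the bounds instead of mutating it, using row numbers as in the question (the function name is illustrative):

```python
# Sketch of the fix hinted at in the comments: replace the inner while-loop
# (which mutates the loop variable) with a plain bounds test.
# Row numbers stand in for file lines, as in the question.
def pairs_by_index(num_lines, n=3):
    out = []
    for i in range(num_lines):        # outer file, line i
        lower_bound, upper_bound = i + 1, i + n
        for j in range(num_lines):    # inner file, line j
            if lower_bound <= j <= upper_bound:
                out.append((i, j))
    return out

print(pairs_by_index(5))
# → [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

This still re-scans the inner range once per outer line, so it keeps the original code's quadratic behavior; the answers below avoid that.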

5 Answers


Based on window example from old docs you can use something like:

from collections import deque
from itertools import cycle

try:
    xrange 
except NameError: # python3
    xrange = range

def pack(d):
    tup = tuple(d)
    return zip(cycle(tup[0:1]), tup[1:])

def window(seq, n=2):
    it = iter(seq)
    d = deque((next(it, None) for _ in xrange(n)), maxlen=n)
    yield pack(d)
    for e in it:
        d.append(e)
        yield pack(d)

Demo:

>>> for l in window([1,2,3,4,5], 4):
...     for l1, l2 in l:
...     print(l1, l2)
...
1 2
1 3
1 4
2 3
2 4
2 5

So, basically, you can pass your file to window to get the desired result:

window(open('C:/working_file.txt', mode='r', encoding='utf8'), 4)

3 Comments

  • +1 for itertools. This is much, much better than my solution with readlines, because it doesn't read the entire file into memory. But note that the OP seems to be using Python 3, so some of the code needs adjusting - xrange -> range jumps out, for example.
  • +1 for using recipes I've seen before instead of reinventing the wheel.
  • This worked perfectly for me with a little tweaking - thank you! I updated my question with the exact code I used.

You can do this with slices. This is easiest if you read the whole file into a list first:

with open('C:/working_file.txt', mode='r', encoding = 'utf8') as f: 
    data = f.readlines()

for i, line_a in enumerate(data):
    for j, line_b in enumerate(data[i+1:i+5], start=i+1):
        print(i, j)

When you change it to printing the lines instead of the line numbers, you can drop the second enumerate and just do for line_b in data[i+1:i+5]. Note that the slice includes the item at the start index, but not the item at the end index, so that needs to be one higher than your current upper bound.
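A sketch of that line-printing variant (pair_lines is an illustrative name; a small list stands in for the file):

```python
# Pair each line with its next n lines using list slices.
# The slice end is exclusive, hence i + 1 + n.
def pair_lines(data, n=3):
    pairs = []
    for i, line_a in enumerate(data):
        for line_b in data[i + 1:i + 1 + n]:
            pairs.append((line_a.strip(), line_b.strip()))
    return pairs

print(pair_lines(["1\n", "2\n", "3\n", "4\n", "5\n"]))
# → [('1', '2'), ('1', '3'), ('1', '4'), ('2', '3'), ('2', '4'),
#    ('2', '5'), ('3', '4'), ('3', '5'), ('4', '5')]
```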



Based on alko's answer, I would suggest using the window recipe unmodified:

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

for l in window([1, 2, 3, 4, 5], 4):
    for item in l[1:]:
        print(l[0], item)



I think the easiest way to solve this problem would be to read your file into a dictionary...

my_data = {}
with open('C:/working_file.txt', mode='r', encoding='utf8') as f:
    for i, line in enumerate(f):
        my_data[i] = line

After that is done you can do

for x in my_data:
    for y in range(1, 4):
        print(my_data[x], my_data[x + y])

As written, you are re-reading your million-line file a million times - once for each line...
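One caveat: near the end of the file, my_data[x + y] will raise a KeyError, because those keys don't exist. A guarded sketch of the same dict-based idea (illustrative names; a list of integers stands in for the file):

```python
# Same dict-based idea, but guarded against running past the last line.
def dict_pairs(lines, n=3):
    my_data = dict(enumerate(lines))
    out = []
    for x in my_data:
        for y in range(1, n + 1):
            if x + y in my_data:  # skip pairs past the end of the file
                out.append((my_data[x], my_data[x + y]))
    return out

print(dict_pairs([1, 2, 3, 4, 5]))
# → [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
```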

5 Comments

  • Thanks Paul - so am I correct that you're suggesting this? I get an error: f = open('C:/working_file.txt', mode='r', encoding = 'utf8') my_data = {} for i, line in f: my_data[i] = line for x in my_data: for y in range(1, 4): out.write(my_data[x] + " " + my_data[x + y]
  • What error did you get? I just re-read your code and realized you were using a print statement for output - I changed out.write to print.
  • Here's the error: Traceback (most recent call last): File "loop_test.py", line 20, in <module> for i, line in f: ValueError: too many values to unpack (expected 2)
  • I forgot the enumerate in the first loop. My apologies.
  • Converting to a dict is unnecessary overhead, as you have to construct it, at least evaluating hashes for all the keys (i.e. for all the lines in the file), while you could operate over a list instead; moreover, loading the whole file is also unnecessary overhead, and this approach combines both.

Since this is quite a big file, you might not want to load it all into memory at once. So, to avoid reading a line more than once, this is what you do:

  • Make a list with N elements, where N is the number of next lines to read.

    • When you read the first line, add it to the first item in the list.
    • Add the next line to the first and second items.
    • And so on for each line.
  • When an item in that list reaches length N, take it out and append it to the output file, and add an empty item at the end so you still have a list of N items.

This way you only need to read each line once, and you won't have to load the whole file into memory. You only need to hold, at most, N! lines in memory.
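The steps above could be sketched roughly as follows (a streaming generator; stream_pairs and all names are illustrative, and plain integers stand in for file lines):

```python
# Streaming version of the steps described above: keep a list of
# partially-filled groups, one per recent line, each collecting its
# next n lines; emit a group once it is full.
def stream_pairs(lines, n=3):
    pending = []  # each entry: (first_line, [following lines so far])
    for line in lines:
        for group in pending:       # add this line to every open group
            group[1].append(line)
        pending.append((line, []))  # open a new group for this line
        if len(pending[0][1]) == n:  # oldest group is complete
            first, rest = pending.pop(0)
            for other in rest:
                yield (first, other)
    for first, rest in pending:      # flush incomplete groups at EOF
        for other in rest:
            yield (first, other)

print(list(stream_pairs([1, 2, 3, 4, 5])))
# → [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
```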

1 Comment

  • This is roughly what alko's itertools solution does, except it has O(N) memory usage instead of O(N!).
