Loops not working - Strings (Python)

Question

I need help reading these text files. Somehow, when I use a nested loop, the outer loop always gets reset to the first line.

import sys
import codecs # the real lines contain more UTF-8 (non-ASCII) text, so I need codecs
reload(sys)
sys.setdefaultencoding('utf-8')

reader = codecs.open("txtfile1", 'r', 'utf-8')
reader2 = codecs.open("txtfile2", 'r', 'utf-8')

for row in reader:
    print row[0:11] # here the outer loop advances through the rows
    for row2 in reader2:
        print row[0:11] # here the outer loop seems to get reset
        if row[0:11]==row2[0:11]:
            print row[12:] + row2[12:]

The text files look like this:

txtfile1

95032302317 foo
95032302318 bar
95032302319 bron
95032302320 cow
95032302321 how 
95032302322 now
95032303001 lala
95032303002 lili

txtfile2

95032103318 bar (in another utf8 language)
95032103319 bron (in another utf8 language)
95032103320 cow (in another utf8 language)
95032103321 how (in another utf8 language)
95032103322 now (in another utf8 language)
95032103323 didi
95032103324 dada
95032103325 kaka

Maybe you meant print row2[0:11] in the inner loop? Or do you really want to print row[0:11] in both loops? – DrTyrsa, Commented Jun 17, 2011 at 8:38
It is not entirely clear what you want to do, but your solution seems suboptimal. Depending on what you actually want to do, you probably want to use map(), zip() or if row in lines2 instead of two for loops... – Kimvais, Commented Jun 17, 2011 at 8:43
@DrTyrsa: I need to print row[0:11] in the inner loop, but the outer loop gets reset whenever I'm in the inner loop. – alvas, Commented Jun 17, 2011 at 8:48
@Kimvais: I need to check whether each number in txtfile1 is also in txtfile2. If it is, I need to print out both lines. – alvas, Commented Jun 17, 2011 at 8:49

cwallenpoole · Accepted Answer · 2011-06-17 09:15:56Z


I can't tell you why, but this can be fixed by simply replacing for row in reader: with for row in reader.readlines():. If the whole file can't be read into memory at once, you'll probably need to handle the iteration manually.
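One way to handle the iteration manually, sketched here as an assumption rather than as what the answer had in mind: keep the inner file open and rewind it with seek(0) before every pass, so neither file has to fit in memory as a list. The cost is that txtfile2 is reread once per line of txtfile1. The helper name match_rows, its key_len parameter and the sample lines are hypothetical; io.StringIO stands in for files opened with io.open(..., encoding="utf-8").

```python
from __future__ import print_function
import io


def match_rows(outer, inner, key_len=11):
    """Hypothetical helper: yield (outer_line, inner_line) pairs whose first
    key_len characters are equal, rewinding `inner` for every outer line."""
    for row in outer:
        inner.seek(0)  # rewind; otherwise the inner file is exhausted after the first pass
        for row2 in inner:
            if row[:key_len] == row2[:key_len]:
                yield row.rstrip(u"\n"), row2.rstrip(u"\n")


# Made-up sample lines; io.StringIO stands in for the two opened files.
outer = io.StringIO(u"95032302318 bar\n95032302319 bron\n95032302399 zzz\n")
inner = io.StringIO(u"95032302319 bron (other language)\n95032302318 bar (other language)\n")

matches = list(match_rows(outer, inner))
for left, right in matches:
    print(left, "|", right)
```

Only one line of each file is held at a time, which suits files too large for readlines(); for small files, reading the inner file into a list (or a dict) once is simpler and faster.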

EDIT

I just realized I did something slightly different to get this to work:

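import codecs

# readlines() returns a list, which can be iterated any number of times
outer = codecs.open("txtfile1", "r", "utf-8").readlines()
inner = codecs.open("txtfile2", "r", "utf-8").readlines()

for o in outer:
    for i in inner:
        print o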

edited Jun 17, 2011 at 9:15

answered Jun 17, 2011 at 8:46

cwallenpoole



3 Comments

alvas Over a year ago

Nope. I've tried that; the same resetting of the outer loop happens.

alvas Over a year ago

What is a REPL? I'm using Python 2.6.5 from a Linux terminal.

alvas Over a year ago

Yes, it works. Calling readlines() outside the loop seems to prevent the reset, but why that happened is sort of a mystery.
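The likely cause: a file object is an iterator over its lines, not a reusable sequence. The inner for loop reads reader2 to the end during the first outer pass; on every later pass reader2 is already at end of file, so the inner body never runs again. The outer loop is not actually reset: the repeated first line comes from print row[0:11] inside the inner loop, which only ever runs while row is the first line. That is also why switching only the outer loop to readlines() did not help. The sketch below reproduces this with made-up data, using io.StringIO as a stand-in for the real files.

```python
from __future__ import print_function
import io

# Made-up sample data: io.StringIO behaves like an open text file here,
# i.e. it is its own iterator and is consumed by a single pass.
reader = io.StringIO(u"a1\na2\na3\n")
reader2 = io.StringIO(u"b1\nb2\n")
print(iter(reader2) is reader2)  # True: looping over it does not start a fresh pass

nested_pairs = []
for row in reader:            # the outer loop does visit a1, a2 and a3...
    for row2 in reader2:      # ...but reader2 is used up during the a1 pass
        nested_pairs.append((row.strip(), row2.strip()))

# Python 3 prints [('a1', 'b1'), ('a1', 'b2')]: the inner body never runs again
print(nested_pairs)

# Reading the inner file into a list first avoids the problem, because a
# list hands out a fresh iterator every time it is looped over.
reader = io.StringIO(u"a1\na2\na3\n")
lines2 = io.StringIO(u"b1\nb2\n").readlines()
fixed_pairs = [(row.strip(), row2.strip()) for row in reader for row2 in lines2]
print(len(fixed_pairs))  # 6
```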

Keith · Accepted Answer · 2011-06-17 09:10:16Z


This code snarfs both files into memory, but it would probably work for you if your files are smaller than a few hundred megabytes.

#!/usr/bin/python2 -S
# -*- coding: utf-8 -*-
# vim:ts=4:sw=4:softtabstop=4:smarttab:expandtab

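import sys
# with "python -S" the site module has not run yet, so sys.setdefaultencoding
# still exists; site is then imported by hand after changing the default
sys.setdefaultencoding("utf-8")
import site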

import codecs

# number -> rest of the line, one dict per file
t1 = {}
t2 = {}

with codecs.open("txtfile1", 'r', 'utf-8') as reader:
    for row in reader:
        number, text = row.split(" ", 1)
        t1[number] = text

with codecs.open("txtfile2", 'r', 'utf-8') as reader:
    for row in reader:
        number, text = row.split(" ", 1)
        t2[number] = text

# numbers that occur in both files
common = set(t1.keys()) & set(t2.keys())

while common:
    key = common.pop()  # sets are unordered, so matches come out in arbitrary order
    print t1[key], t2[key]

answered Jun 17, 2011 at 9:10

Keith


2 Comments

alvas Over a year ago

Seems like popping it out of the memory buffer might work, but I ran into an error. Also, the number and the text are separated by a <tab>, not a space, and the text is a sentence, not a single word.

Traceback (most recent call last):
  File "kyoutta.py", line 22, in <module>
    number, text = row.split(" ", 1)
ValueError: need more than 1 value to unpack

Keith Over a year ago

Well, it worked with your example texts. I'm sure you'll have to tweak it for your actual text.
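For the tab-separated files described above, one possible tweak, assuming each line is a number, then a tab or space, then free text: split on the first run of any whitespace instead of row.split(" ", 1), and skip lines that do not have two fields. The helper name load_table and the sample lines are hypothetical; io.StringIO stands in for the opened files.

```python
from __future__ import print_function
import io


def load_table(lines):
    """Hypothetical helper: map the number at the start of each line to the rest of the line."""
    table = {}
    for line in lines:
        # split(None, 1) splits once, at the first run of spaces or tabs,
        # so the sentence after the number keeps its own spaces
        parts = line.split(None, 1)
        if len(parts) != 2:
            # blank line or bare number: skip it rather than fail with
            # "ValueError: need more than 1 value to unpack"
            continue
        number, text = parts
        table[number] = text.rstrip(u"\r\n")
    return table


# Made-up tab-separated sample lines, including a blank line
t1 = load_table(io.StringIO(u"95032302318\tbar is here\n\n95032302319\tbron\n"))
t2 = load_table(io.StringIO(u"95032302319\tbron, other language\n95032302320\tcow\n"))

for key in sorted(set(t1) & set(t2)):  # sorted, so matches come out in a stable order
    print(t1[key], "|", t2[key])
```

Skipping malformed lines silently is a design choice; if such lines indicate corrupt input, logging or raising on them may be preferable.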

naeg · Accepted Answer · 2011-06-17 09:10:33Z


I'd simply do it like this:

row2 = reader2.readlines()  # read the second file into a list once
for row in reader.readlines():
    print row
    if row in row2:  # whole-line comparison: only identical lines match
        print 'yeah'

EDIT: new solution:

row2 = [line[:11] for line in reader2.readlines()]  # keep only the 11-digit numbers
for row in reader.readlines():
    print row
    if row[:11] in row2:  # compare the numbers only, so the text may differ
        print 'yeah'

edited Jun 17, 2011 at 9:10

answered Jun 17, 2011 at 8:53

naeg


2 Comments

alvas Over a year ago

Thanks naeg, that works if both files are in the same language, but here they are two files in two different human languages.

alvas Over a year ago

Thanks naeg, this works too. There seems to be something about Python loops that resets the reader.read function; readlines() is more apt.
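naeg's row[:11] in row2 scans the whole list once per line of txtfile1 and only reports that a match exists. A minimal variation, assuming (as in the sample data) that the number is always the first 11 characters and the text starts at index 12: keep txtfile2 in a dict keyed on the number, so each check is a hash lookup and the matching line is at hand to print alongside, which is what the question asked for. The name by_key and the sample lines are hypothetical; io.StringIO stands in for the opened files.

```python
from __future__ import print_function
import io

# Made-up sample lines; io.StringIO stands in for the two opened files.
reader = io.StringIO(u"95032302318 bar\n95032302319 bron\n95032303001 lala\n")
reader2 = io.StringIO(u"95032302319 bron (translated)\n95032302318 bar (translated)\n")

# Read the second file once, keyed by its first 11 characters (the number).
by_key = dict((line[:11], line.rstrip(u"\n")) for line in reader2)

matches = []
for raw in reader:                    # the first file is only streamed
    row = raw.rstrip(u"\n")
    if row[:11] in by_key:            # hash lookup instead of scanning a list
        other = by_key[row[:11]]
        matches.append((row, other))
        print(row[12:], "|", other[12:])
```

dict(...) over a generator is used instead of a dict comprehension because dict comprehensions need Python 2.7, and the asker is on 2.6.5.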
