I need help reading these text files. When I use a nested loop, the outer loop always seems to get reset to the first line while the inner loop runs.

import sys
import codecs # the real lines contain more UTF-8 characters, so I need codecs
reload(sys)
sys.setdefaultencoding('utf-8')

reader = codecs.open("txtfile1", 'r', 'utf-8')
reader2 = codecs.open("txtfile2", 'r', 'utf-8')

for row in reader:
    print row[0:11] # here the outer loop cycles through the rows as expected
    for row2 in reader2:
        print row[0:11] # here the outer loop appears to get reset
        if row[0:11]==row2[0:11]:
            print row[12:] + row2[12:]

The text files look like this:

txtfile1

95032302317 foo
95032302318 bar
95032302319 bron
95032302320 cow
95032302321 how 
95032302322 now
95032303001 lala
95032303002 lili

txtfile2

95032103318 bar (in another utf8 language)
95032103319 bron (in another utf8 language)
95032103320 cow (in another utf8 language)
95032103321 how (in another utf8 language)
95032103322 now (in another utf8 language)
95032103323 didi
95032103324 dada
95032103325 kaka
5 Comments
  • Maybe you meant print row2[0:11] on the line commented "here the outer loops gets resets"? Or do you really want to print row[0:11] in both loops? Commented Jun 17, 2011 at 8:38
  • It is not entirely clear what you want to do, but your solution seems suboptimal. Depending on what you actually want, you probably want to use map(), zip() or if row in lines2 instead of two for loops... Commented Jun 17, 2011 at 8:43
  • @DrTyrsa: I need to print row[0:11] in the inner loop, but the outer loop gets reset whenever I'm in the inner loop. Commented Jun 17, 2011 at 8:48
  • What task are you trying to solve? Commented Jun 17, 2011 at 8:49
  • @Kimvais: I need to check whether each number in txt1 is also in txt2. If it is, I need to print out the lines from both files. Commented Jun 17, 2011 at 8:49

3 Answers

1

I can't tell you why, but this can be fixed by replacing for row in reader: with for row in reader.readlines():. If the files can't be read into memory all at once, you'll probably need to handle the iteration manually.

EDIT

I just realized I did something slightly different in getting this to work:

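import codecs

# Read both files fully into lists first; a list can be looped over repeatedly
outer = codecs.open("txtfile1", 'r', 'utf-8').readlines()
inner = codecs.open("txtfile2", 'r', 'utf-8').readlines()

for o in outer:
    for i in inner:
        print o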
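
If the files are too big for readlines(), one way to handle the iteration manually might be to rewind the inner file before each pass. This is only a sketch, and it assumes the inner loop stops producing rows because that file has already been read to the end. The name join_by_prefix and the sample data are made up for illustration, and the file objects come from the io module rather than codecs.open (a swap on my part, since I have not checked how seek() behaves on codecs readers in Python 2.6). It is written to run on Python 2.6+ and 3.

```python
from __future__ import print_function
import io


def join_by_prefix(outer, inner, key_len=11):
    """Yield (outer_line, inner_line) pairs whose first key_len characters match."""
    for row in outer:
        inner.seek(0)  # rewind; otherwise the inner loop only runs for the first row
        for row2 in inner:
            if row[:key_len] == row2[:key_len]:
                yield row.rstrip(u"\n"), row2.rstrip(u"\n")


# In-memory stand-ins with invented rows. Real code would pass something like
# io.open("txtfile1", encoding="utf-8") and io.open("txtfile2", encoding="utf-8").
file1 = io.StringIO(u"95032302317 foo\n95032302318 bar\n95032302319 bron\n")
file2 = io.StringIO(u"95032302318 BAR\n95032302319 BRON\n95032302399 dada\n")

pairs = list(join_by_prefix(file1, file2))
for left, right in pairs:
    print(left[12:], right[12:])
```

This rereads the inner file once per outer row, so it trades speed for memory. For files that do fit in memory, the readlines() version is simpler.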

3 Comments

nope. i've tried that, the same resetting of the outer loop happens
what is REPL? i'm using python 2.6.5 from linux terminal
YESH, it works. declaring the readlines() outside the loop seems to prevent the reset. but why did that happen, it's sort of a mystery.
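
A likely explanation for the mystery, offered as a guess rather than something confirmed in this thread: a file object is a one-shot iterator. Once the inner loop has read reader2 to the end, it stays at the end, so for the second and later outer rows the inner loop body never runs. Because the inner loop in the question prints row (the outer line), the output shows the first outer line repeated many times, which can look like the outer loop being reset. Calling readlines() on only the outer file would not help, since the inner one is still a file. The sketch below uses io.StringIO as a stand-in for reader2 (my assumption; a real file iterates the same way) and runs on Python 2.6+ and 3.

```python
from __future__ import print_function
import io

inner = io.StringIO(u"a\nb\n")  # stand-in for reader2

first_pass = [line.strip() for line in inner]   # reads every line
second_pass = [line.strip() for line in inner]  # already at end of file

print(first_pass)
print(second_pass)

# A list, by contrast, can be looped over as many times as needed.
lines = io.StringIO(u"a\nb\n").readlines()
passes = [[line.strip() for line in lines] for _ in range(2)]
print(passes)
```

The second pass comes back empty, while the list gives the same rows on every pass. That matches the comment above about declaring readlines() outside the loop.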
1

This code snarfs the files into memory, but would probably work for you if your files are smaller than a few hundred megs.

#!/usr/bin/python2 -S
# -*- coding: utf-8 -*-
# vim:ts=4:sw=4:softtabstop=4:smarttab:expandtab

import sys
sys.setdefaultencoding("utf-8")
import site

import codecs

t1 = {}
t2 = {}

with codecs.open("txtfile1", 'r', 'utf-8') as reader:
    for row in reader:
        number, text = row.split(" ", 1)
        t1[number] = text

with codecs.open("txtfile2", 'r', 'utf-8') as reader:
    for row in reader:
        number, text = row.split(" ", 1)
        t2[number] = text

common = set(t1.keys()) & set(t2.keys())

while common:
    key = common.pop()
    print t1[key], t2[key]

2 Comments

Seems like popping it out of the memory buffer might work, but I ran into an error. Also, the number and the text are separated by <tab>, not a space, and the text is a sentence, not a single word.
Traceback (most recent call last):
  File "kyoutta.py", line 22, in <module>
    number, text = row.split(" ", 1)
ValueError: need more than 1 value to unpack
well, it worked with your example texts. I'm sure you'll have to tweak it for your actual text.
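
For the tab-separated case described in the comment above, one possible tweak is to split on any run of whitespace and to skip lines that do not have both a number and some text. This is a sketch, not a drop-in fix: load_table is a made-up helper name, the rows are invented, and io.StringIO stands in for the opened files. It runs on Python 2.6+ and 3.

```python
from __future__ import print_function
import io


def load_table(lines):
    """Map the leading number on each line to the rest of that line."""
    table = {}
    for row in lines:
        # split(None, 1) splits once on the first run of whitespace (tab or spaces)
        parts = row.rstrip(u"\r\n").split(None, 1)
        if len(parts) == 2:  # skip blank lines and lines with no text
            number, text = parts
            table[number] = text
    return table


# Invented sample rows: tab-separated, with a blank line in the first "file".
t1 = load_table(io.StringIO(u"95032302318\tbar is a word\n\n95032302319\tbron\n"))
t2 = load_table(io.StringIO(u"95032302318\tanother sentence\n95032302320\tcow\n"))

common = sorted(set(t1) & set(t2))  # numbers present in both files
for key in common:
    print(t1[key], t2[key])
```

Checking len(parts) before unpacking avoids the ValueError from the traceback, which appears whenever a line has no separator to split on.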
1

I'd simply do it like this:

row2 = reader2.readlines()
for row in reader.readlines():
    print row
    if row in row2:
        print 'yeah'

EDIT: new solution:

row2 = [line[:11] for line in reader2.readlines()]
for row in reader.readlines():
    print row
    if row[:11] in row2:
        print 'yeah'

2 Comments

Thanks naeg, that works when both files are in the same language, but here the two files are in two different human languages.
Thanks naeg, this works too. Something about Python loops seems to reset the reader.read function; readlines() is more apt.
