This script is meant to read through a file and, for each line, take the number (NumsA) and the text next to it (sourceA), then compare them against every other line in the file. If the numbers match but the sources differ, it writes the number to a file along with the sources it appears in.

with open(sortedNums, "r") as sor:
    for line in sor:
        NumsA, sourceA = line.split('####')
        for line in sor:
            if '####' in line:
                NumsB, sourceB = line.split('####')
                if (NumsA == NumsB) & (sourceA != sourceB):
                    print("Found reused Nums")
                    with open(reusedNums, 'a') as reused:
                        reused.write(NumsA + ' ' + sourceA + ' ' + sourceB)
                print("setA: " + NumsA + ' ' + sourceA)
                print("setB: " + NumsB + ' ' + sourceB)

Most of this works, except that the inner loop runs in full while the outer loop only runs its first iteration.

  • You can't repeatedly loop over a file without resetting the read position. Add sor.seek(0). Commented Dec 12, 2016 at 16:12
  • Include a sample from the input file. Commented Dec 12, 2016 at 16:12
  • Also, & is not boolean and; that's the binary bitwise and operator. You want to use and. Commented Dec 12, 2016 at 16:16

1 Answer

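You are trying to read twice from the same file. Files use a current position to determine what to read next, and by iterating over the remaining lines in the inner loop, you moved that position all the way to the end. When control returns to the outer loop, there is nothing left to read, so it stops after its first iteration.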

You could 'fix' that by seeking back to the start of the file with:

sor.seek(0)
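
To see the read position at work, here is a minimal sketch that iterates over one open file twice and then once more after seeking back to the start. The temporary file, its two sample lines, and the helper name read_twice are made up for illustration; they are not from the question.

```python
import os
import tempfile


def read_twice(path):
    """Iterate over the same open file twice, then again after seek(0)."""
    with open(path, "r") as f:
        first = [line.rstrip("\n") for line in f]
        # The position is now at the end of the file, so this pass is empty.
        second = [line.rstrip("\n") for line in f]
        # Move the position back to the start before reading again.
        f.seek(0)
        third = [line.rstrip("\n") for line in f]
    return first, second, third


# Hypothetical sample data in the 'num####source' layout from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("111####alpha\n222####beta\n")
    sample_path = tmp.name

first_pass, second_pass, after_seek = read_twice(sample_path)
os.remove(sample_path)

print(first_pass)   # both lines
print(second_pass)  # empty list: the file was already exhausted
print(after_seek)   # both lines again
```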

However, looping over the whole file for every line in that file is really inefficient, since the work grows with the square of the number of lines. Use a dictionary to track whether you have already seen the same number on a previous line:

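with open(sortedNums, "r") as sor, \
     open(reusedNums, 'a') as reused:
    seen = {}  # maps each nums value to the source it was last seen with
    for line in sor:
        # skip lines that don't contain the separator
        if '####' not in line:
            continue
        nums, source = line.rstrip().split('####')
        # same nums as an earlier line, but with a different source
        if nums in seen and seen[nums] != source:
            print("Found reused Nums")
            reused.write('{} {} {}\n'.format(nums, source, seen[nums]))
        seen[nums] = source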

By storing data in a dictionary, you only have to loop over the file once.
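
The question also asks for every source a number appears in. The code above only remembers the most recent source per number, so one possible variation (a sketch, not part of the original answer) keeps a list of sources per number instead. The function name find_reused, the sep parameter, and the sample lines are assumptions made for illustration.

```python
from collections import defaultdict


def find_reused(lines, sep="####"):
    """Return {num: [sources]} for nums seen with two or more distinct sources."""
    sources_by_num = defaultdict(list)
    for line in lines:
        # Skip lines without the separator, as in the answer above.
        if sep not in line:
            continue
        # partition() splits on the first separator only.
        num, _, source = line.rstrip("\n").partition(sep)
        # Record each distinct source once, in order of first appearance.
        if source not in sources_by_num[num]:
            sources_by_num[num].append(source)
    return {num: srcs for num, srcs in sources_by_num.items() if len(srcs) > 1}


# Hypothetical input lines; any iterable of strings works, including an open file.
sample = [
    "111####alpha\n",
    "222####beta\n",
    "111####gamma\n",
    "222####beta\n",
    "no separator here\n",
    "111####delta\n",
]
reused = find_reused(sample)
print(reused)  # {'111': ['alpha', 'gamma', 'delta']}
```

This is still a single pass over the input. The trade-off is that results are only available after the whole file has been read, rather than being written as each duplicate is found.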

5 Comments

I'm new to Python. Could you explain what seen[nums] = source does?
That sets a key-value pair in the dictionary; see the Python tutorial.
I'm getting an error on the last line. It follows the structure of dictionary[key] = value but gives back a syntax error. Might this be a Python 3 issue? Or am I missing something?
@S.McGuire: I missed out a closing ) on the preceding line. Sorry about that, corrected now.
I should have been able to find that really. That's working now. Thanks for that.