6

I can't see the problem here and it is driving me insane. I'm looping through 2 text files. Some lines in each file match and some don't. What I am doing is looping over file1. For each line in that file, I loop over file2 and compare each element to see if they are the same. What's happening is my loop is stopping after the first line of file1. Here is my code:

while f < 50:
    for line in file1:
        for name in file2:
            if name == line:
                print 'a match was found'
    f+=1

The while loop comes from somewhere else but it is working fine. I just included it for context. The problem is file1 only gives me the first line, compares it to all of the 'names' in file2 then stops instead of repeating the process for the next line in file1. Am I missing something glaringly obvious?

EDIT: If I put a print statement in after the first for loop and comment out the other for loop it loops through the whole first file

4
  • It's worth noting that for this to work as it appears to be intended, f+=1 needs to be indented one level. I presume that is a copying error. Commented Jul 24, 2012 at 16:18
  • @Lattyware correct on the copying error thanks for pointing that out Commented Jul 24, 2012 at 16:19
  • You are comparing all lines in both files 50 times? I thought you wanted to find 50 matches.. Commented Jul 24, 2012 at 16:22
  • @MartijnPieters it's part of a statistical program. I need to know how many lines are between each match in file1 so it is necessary to compare every line in file1 with every line in file2. I'm not worried about performance here as the program is for my own use only. I just need to make sure I get the correct output Commented Jul 24, 2012 at 16:25

4 Answers 4

12

You cannot loop through a file and then loop through the same file again without seeking to the start.

Either re-open file2, call .seek(0) on file2 or load all lines into a list and loop over that instead.
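To see the exhaustion for yourself, here is a minimal sketch. It uses io.StringIO objects as stand-ins for the two open files (the contents and the count_matches helper are made up for illustration) and is written for Python 3, so print is a function:

```python
import io

# Stand-ins for the two open files; the contents are hypothetical.
file1 = io.StringIO("alice\nbob\ncarol\n")
file2 = io.StringIO("bob\ncarol\ndave\n")

def count_matches(outer, inner, rewind):
    """Compare every line of outer against every line of inner."""
    matches = 0
    for line in outer:
        if rewind:
            inner.seek(0)  # go back to the start before each inner pass
        for name in inner:
            if name == line:
                matches += 1
    return matches

# Without rewinding, file2 is exhausted after the first line of file1,
# so "bob" and "carol" are never compared against anything.
without_rewind = count_matches(file1, file2, rewind=False)

file1.seek(0)  # file1 was consumed by the first call, so rewind it too
with_rewind = count_matches(file1, file2, rewind=True)

print(without_rewind, with_rewind)
```

The first call finds nothing because the only line that ever gets compared is "alice"; the second call rewinds file2 each time and finds both matches.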

In your specific case, using a set for the names is probably going to be the fastest:

names = set(name.strip() for name in file2)  # read file2 once into a set
while f < 50:
    for line in file1:
        if line.strip() in names:  # set membership test, no inner loop needed
            f += 1

You can do the same with the lines in file1 and do a set intersection, provided that lines are unique in both file1 and file2.
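A rough sketch of that set-based variant, again with io.StringIO stand-ins and invented contents (Python 3). It assumes you only care about which lines appear, not how often or in what order:

```python
import io

# Hypothetical file contents, used only for illustration.
file1 = io.StringIO("alice\nbob\ncarol\n")
file2 = io.StringIO("bob\ncarol\ndave\n")

# Each file is read exactly once, so exhaustion is not an issue.
lines1 = set(line.strip() for line in file1)
lines2 = set(name.strip() for name in file2)

common = lines1 & lines2        # lines present in both files
only_in_file1 = lines1 - lines2  # lines in file1 with no match in file2

print(sorted(common), sorted(only_in_file1))
```

Because sets discard duplicates and ordering, this is not suitable if you need positions or counts of repeated lines.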


4 Comments

Thank you for that. I have been going nuts over this! I know it isn't the most efficient code but it's only for testing purposes so I just need it to work. Thank you.
It's unclear if you're just trying to compare matching lines or not. If you want to just compare matching lines, you probably want to just use zip(file_1, file2) and iterate over that.
@Julian I'm actually doing a count of each matching and unmatching line for statistical purposes so have to compare every line from each file
@mgilson: I am a smutz, that's why. Corrected.
4

The problem could be that once you've iterated over file2, it is exhausted so your inner for loop is not executing any longer (since there's nothing left in file2 to iterate over). You can close/reopen file2 each time through the loop, or you can seek back to the beginning before that loop is executed.

A slightly better approach would be to use sets (if the files aren't too big and you're not concerned about duplicates within a file or order):

matches = set(file1).intersection(file2)

This should read only file1 into memory and do the loop over file2 implicitly.
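One caveat worth checking, sketched below with io.StringIO stand-ins and made-up contents (Python 3): the raw lines still carry their line endings, so a final line without a trailing newline will not match the same text followed by a newline in the other file. Stripping the newline first is one way around that, assuming trailing newlines are not significant to you:

```python
import io

# Hypothetical contents; the last line of file1 has no trailing newline.
file1 = io.StringIO("alice\nbob\ncarol")
file2 = io.StringIO("carol\nbob\n")

# intersection() accepts any iterable, so file2 is consumed line by line.
raw_matches = set(file1).intersection(file2)

# Rewind both stand-ins and compare with the newline removed.
file1.seek(0)
file2.seek(0)
stripped_matches = set(line.rstrip("\n") for line in file1).intersection(
    name.rstrip("\n") for name in file2
)

print(sorted(raw_matches), sorted(stripped_matches))
```

Here the raw comparison misses "carol" because "carol" and "carol\n" are different strings.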

1 Comment

You don't need to create them both outright, you can just use set(file1).intersection(file2) and you only need to create one set in memory.
3

After the inner loop finishes the first time, the iterator over file2 has reached the end of the file, so the solution is to move file2 back to the beginning before each inner pass, for example:

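while f < 50:
    for line in file1:
        file2.seek(0, 0)  # rewind file2 before each inner pass
        for name in file2:
            if name == line:
                print 'match!'
    f += 1  # as in the question; without it the while loop never ends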


0

Depending on the size of the files, you can use the readlines() method to read the lines of each file into a list.

Then iterate over these lists. A list can be iterated any number of times, so the current position in the file no longer matters.
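A minimal sketch of the list-based approach, using io.StringIO stand-ins with invented contents (Python 3). The match_positions list is my own addition, on the assumption that you want to know where in file1 the matches occur:

```python
import io

# Hypothetical contents standing in for the two open files.
file1 = io.StringIO("alice\nbob\ncarol\ndave\n")
file2 = io.StringIO("dave\nbob\n")

lines = file1.readlines()  # list of lines from file1
names = file2.readlines()  # list of lines from file2

match_positions = []
for index, line in enumerate(lines):
    for name in names:  # a list can be looped over again and again
        if name == line:
            match_positions.append(index)
            break  # one match per line of file1 is enough here

print(match_positions)
```

With the positions in a list, the number of lines between consecutive matches is just the difference between neighbouring entries.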
