
I'm writing a Python script that reads lines from an input file and writes each unique line (one that is not already in the output file) to an output file. Somehow, my script always appends the first line of the input file to the output file, even if the same line is already there. I can't figure out why this happens. Does anyone know why, and how do I fix it? Thanks,

import  os

input_file= 'input.txt'
output_file = 'output.txt'

fo = open(output_file, 'a+')
flag = False
with open(input_file, 'r') as fi:
    for line1 in fi:
       print line1
       for line2 in fo:
           print line2
           if line2 == line1:
               flag = True
               print('Found Match!!')
               break
       if flag == False:
           fo.write(line1)
       elif flag == True:
           flag == False
       fo.seek(0)
    fo.close()
    fi.close()
  • You are opening it in append mode. Do you think that is the cause? Also, when you use with, you don't need to close the file explicitly. Commented Aug 30, 2015 at 3:53
  • So to be clear, you want to write lines from the input file to the output file only if they don't already exist in the output file; right? What if a line doesn't occur in the output file, but it occurs more than once in the input file? Commented Aug 30, 2015 at 4:08
  • My input files already consist of unique lines, but the output file doesn't, because it gets updated from multiple input files. Commented Aug 30, 2015 at 4:40
  • How large are these files? Commented Aug 30, 2015 at 4:58

3 Answers


When you open a file in append mode, the file object's position is at the end of the file. So the first time through, when execution reaches for line2 in fo:, there are no more lines to read from fo, so that block is skipped, flag is still False, and the first line is written to the output file. After that, you call fo.seek(0), so every subsequent line is checked against the entire file.
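A small sketch of that behaviour, assuming Python 3, where open() in 'a+' mode positions the file at its end. The helper name peek_after_append_open and the demo file contents are hypothetical. A read straight after opening returns an empty string, and only after fo.seek(0) does the existing content become visible:

```python
import os
import tempfile


def peek_after_append_open(path):
    """Hypothetical helper: open `path` in 'a+' mode and report what reads return."""
    with open(path, 'a+') as fo:
        start = fo.tell()        # position right after opening
        before_seek = fo.read()  # reading from the end yields ''
        fo.seek(0)               # rewind to the first line
        after_seek = fo.read()   # now the whole file is visible
    return start, before_seek, after_seek


# Demo on a throwaway file; the contents are made up.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')
with open(path, 'w') as f:
    f.write('apple\nbanana\n')

start, before_seek, after_seek = peek_after_append_open(path)
print(start, repr(before_seek), repr(after_seek))
```

This is exactly what the question's inner for line2 in fo: loop sees on its first pass: nothing.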


1 Comment

I wasn't aware that the "a"ppend option moves the pointer to the end of the file; that explains why the first line wasn't compared. Following your suggestion, moving fo.seek(0) before the for loop fixed my problem.

The answer by kmacinnis is right on as to why your code isn't working; you need to use mode 'r+' instead of 'a+', or else put fo.seek(0) at the beginning of the for loop instead of the end.
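A minimal sketch of the second fix, assuming Python 3. append_unique_lines is a hypothetical name and the demo file contents are made up. It keeps 'a+' so writes always land at the end of the file, calls fo.seek(0) before scanning the output, and resets the flag for every input line:

```python
import os
import tempfile


def append_unique_lines(input_path, output_path):
    """Hypothetical helper: append lines of input_path missing from output_path.

    The output file is re-scanned for every input line, so this only suits
    small files.
    """
    with open(input_path, 'r') as fi, open(output_path, 'a+') as fo:
        for line1 in fi:
            fo.seek(0)          # rewind BEFORE scanning, so line 1 is checked too
            found = False       # reset for each input line
            for line2 in fo:
                if line2 == line1:
                    found = True
                    break
            if not found:
                fo.write(line1)  # 'a+' always appends at the end


# Demo with throwaway files; the contents are made up.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, 'input.txt')
output_path = os.path.join(workdir, 'output.txt')
with open(output_path, 'w') as f:
    f.write('apple\nbanana\n')
with open(input_path, 'w') as f:
    f.write('apple\ncherry\nbanana\n')

append_unique_lines(input_path, output_path)
with open(output_path) as f:
    result = f.read()
print(repr(result))
```

Only cherry is appended; apple, the first input line, is now correctly recognised as already present.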

That said, there's a much better way to do this than reading the entire output file for every line of the input file.

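import os


def ensure_file_ends_with_newline(handle):
    """Append a newline to a file opened in 'rb+' mode if its last byte isn't one."""
    position = handle.tell()

    # Seeking to one byte before the end fails on an empty file, so check the size first.
    handle.seek(0, os.SEEK_END)
    if handle.tell() > 0:
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) != b'\n':
            handle.write(b'\n')

    handle.seek(position)


input_filepath = 'input.txt'
output_filepath = 'output.txt'

# Binary mode keeps the end-relative seek legal under Python 3 as well
# (text-mode files reject nonzero end-relative seeks).
# The output file must already exist for 'rb+'.
with open(input_filepath, 'rb') as infile, open(output_filepath, 'rb+') as outfile:
    ensure_file_ends_with_newline(outfile)

    # Every line already present in the output file.
    written = set(outfile)

    for line in infile:
        if line not in written:
            outfile.write(line)
            written.add(line)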

2 Comments

If the files could be enormous, holding the entire contents in memory would be a problem. That said, this is much better if you know you have small files to work with. Also, I didn't realize you could have multiple file objects in a single with statement, so I'm thrilled to learn that! Thanks.
@kmacinnis, I considered that, but if your files are that big, that's a HUGE cost in IO to be reading the entire file once for every line in the input file. If you are dealing with files so big they won't fit in memory, there are better ways to do this than doing all that IO.
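On the memory concern raised in these comments, one possible middle ground (not necessarily what either commenter had in mind) is to keep fixed-size hashes of the lines in the set instead of the lines themselves. This is a sketch assuming Python 3.6+ for hashlib.blake2b; the function names and demo data are hypothetical. Memory still grows with the number of distinct lines, and a 16-byte digest makes collisions extremely unlikely but not impossible:

```python
import hashlib
import os
import tempfile


def line_digest(line):
    """Hypothetical helper: fixed-size (16-byte) BLAKE2b digest of one line."""
    return hashlib.blake2b(line, digest_size=16).digest()


def append_unique_lines_hashed(input_path, output_path):
    """Hypothetical helper: append new lines, remembering digests, not lines."""
    seen = set()
    last_line = b'\n'  # an empty output file counts as newline-terminated
    with open(output_path, 'rb') as outfile:
        for line in outfile:
            seen.add(line_digest(line))
            last_line = line

    with open(input_path, 'rb') as infile, open(output_path, 'ab') as outfile:
        if not last_line.endswith(b'\n'):
            # Terminate a dangling last line so new lines aren't glued onto it,
            # and remember it in its newline-terminated form.
            outfile.write(b'\n')
            seen.add(line_digest(last_line + b'\n'))
        for line in infile:
            digest = line_digest(line)
            if digest not in seen:
                outfile.write(line)
                seen.add(digest)


# Demo with throwaway files; the contents are made up.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, 'input.txt')
output_path = os.path.join(workdir, 'output.txt')
with open(output_path, 'wb') as f:
    f.write(b'apple\nbanana')  # no trailing newline on purpose
with open(input_path, 'wb') as f:
    f.write(b'banana\ncherry\napple\n')

append_unique_lines_hashed(input_path, output_path)
with open(output_path, 'rb') as f:
    result = f.read()
print(result)
```

The output file is read once and the input once, so the IO cost stays linear rather than re-reading the output for every input line.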

Your flag is never reset to False.

flag == False is an equality comparison: it evaluates to True or False, and the result is thrown away.

flag = False is an assignment.

Try the latter.
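A tiny illustration of the difference, using two hypothetical helper functions: the comparison leaves the variable untouched, while the assignment actually rebinds it.

```python
def reset_with_comparison(flag):
    flag == False   # comparison: computes a bool, then discards it
    return flag


def reset_with_assignment(flag):
    flag = False    # assignment: rebinds the name to False
    return flag


print(reset_with_comparison(True))  # True (unchanged)
print(reset_with_assignment(True))  # False
```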

import  os

input_file= 'input.txt'
output_file = 'output.txt'

fo = open(output_file, 'a+')
flag = False
with open(input_file, 'r') as fi:
    for line1 in fi:
       #print line1
       for line2 in fo:
           #print line2
           if line2 == line1:
               flag = True
               print('Found Match!!')
               print (line1,line2)
               break
       if flag == False:
           fo.write(line1)
       elif flag == True:
           flag = False
       fo.seek(0)

3 Comments

That doesn't fix the problem that he asked about, but it does fix another problem I noticed, which is that no lines after the first match are written to the output file.
I must have misunderstood his question. I ran the code and only noticed this second bug. Nice catch.
Thanks for picking this up. I made that change as well.
