I'm writing code in Python 2.7 (bigger framework compatibility reasons) that does the following:

  1. Takes an ID_something.py file, opens it, reads the lines from it, then closes the file.
  2. Creates an out_something.py file where output will be stored.
  3. For each line from ID_something.py it finds the ID string contained in it. If no ID string was found, further steps are not performed.
  4. Searches for occurrences of that ID in a bunch of other files (module_aaa.py, module_bbb.py, module_ccc.py etc.); this step is performed for multiple lines at once using multiprocessing.Pool.
  5. Writes the number of occurrences and the ID string itself into out_file, and closes the file handle to out_something.py once the last ID in ID_something.py has been processed.
  6. If no IDs were found in ID_something.py (in none of its lines), out_something.py may be empty. The program checks the file size and, if it's 0, attempts to delete the empty out_something.py.

Here's the relevant code:

            #open ID file
            id_file = open(id_prefix+filename+".py","r") #open the ID file
            id_lines = id_file.readlines() #read its contents
            id_file.close()
            #write out file with the results
            if os.path.exists(out_prefix+filename+".py"):
                print "ERROR: file "+out_prefix+filename+".py already exists! Program must be stopped." #user must have created the file while the program was running! (or deletion of the file failed)
                exit(1) #error: out file would overwrite an existing file
            out_file = open(out_prefix+filename+".py","w") #open the out file
            print "Searching for IDs from "+(id_prefix if use_subfolders else "")+id_filename+"..."
            line_break = get_file_comment(filename) #this is what we add before the next line, where each line is the id and its occurrences, so before the 1st line print a general comment about the whole file (or just make it an empty string)
            #!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#
            #execute searches for IDs in the file
            #prepare a list of arguments
            list_of_args = [(id_line,filename,temp_path,temp_match_prefix,temp_prefix) for id_line in id_lines]
            #assign work to a pool of worker processes
            out_lines = multiprocessing.Pool().map(process_id_occurrences,list_of_args)
            for out_line in out_lines:
                if not out_line=="":
                    out_file.write(line_break+out_line)
                    line_break = "\n" #will be added before next line
            #@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#
            out_file.close()
            if os.path.getsize(out_prefix+filename+".py")==0: #we can't refer to out_file variable, we have to refer to the actual file on the disk
                overwrite_prev_prnt = True
                os.remove(out_prefix+filename+".py") #<-- THIS LINE CAUSES AN ERROR

The problem I'm encountering is WindowsError: [Error 32]. The empty file can't be deleted, because another process is using it. However, out_file.close() is already called above. Suspecting that it's one of the child worker processes holding the file, I tried manually controlling and closing the Pool, by adding pool = multiprocessing.Pool() right below #!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!# and pool.close() just above #@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#, and then replacing multiprocessing.Pool().map(process_id_occurrences,list_of_args) with pool.map(process_id_occurrences,list_of_args) but it changed nothing. I also tried to further tweak it by using pool.join() but that in turn caused an AssertionError:

  File "C:\Program Files\Python27\lib\multiprocessing\pool.py", line 474, in join
    assert self._state in (CLOSE, TERMINATE)

I also suspected that there could simply be some kind of delay on the Windows side, so I also tried modifying the code so that, instead of deleting the file right away, it would just collect its path and delete it shortly before the program exits, giving Windows a few extra milliseconds to do its thing. That didn't help either: the WindowsError was still Error 32.

I also tried replacing os.remove(out_prefix+id_matches.group(1)+".py") with:

                while os.path.exists(out_prefix+id_matches.group(1)+".py"):
                    try: os.remove(out_prefix+id_matches.group(1)+".py")
                    except: pass

to check what's going on, and indeed the file is locked by a whole bunch of python.exe processes.

What am I doing wrong? Why are those processes still running? I did everything I could to ensure that they are closed. What's going on?

1 Answer

I found out what the problem was: my incomplete understanding of how Pool works.

I was calling either pool.close() or pool.join(), but never both.
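As far as I can tell, the AssertionError from the question comes from calling join() on a pool that is still accepting work. Here is a minimal sketch that reproduces it in isolation. The function name join_without_close is made up for illustration, and the exception type is an assumption that depends on the Python version: 2.7 raises AssertionError, while newer Python 3 releases raise ValueError instead.

```python
import multiprocessing


def join_without_close():
    """Call join() on a running pool and report which exception it raises."""
    pool = multiprocessing.Pool(1)
    try:
        # The pool has been neither closed nor terminated, so join() refuses.
        pool.join()
    except (AssertionError, ValueError) as exc:
        return type(exc).__name__
    finally:
        # Clean up properly: stop the worker first, then wait for it to exit.
        pool.terminate()
        pool.join()
    return None
```

On Windows, call this from under an `if __name__ == "__main__":` guard, since child processes re-import the main module there.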

The solution is to do both, in the right order:

    pool.close()
    pool.join()

This ensured that child processes (workers of the pool) terminated properly before the program finished, and released the file in question.
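To show where the two calls sit relative to the file deletion, here is a rough sketch of the whole flow. It is not the original program: find_id is a hypothetical stand-in for process_id_occurrences, the "ID_" marker and the write_results name are made up, and the pool size of 2 is arbitrary. My guess (not verified) is that the workers held an inherited handle to the out file, which would explain why waiting for them to exit releases it.

```python
import multiprocessing
import os


def find_id(line):
    # Hypothetical stand-in for process_id_occurrences: returns the stripped
    # line if it contains the made-up marker "ID_", otherwise "".
    line = line.strip()
    return line if "ID_" in line else ""


def write_results(id_lines, out_path):
    """Write non-empty results to out_path; delete the file if it stays empty."""
    with open(out_path, "w") as out_file:
        pool = multiprocessing.Pool(2)
        try:
            out_lines = pool.map(find_id, id_lines)
        finally:
            pool.close()  # no more tasks will be submitted
            pool.join()   # wait until every worker process has exited
        out_file.write("\n".join(line for line in out_lines if line))

    # The file is closed and the workers are gone, so removal is safe here.
    if os.path.getsize(out_path) == 0:
        os.remove(out_path)
        return False
    return True
```

The try/finally is there so the pool is shut down even if one of the workers raises. As with any multiprocessing code on Windows, the call to write_results should live under an `if __name__ == "__main__":` guard.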

If the higher ranking people decide to delete this question, please feel free.
