I'm writing code in Python 2.7 (bigger framework compatibility reasons) that does the following:
- Takes an ID_something.py file, opens it, reads the lines from it, then closes the file.
- Creates an out_something.py file where output will be stored.
- For each line from ID_something.py it finds the ID string contained in it. If no ID string was found, further steps are not performed.
- Searches for occurrences of that ID in a bunch of other files (module_aaa.py, module_bbb.py, module_ccc.py etc.). This step is performed for multiple lines at once using multiprocessing.Pool.
- Writes the number of occurrences and the ID string itself into out_something.py, and closes its file handle once the last ID in the ID_something.py file has been processed.
- If no IDs were found in ID_something.py (in any of its lines), out_something.py may be empty. The program checks the file size and, if it is 0, attempts to delete the empty out_something.py.
Here's the relevant code:
#open ID file
id_file = open(id_prefix+filename+".py","r") #open the ID file
id_lines = id_file.readlines() #read its contents
id_file.close()
#write out file with the results
if os.path.exists(out_prefix+filename+".py"):
    print "ERROR: file "+out_prefix+filename+".py already exists! Program must be stopped." #user must have created the file while the program was running! (or deletion of the file failed)
    exit(1) #error: out file would overwrite an existing file
out_file = open(out_prefix+filename+".py","w") #open the out file
print "Searching for IDs from "+(id_prefix if use_subfolders else "")+id_filename+"..."
line_break = get_file_comment(filename) #this is what we add before the next line, where each line is the id and its occurrences, so before the 1st line print a general comment about the whole file (or just make it an empty string)
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#
#execute searches for IDs in the file
#prepare a list of arguments
list_of_args = [(id_line,filename,temp_path,temp_match_prefix,temp_prefix) for id_line in id_lines]
#assign work to a pool of worker processes
out_lines = multiprocessing.Pool().map(process_id_occurrences,list_of_args)
for out_line in out_lines:
    if not out_line=="":
        out_file.write(line_break+out_line)
        line_break = "\n" #will be added before next line
#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#
out_file.close()
if os.path.getsize(out_prefix+filename+".py")==0: #we can't refer to out_file variable, we have to refer to the actual file on the disk
    overwrite_prev_prnt = True
    os.remove(out_prefix+filename+".py") #<-- THIS LINE CAUSES AN ERROR
The problem I'm encountering is WindowsError: [Error 32].
The empty file can't be deleted, because another process is using it.
However, out_file.close() is already called above.
Suspecting that one of the child worker processes was holding the file, I tried managing the Pool manually: I added pool = multiprocessing.Pool() right below #!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!# and pool.close() just above #@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#, and replaced multiprocessing.Pool().map(process_id_occurrences,list_of_args) with pool.map(process_id_occurrences,list_of_args). It changed nothing.
I also tried to further tweak it by using pool.join() but that in turn caused an AssertionError:
  File "C:\Program Files\Python27\lib\multiprocessing\pool.py", line 474, in join
    assert self._state in (CLOSE, TERMINATE)
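For context, the assertion in that traceback checks that the pool has already been closed or terminated by the time join() is called, so the order of the two calls matters. Below is a minimal sketch of that ordering, not the code from the program above: count_chars is a hypothetical stand-in for process_id_occurrences, and multiprocessing.dummy.Pool (a thread-backed pool with the same interface) is used only so the snippet runs without an if __name__ == "__main__" guard. Some newer Python versions raise ValueError instead of AssertionError here, so the sketch catches both.

```python
from multiprocessing.dummy import Pool  # thread-backed pool, same interface as multiprocessing.Pool


def count_chars(text):
    # Hypothetical stand-in for process_id_occurrences.
    return len(text)


def join_without_close_fails():
    """Return True if join() on a still-running pool is rejected."""
    pool = Pool(2)
    try:
        pool.join()  # pool is still running: join() refuses to proceed
    except (AssertionError, ValueError):
        failed = True
    else:
        failed = False
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # now allowed: wait for the workers to exit
    return failed


def map_then_shutdown(items):
    """Run the work, then shut the pool down in the accepted order."""
    pool = Pool(2)
    try:
        results = pool.map(count_chars, items)
    finally:
        pool.close()
        pool.join()
    return results
```

Calling map_then_shutdown(["a", "bb", "ccc"]) returns the per-item results only after close() and join() have both run, which is the ordering the assertion enforces.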
I also suspected that there could simply be some kind of delay on the Windows side. So I modified the code to collect the file's path instead of deleting the file right away, and to delete it shortly before the program exits, giving Windows some extra milliseconds to do its thing. That didn't help either (still WindowsError, Error 32).
I also tried replacing os.remove(out_prefix+id_matches.group(1)+".py") with:
while os.path.exists(out_prefix+id_matches.group(1)+".py"):
    try: os.remove(out_prefix+id_matches.group(1)+".py")
    except: pass
to check what's going on, and indeed the file is locked by a whole bunch of python.exe processes.
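The loop above spins for as long as the lock persists, and the bare except hides the reason for each failure. A bounded variant that keeps the last error can make this kind of check easier to read. This is only a sketch: remove_with_retries, attempts and delay are hypothetical names, not part of the program.

```python
import os
import time


def remove_with_retries(path, attempts=5, delay=0.1):
    """Try to delete path a few times; return the last error, or None on success."""
    last_error = None
    for _ in range(attempts):
        if not os.path.exists(path):
            return None  # nothing left to delete
        try:
            os.remove(path)
            return None
        except OSError as exc:  # WindowsError is a subclass of OSError
            last_error = exc
            time.sleep(delay)  # give the other process a moment to let go
    return last_error
```

If the returned value is not None, printing it shows the error code and path from the final failed attempt.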
What am I doing wrong? Why are those processes still running? I did everything I could to ensure that they are closed. What's going on?