
I have a simple task. A particular function needs to be run for a large number of files. This task can be easily parallelized.

Here is the working code:

# filelist is the directory containing two files, a.txt and b.txt.
# a.txt is the first file, b.txt is the second file.
# I pass a file that lists the names of the two files to the main program.

from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
import sys

def translate(filename):
    print(filename)
    f = open(filename, "r")
    g = open(filename + ".x", "w")
    for line in f:
        g.write(line)

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for filename in Path(path_to_file_with_list).open():
            executor.submit(translate, "filelist/" + filename)
        for future in as_completed(futures):
            future.result()

if __name__ == "__main__":
    main(sys.argv[1])

However, no new files are created, i.e. the folder doesn't contain the a.txt.x and b.txt.x files.

What is wrong with the above code and how can i make it work?

Thanks.

  • Your program contains a print statement. Do you get any output? Commented Feb 4, 2019 at 3:58
  • Iterating over a file gives you the lines of the file, including the line ending - so you're trying to open files with names containing newlines, which are unlikely to actually exist. Commented Feb 4, 2019 at 4:02
  • Also you aren't closing the files. If you don't close "g" you may not get any output. If you open f and g using "with" statements they will always be closed at the end of the block. Commented Feb 4, 2019 at 4:08
  • futures is still an empty list at the second loop in main Commented Feb 4, 2019 at 4:10
  • @PaulCornelius yes, the print statement works fine Commented Feb 4, 2019 at 4:11
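The newline point raised in the comments is easy to verify: iterating over an open file yields each line with its trailing line ending still attached. A minimal sketch (the temp-file setup is just for illustration):

```python
from pathlib import Path
import tempfile

# Write a two-line list file, then iterate over it the way the question does.
list_file = Path(tempfile.mkdtemp()) / "list.txt"
list_file.write_text("a.txt\nb.txt\n")

lines = list(list_file.open())
print(lines)  # ['a.txt\n', 'b.txt\n'] - each name keeps its trailing newline
```

So `"filelist/" + filename` produces a path like `"filelist/a.txt\n"`, which does not name an existing file.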

1 Answer

This should get you on the right path. If it doesn't work and the cause isn't an obvious bug, then I suspect you may not have all your file paths correct. I should point out that writing files would benefit more from threads than from processes, thanks to the reduced overhead. File I/O releases the GIL, so you'll still get a speedup (significantly more if you copy more than one line at a time). That said, if you're just copying files, you should really just use shutil.copy or shutil.copy2.

from concurrent.futures import ProcessPoolExecutor, wait
from pathlib import Path
import sys

def translate(filename):
    print(filename)
    with open(filename, "r") as f, open(filename + ".x", "w") as g:
        for line in f:
            g.write(line)

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for filename in Path(path_to_file_with_list).open():
            # strip() drops the trailing newline that iterating over a file leaves on each line
            futures.append(executor.submit(translate, "filelist/" + filename.strip()))
        wait(futures) #simplify waiting on processes if you don't need the result.
        for future in futures:
            if future.exception() is not None:
                raise future.exception() # ProcessPoolExecutors swallow exceptions without telling you...
        print("done")

if __name__ == "__main__":
    main(sys.argv[1])
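Since the work here really is just copying files, the shutil suggestion above can be sketched with threads instead of processes. This is only an illustration: `copy_with_suffix` and the `base_dir` parameter are hypothetical names, and it assumes the same file-list layout as the question.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor, wait
from pathlib import Path

def copy_with_suffix(src):
    """Copy src to src + '.x', preserving metadata via shutil.copy2."""
    src = Path(src)
    shutil.copy2(src, src.with_name(src.name + ".x"))

def main(path_to_file_with_list, base_dir="filelist"):
    base = Path(base_dir)
    # Threads are enough here: file I/O releases the GIL, and they avoid
    # the process-spawn and pickling overhead of ProcessPoolExecutor.
    with ThreadPoolExecutor(max_workers=8) as executor:
        futures = [
            executor.submit(copy_with_suffix, base / name.strip())
            for name in Path(path_to_file_with_list).read_text().splitlines()
            if name.strip()  # skip blank lines; strip() drops the newline
        ]
        wait(futures)
        for future in futures:
            if future.exception() is not None:
                raise future.exception()
```

Call it as `main(sys.argv[1])` just like the original; `base_dir` lets you point it at a different directory if your files aren't under `filelist/`.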