
I have a simple task. A particular function needs to be run for a large number of files. This task can be easily parallelized.

Here is the working code:

# filelist is the directory containing two files, a.txt and b.txt.
# a.txt is the first file, b.txt is the second file.
# I pass a file that lists the names of the two files to the main program.

from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
import sys

def translate(filename):
    print(filename)
    f = open(filename, "r")
    g = open(filename + ".x", "w")
    for line in f:
        g.write(line)

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for filename in Path(path_to_file_with_list).open():
            executor.submit(translate, "filelist/" + filename)
        for future in as_completed(futures):
            future.result()

if __name__ == "__main__":
    main(sys.argv[1])

However, no new files are created, i.e. the folder doesn't contain the a.txt.x and b.txt.x files.

What is wrong with the above code and how can i make it work?

Thanks.

  • Your program contains a print statement. Do you get any output? Commented Feb 4, 2019 at 3:58
  • Iterating over a file gives you the lines of the file, including the line ending - so you're trying to open files with names containing newlines, which are unlikely to actually exist. Commented Feb 4, 2019 at 4:02
  • Also you aren't closing the files. If you don't close "g" you may not get any output. If you open f and g using "with" statements they will always be closed at the end of the block. Commented Feb 4, 2019 at 4:08
  • futures is still an empty list at the second loop in main Commented Feb 4, 2019 at 4:10
  • @PaulCornelius yes, the print statement works fine Commented Feb 4, 2019 at 4:11
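The newline point raised in the comments is easy to verify: iterating over an open file yields each line with its trailing line ending still attached. A minimal sketch (the temp-file setup is just for illustration):

```python
from pathlib import Path
import tempfile

# Write a two-line list file, then iterate over it the way the question does.
list_file = Path(tempfile.mkdtemp()) / "list.txt"
list_file.write_text("a.txt\nb.txt\n")

lines = list(list_file.open())
print(lines)  # ['a.txt\n', 'b.txt\n'] - each name keeps its trailing newline
```

So `"filelist/" + filename` produces a path like `"filelist/a.txt\n"`, which does not name an existing file.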

1 Answer

This should get you on the right path. If it doesn't work and the cause isn't an obvious bug, then I suspect you may not have all your file paths correct. I should point out that writing files would benefit more from threads than from processes, thanks to the reduced overhead. File I/O releases the GIL, so you'll still get a speedup (significantly more if you copy more than one line at a time). That said, if you're just copying files, you should really just use shutil.copy or shutil.copy2.

from concurrent.futures import ProcessPoolExecutor, wait
from pathlib import Path
import sys

def translate(filename):
    print(filename)
    with open(filename, "r") as f, open(filename + ".x", "w") as g:
        for line in f:
            g.write(line)

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for filename in Path(path_to_file_with_list).open():
            # strip() drops the trailing newline that iterating over a file leaves on each line
            futures.append(executor.submit(translate, "filelist/" + filename.strip()))
        wait(futures) #simplify waiting on processes if you don't need the result.
        for future in futures:
            if future.exception() is not None:
                raise future.exception() # ProcessPoolExecutors swallow exceptions without telling you...
        print("done")

if __name__ == "__main__":
    main(sys.argv[1])
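Since the work here really is just copying files, the shutil suggestion above can be sketched with threads instead of processes. This is only an illustration: `copy_with_suffix` and the `base_dir` parameter are hypothetical names, and it assumes the same file-list layout as the question.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor, wait
from pathlib import Path

def copy_with_suffix(src):
    """Copy src to src + '.x', preserving metadata via shutil.copy2."""
    src = Path(src)
    shutil.copy2(src, src.with_name(src.name + ".x"))

def main(path_to_file_with_list, base_dir="filelist"):
    base = Path(base_dir)
    # Threads are enough here: file I/O releases the GIL, and they avoid
    # the process-spawn and pickling overhead of ProcessPoolExecutor.
    with ThreadPoolExecutor(max_workers=8) as executor:
        futures = [
            executor.submit(copy_with_suffix, base / name.strip())
            for name in Path(path_to_file_with_list).read_text().splitlines()
            if name.strip()  # skip blank lines; strip() drops the newline
        ]
        wait(futures)
        for future in futures:
            if future.exception() is not None:
                raise future.exception()
```

Call it as `main(sys.argv[1])` just like the original; `base_dir` lets you point it at a different directory if your files aren't under `filelist/`.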