When I write to an open file that I have shared by passing it to a worker function run via multiprocessing, the file's contents are not written properly. Instead, '^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^' is written to the file.

Why would this happen? Can you not have many multiprocessing units writing to the same file? Do you need to use a Lock? A Queue? Am I not using Multiprocessing correctly or effectively?

Some example code might help, but please treat it only as an illustration of me opening a file and passing the open file via multiprocessing to another function that writes to that file.

Multiprocessing file:

import multiprocessing as mp

class PrepWorker():
    def worker(self, open_file):
        for i in range(1,1000000):
            data = GetDataAboutI() # This function would be in a separate file
            open_file.write(data)
            open_file.flush()
        return

if __name__ == '__main__':
    open_file = open('/data/test.csv', 'w+')
    jobs = []  # keep the started processes so they can be joined later
    for i in range(4):
        # Every process receives the same open file object
        p = mp.Process(target=PrepWorker().worker, args=(open_file,))
        jobs.append(p)
        p.start()

    for j in jobs:
        j.join()
        print('{0}.exitcode = {1}'.format(j.name, j.exitcode))
    open_file.close()
  • "There are probably details in these code examples that are not needed." minimal reproducible example Commented Dec 29, 2015 at 7:30
  • Where do the "^@"'s come from? I cannot see anything like this in the code. Are these literals or a representation of control symbols? Commented Dec 29, 2015 at 7:33
  • @ivan_pozdeev I have no idea where the ^@ values are coming from... Every line that is written while running this, is written as those repeating symbols. If I change the range to 1 and just run 1 processor, the data is written perfectly. Commented Dec 29, 2015 at 7:46
  • @ccdpowell: what happens if the PrepWorkers each write a fixed character (determined at random by each worker)? Commented Dec 29, 2015 at 7:48
  • Based on @user's question, I ran the test with random strings and was able to narrow the problem down a little. A ^@ is written in place of every character that should have been written by every process EXCEPT the last one. In my example, if I ran this with 4 processes, each processing 10 items, I would get a string of 30 '^@' followed by 10 readable characters. Commented Dec 29, 2015 at 7:59

1 Answer

Why would this happen?

Several processes may try to call

open_file.write(data)
open_file.flush()

at the same time. What behavior would you expect if the calls from processes a, b and c interleave like this?

  • a.write
  • b.write
  • a.flush
  • c.write
  • b.flush

Can you not have many multiprocessing units writing to the same file? Do you need to use a Lock? A Queue?

Python multiprocessing safely writing to a file recommends having one queue, which is then read by a single process that writes to the file. So do Writing to a file with multiprocessing and Processing single file from multiple processes in python.
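As a rough illustration of that queue-plus-single-writer layout, here is a minimal sketch. It is not taken from the linked answers: the names get_data_about, worker, writer, run and SENTINEL are hypothetical, get_data_about merely stands in for the question's GetDataAboutI(), and 'test.csv' is a placeholder path.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical marker that tells the writer to stop


def get_data_about(i):
    # Stand-in for the question's GetDataAboutI(); returns one CSV line.
    return '{0},{1}\n'.format(i, i * i)


def worker(queue, start, stop):
    # Workers never touch the file; they only put finished lines on the queue.
    for i in range(start, stop):
        queue.put(get_data_about(i))


def writer(queue, path):
    # The only process that opens and writes the file.
    with open(path, 'w') as out:
        while True:
            line = queue.get()
            if line is SENTINEL:
                break
            out.write(line)


def run(path, n_workers=4, items_per_worker=10):
    queue = mp.Queue()
    writer_proc = mp.Process(target=writer, args=(queue, path))
    writer_proc.start()

    jobs = []
    for n in range(n_workers):
        start = n * items_per_worker
        p = mp.Process(target=worker,
                       args=(queue, start, start + items_per_worker))
        jobs.append(p)
        p.start()

    # Wait for all workers, then tell the writer that no more data is coming.
    for j in jobs:
        j.join()
    queue.put(SENTINEL)
    writer_proc.join()


if __name__ == '__main__':
    run('test.csv')  # placeholder path
```

Because only the writer process holds the file, each line reaches the file whole. The order of lines coming from different workers is still not deterministic, so include an index in each line if the order matters.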

3 Comments

Thank you. This is what I needed. I was trying to do too much with the data, and the processes' writes were overlapping in between flushes. This problem stemmed from a fundamental misunderstanding of how to structure multiprocessing jobs.
Why does using a Lock to prevent the write/flush calls from different processes from interleaving not work? I distribute counters using the approach described here and figured the same approach should work for an open file. eli.thegreenplace.net/2012/01/04/…
@ccdpowell: The code at the website you linked to seems fine. Without looking at yours, it's hard to say. How about you ask a new question with the modified code? (Feel free to ping this comment if it's not immediately answered.)