
I have a queued-work delegator that spawns 15-25 subprocesses via a custom multiprocessing.Pool(). The individual workers each emit 1-3 logging.info messages, adding up to 10-15 messages in less than 1000 ms, and I've noticed that the timestamps are always sequential and no message ever collides with another. This suggests to me that there is a shared lock somewhere in multiprocessing or logging, but I can't figure out where exactly it is.

I'm asking mostly for educational purposes, as the software in question is going to be refactored to be async or multithreaded anyway: 90% of real time is spent in IO (a remote API, not number crunching).

The logging configuration mirrors Django's, as I liked how that was organized:

LOGGING['handlers']['complex_console'] = {
    'level': 'DEBUG',
    'class': 'logging.StreamHandler',
    'formatter': 'complex',
}

LOGGING['loggers']['REDACTED_sync'] = {
    'handlers': ['complex_console'],
    'propagate': True,
    'level': 'DEBUG',
}
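
For context, a dict organized this way is normally applied with logging.config.dictConfig(). A runnable sketch; everything except the two entries above (notably the 'complex' format string) is an assumption on my part:

import logging.config

LOGGING = {
    'version': 1,
    'formatters': {
        # The real 'complex' format string isn't shown; this one is a stand-in.
        'complex': {
            'format': '%(asctime)s %(process)d %(name)s %(levelname)s %(message)s',
        },
    },
    'handlers': {
        'complex_console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'complex',
        },
    },
    'loggers': {
        'REDACTED_sync': {
            'handlers': ['complex_console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    },
}

logging.config.dictConfig(LOGGING)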

One quick clarification: multiprocessing.Process does use fork, but calls to logging.getLogger() are not made until after a child process has been spawned.
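
For concreteness, a minimal sketch of the setup as I understand it; the worker body, the counts, and the basicConfig stand-in for the configuration above are all assumptions, and it relies on the fork start method (the Linux default):

import logging
from multiprocessing import Pool

def worker(task_id):
    # getLogger() is called only here, after the fork, as noted above.
    logger = logging.getLogger('REDACTED_sync')
    logger.info('task %s started', task_id)
    logger.info('task %s finished', task_id)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)  # stand-in for the dictConfig above
    with Pool(processes=20) as pool:  # the real code uses a custom Pool subclass
        pool.map(worker, range(100))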

  • Although I don't think it is related to the logger: I've used logging numerous times with multiprocessing, always with the same logger, and I do receive messages from different processes interleaved with one another. You can try using a single logger instance (it may or may not help), or post your spawning code for a better look. Commented Oct 13, 2013 at 23:19

2 Answers


I have no idea how your spawner works, but if it follows a very simple scheme:

import logging

logger = logging.getLogger()  # gets a logger instance once, at module level

class Foo:
    ...  # worker definition

while True:
    ...  # a loop that spawns processes

Then they will share the same logger instance, which is why you get "sequential" writes. It doesn't impose any locking that you would notice performance-wise, for a very simple reason: appending to a log file is extremely fast and is almost always finished before the file is needed again.

You can run an experiment: remove the file handler from the logging configuration and you will notice that the writes are still sequential, because it is a very rare occurrence for two processes, even ones doing exactly the same work, to finish at the same time.
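
A minimal sketch of that experiment, assuming fork-based multiprocessing on Unix (process and message counts are arbitrary):

import logging
import multiprocessing

def worker(n):
    logger = logging.getLogger('experiment')
    for i in range(5):
        logger.info('process %s message %s', n, i)

if __name__ == '__main__':
    multiprocessing.set_start_method('fork')  # children inherit the handler set up below
    # Console handler only; no file handler involved.
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s pid=%(process)d %(message)s')
    procs = [multiprocessing.Process(target=worker, args=(n,)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

On a typical run the timestamps still come out strictly ordered, even though nothing synchronizes the four processes with each other.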


4 Comments

Debug/info emits have been going to stdout at anywhere from 5 to 10 messages inside of 500 ms for ~4 hours, but I've yet to see even one line get mangled. The logging configuration is done pre-fork, but the majority of the getLogger() calls happen after Popen forks.
I don't think I understand what your actual issue is. That the timestamps are sequential? If so, it is working as intended, and I don't understand why you would expect it not to be. The timestamp that shows up in the log is the timestamp of the logging.info() call, which also locks the log file and writes to it. This is how it is implemented, and this is why timestamps are sequential.
The concern I have is that logging.StreamHandler is somehow keeping all of the forked child processes' calls in order, which suggests there's a lock somewhere. I've put \b and \e characters in the formatter to see if either showed up on a line more than once or went missing; no luck over cycles that output 20K log entries to stdout via python my_program.py | tee log_file.py (a sketch of this check follows the thread).
@David You didn't show us code and you expect us to debug it? It may very well be how you designed it to work; I can't say without code, and you seem extremely reluctant to show it. I already explained how locking in the logger works, and it's not what you're concerned about.
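
A sketch of the sentinel check described in the comments; the marker characters (STX/ETX here, standing in for the \b and \e mentioned above) and the checker function are hypothetical:

import logging
import sys

# Formatter that brackets every record with sentinel characters, so an
# interleaved or mangled line would show a missing or doubled marker.
formatter = logging.Formatter('\x02%(asctime)s pid=%(process)d %(message)s\x03')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
logging.getLogger('REDACTED_sync').addHandler(handler)

# Offline check over the captured log file:
def line_is_intact(line):
    return line.count('\x02') == 1 and line.count('\x03') == 1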

If you only have one processor, then the behavior you are getting is to be expected.

4 Comments

Sorry, I re-checked and the machine in question has at least 4 logical cores (see the sketch after this thread for a quick way to check), though it's AWS so that might not mean much. The instance type is m1.xlarge, specifically.
This has nothing to do with the number of processors, or even threads, as you can have multiprocessing even with 1 logical thread. Down-voted.
The OP claims to be using multiprocessing, not threading. You are right that you could run multiple threads on a processor, but only one process can be executing at a time on a single-processor system, so the logging could reasonably be expected not to overlap.
@FredMitchell I don't know how to respond to that, really. Please go read about multithreading/multiprocessing systems and how this is resolved on the kernel side of things (and why you can have more than one application running on your PC at the same time).
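
For what it's worth, the logical core count being debated here can be checked directly; a one-line sketch:

import multiprocessing

print(multiprocessing.cpu_count())  # logical cores visible to the OS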
