
I have a queued-work delegator that spawns 15-25 subprocesses via a custom multiprocessing.Pool(). The individual workers each emit 1-3 logging.info messages, adding up to 10-15 messages in less than 1000 ms, and I've noticed that the timestamps are always sequential and no message ever collides with another. This suggests to me that there is a shared lock somewhere in multiprocessing or logging, but I can't figure out where exactly it is.

I'm asking mostly for educational purposes, as the software in question is going to be refactored to be async or multithreaded anyway: 90% of real time is spent in IO (a remote API, not number crunching).

The logging configuration mirrors Django's, as I liked how that was organized:

LOGGING['handlers']['complex_console'] = {
    'level': 'DEBUG',
    'class': 'logging.StreamHandler',
    'formatter': 'complex',
}

LOGGING['loggers']['REDACTED_sync'] = {
    'handlers': ['complex_console'],
    'propagate': True,
    'level': 'DEBUG',
}
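
For context, a dict organized this way is normally applied with logging.config.dictConfig(). A runnable sketch; everything except the two entries above (notably the 'complex' format string) is an assumption on my part:

import logging.config

LOGGING = {
    'version': 1,
    'formatters': {
        # The real 'complex' format string isn't shown; this one is a stand-in.
        'complex': {
            'format': '%(asctime)s %(process)d %(name)s %(levelname)s %(message)s',
        },
    },
    'handlers': {
        'complex_console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'complex',
        },
    },
    'loggers': {
        'REDACTED_sync': {
            'handlers': ['complex_console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    },
}

logging.config.dictConfig(LOGGING)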

One quick clarification: multiprocessing.Process does use fork, but calls to logging.getLogger() are not made until after a child process has been spawned.
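
For concreteness, a minimal sketch of the setup as I understand it; the worker body, the counts, and the basicConfig stand-in for the configuration above are all assumptions, and it relies on the fork start method (the Linux default):

import logging
from multiprocessing import Pool

def worker(task_id):
    # getLogger() is called only here, after the fork, as noted above.
    logger = logging.getLogger('REDACTED_sync')
    logger.info('task %s started', task_id)
    logger.info('task %s finished', task_id)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)  # stand-in for the dictConfig above
    with Pool(processes=20) as pool:  # the real code uses a custom Pool subclass
        pool.map(worker, range(100))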

  • Although I don't think it is related to the logger: I've used logging numerous times with multiprocessing, always with the same logger, and I do receive messages from different processes interleaved with one another. You can try using a single logger instance (it may or may not help), or post your spawning code for a better look. Commented Oct 13, 2013 at 23:19

2 Answers


I have no idea how your spawner works, but if it follows a very simple scheme:

import logging

logger = logging.getLogger()  # gets a logger instance once, at module level

class Foo:
    ...  # worker definition

while True:
    ...  # a loop that spawns processes

Then they will share the same logger instance, which is why you get "sequential" writes. It doesn't impose any locking that you would notice performance-wise, for a very simple reason: appending to a log file is extremely fast and is almost always finished before the file is needed again.

You can run an experiment: remove the file handler from the logging configuration and you will notice that the writes are still sequential, because it is a very rare occurrence for two processes, even ones doing exactly the same work, to finish at the same time.
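
A minimal sketch of that experiment, assuming fork-based multiprocessing on Unix (process and message counts are arbitrary):

import logging
import multiprocessing

def worker(n):
    logger = logging.getLogger('experiment')
    for i in range(5):
        logger.info('process %s message %s', n, i)

if __name__ == '__main__':
    multiprocessing.set_start_method('fork')  # children inherit the handler set up below
    # Console handler only; no file handler involved.
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s pid=%(process)d %(message)s')
    procs = [multiprocessing.Process(target=worker, args=(n,)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

On a typical run the timestamps still come out strictly ordered, even though nothing synchronizes the four processes with each other.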


4 Comments

Debug/info emits have been going to stdout at anywhere from 5 to 10 messages inside of 500 ms for ~4 hours, but I've yet to see even one line get mangled. The logging configuration is done pre-fork, but the majority of the getLogger() calls happen after Popen forks.
I don't think I understand what your actual issue is. That the timestamps are sequential? If so, it is working as intended, and I don't understand why you would expect it not to be. The timestamp that shows up in the log is the timestamp of the logging.info() call, which also locks the log file and writes to it. This is how it is implemented, and this is why timestamps are sequential.
The concern I have is that logging.StreamHandler is somehow keeping all of the forked child processes' calls in order, which suggests there's a lock somewhere. I've put \b and \e characters in the formatter to see if either showed up on a line more than once or went missing; no luck over cycles that output 20K log entries to stdout via python my_program.py | tee log_file.py (a sketch of this check follows the thread).
@David You didn't show us code and you expect us to debug it? It may very well be how you designed it to work; I can't say without code, and you seem extremely reluctant to show it. I already explained how locking in the logger works, and it's not what you're concerned about.
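
A sketch of the sentinel check described in the comments; the marker characters (STX/ETX here, standing in for the \b and \e mentioned above) and the checker function are hypothetical:

import logging
import sys

# Formatter that brackets every record with sentinel characters, so an
# interleaved or mangled line would show a missing or doubled marker.
formatter = logging.Formatter('\x02%(asctime)s pid=%(process)d %(message)s\x03')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
logging.getLogger('REDACTED_sync').addHandler(handler)

# Offline check over the captured log file:
def line_is_intact(line):
    return line.count('\x02') == 1 and line.count('\x03') == 1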

If you only have one processor, then the behavior you are getting is to be expected.

4 Comments

Sorry, I re-checked and the machine in question has at least 4 logical cores (see the sketch after this thread for a quick way to check), though it's AWS so that might not mean much. The instance type is m1.xlarge, specifically.
This has nothing to do with the number of processors, or even threads, as you can have multiprocessing even with 1 logical thread. Down-voted.
The OP claims to be using multiprocessing, not threading. You are right that you could run multiple threads on a processor, but only one process can be executing at a time on a single-processor system, so the logging could reasonably be expected not to overlap.
@FredMitchell I don't know how to respond to that, really. Please go read about multithreading/multiprocessing systems and how this is resolved on the kernel side of things (and why you can have more than one application running on your PC at the same time).
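
For what it's worth, the logical core count being debated here can be checked directly; a one-line sketch:

import multiprocessing

print(multiprocessing.cpu_count())  # logical cores visible to the OS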
