I want to write a library (a module) in Python for logging. The logger instance should be unique and globally accessible within each process. (Meaning that it should not be passed around as an argument to different functions.)
I am having a bit of a mental block trying to design something sensible.
Let me try and explain my approach.
I started with a single process. I added some logging code to this process.
# process_1.py
from datetime import datetime
from datetime import timezone
import logging
import sys
logger_process_name = 'process_1'
logger_file_datetime = datetime.now(timezone.utc).date()
logger = logging.getLogger(__name__)
stdout_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')
stdout_log_handler = logging.StreamHandler(stream=sys.stdout)
stdout_log_handler.setLevel(logging.INFO)
stdout_log_handler.setFormatter(stdout_log_formatter)
file_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')
file_log_handler = logging.FileHandler(filename=f'{logger_process_name}_{logger_file_datetime}.log')
file_log_handler.setLevel(logging.DEBUG)
file_log_handler.setFormatter(file_log_formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(stdout_log_handler)
logger.addHandler(file_log_handler)
def main():
    logger.info('hello!')
    logger.info('goodbye!')

if __name__ == '__main__':
    main()
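As an aside, part of the "unique per process" requirement already falls out of the standard library: `logging.getLogger` caches logger objects by name, so repeated calls with the same name return the very same instance. A minimal sketch (the name `'demo'` is just an illustration):

```python
import logging

# getLogger caches loggers by name, so both calls below return the
# same object -- effectively a per-process singleton keyed by name.
a = logging.getLogger('demo')
b = logging.getLogger('demo')
print(a is b)  # True
```

What is not shared automatically is the configuration (handlers, levels, the process name baked into the filename), which is the part this question is really about.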
As you can see, most of this is just boilerplate code to initialize an instance of a logger object which writes both to stdout and a file, the filename for which is stamped with the current date. (In the UTC timezone.)
I then wrote a second process.
# process_2.py
from datetime import datetime
# ...
# ... skip copy & paste of same lines above relating to the logger ...
# ...
logger.addHandler(file_log_handler)
def main():
    logger.info('hello from process_2!')
    logger.info('goodbye from process_2!')

if __name__ == '__main__':
    main()
Ah - duplicate code. An ideal candidate for a module, perhaps?
Let's try to write it.
# lib_logger.py
from datetime import datetime
from datetime import timezone
import logging
import sys
logger_process_name = 'process_1' # <--- !!! Wrong name !!!
logger_file_datetime = datetime.now(timezone.utc).date()
logger = logging.getLogger(__name__)
stdout_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')
stdout_log_handler = logging.StreamHandler(stream=sys.stdout)
stdout_log_handler.setLevel(logging.INFO)
stdout_log_handler.setFormatter(stdout_log_formatter)
file_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')
file_log_handler = logging.FileHandler(filename=f'{logger_process_name}_{logger_file_datetime}.log')
file_log_handler.setLevel(logging.DEBUG)
file_log_handler.setFormatter(file_log_formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(stdout_log_handler)
logger.addHandler(file_log_handler)
Note that the process name is wrong if we want to use this from process_2.py.
# process_2.py
from lib_logger import logger
def main():
    logger.info('hello from process_2!')  # <--- the wrong process name is printed here
    logger.info('goodbye from process_2!')  # <-- and here
    # it writes to the file `process_1_2024-08-22.log`, not
    # `process_2_2024-08-22.log` as it should

if __name__ == '__main__':
    main()
At this point I became stuck and didn't manage to come up with a good solution.
Initially I thought it was fairly obvious how to solve this. Add a function to the logger module to initialize and return the logger instance, and add another function to register the name.
Here's how I tried to implement this:
# lib_logger.py
import logging

logger_process_name = None
logger = None

def set_logger_process_name(process_name: str) -> None:
    global logger_process_name
    logger_process_name = process_name

def get_logger() -> logging.Logger:
    global logger
    if logger is not None:
        return logger
    if logger_process_name is None:
        raise RuntimeError('process name not set, cannot create logger instance')
    logger = logging.getLogger(__name__)
    stdout_log_formatter = # ...
    # skip some lines
    file_log_handler = logging.FileHandler(filename=f'{logger_process_name}_{logger_file_datetime}.log')
    file_log_handler.setLevel(logging.DEBUG)
    file_log_handler.setFormatter(file_log_formatter)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(stdout_log_handler)
    logger.addHandler(file_log_handler)
    return logger
However, actually using this from one of the processes, and from other modules, is a disaster: the ordering of import statements suddenly becomes important.
# process_1.py
from lib_logger import get_logger
from lib_logger import set_logger_process_name
# might defer process name definitions to another module
from lib_process_names import PROCESS_NAME_PROCESS_1
set_logger_process_name(PROCESS_NAME_PROCESS_1)
log = get_logger()
# must initialize the module, or at least call `set_logger_process_name`
# before initializing any further modules which use the logger module
# (call `get_logger`)
# Why? Because `get_logger` cannot be called before
# `set_logger_process_name` has been called
from example_module import example_function
Just to show what example_function might do:
# example_module.py
from lib_logger import get_logger

log = get_logger()

def example_function():
    log.info('started work in example_function...')
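To make the ordering trap concrete, here is a single-file sketch (names hypothetical, the lib_logger machinery inlined) of what happens when a module calls `get_logger()` at import time before the process name has been registered:

```python
import logging

_process_name = None
_logger = None

def set_logger_process_name(name: str) -> None:
    global _process_name
    _process_name = name

def get_logger() -> logging.Logger:
    global _logger
    if _logger is None:
        if _process_name is None:
            raise RuntimeError('process name not set, cannot create logger instance')
        _logger = logging.getLogger(_process_name)
    return _logger

# This stands in for `import example_module`, which runs
# `log = get_logger()` at import time -- before any name is set:
try:
    log = get_logger()
except RuntimeError as exc:
    print(f'import failed: {exc}')

# Only after registering the name does the same call succeed:
set_logger_process_name('process_2')
log = get_logger()
print(log.name)
```

The `try`/`except` here is only to keep the sketch runnable; in the real multi-file layout the exception fires inside the `import` statement itself, which is exactly what makes import order matter.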
We could hack our way around this by having every function call log = get_logger() before it starts running, but that isn't a good solution. The log handle should be globally reachable within a module; individual functions shouldn't be responsible for obtaining a handle to a logger. That's not much of an improvement over passing a logger instance around as a function argument.
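For comparison, one common pattern (a sketch only, not necessarily the design wanted here) sidesteps the ordering problem entirely: configure handlers on the root logger once, in the entry point, and let every module call `logging.getLogger(__name__)` at import time. That call is always safe because it attaches no handlers and needs no process name; records propagate up to the root logger's handlers.

```python
import logging
import sys

def setup_logging(process_name: str) -> None:
    """Attach handlers to the root logger; call exactly once, from the entry point."""
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream=sys.stdout)
    handler.setLevel(logging.INFO)
    handler.setFormatter(logging.Formatter(
        f'{process_name}: %(name)s | %(levelname)s | %(message)s'))
    root.addHandler(handler)

# In any other module, at import time -- no ordering constraint,
# because this neither configures handlers nor needs the process name:
log = logging.getLogger(__name__)

if __name__ == '__main__':
    setup_logging('process_1')  # before any real work
    log.info('hello!')
```

The key property is that configuration (which needs the process name) and acquisition (which doesn't) are decoupled, so only `main` has to care about calling order.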