
I want to write a library (a module) in Python for logging. The logger instance should be unique per process and global per process. (Meaning that it should not be passed around as an argument to different functions.)

I am having a bit of a mental block trying to design something sensible.

Let me try and explain my approach.

I started with a single process. I added some logging code to this process.

# process_1.py

from datetime import datetime
from datetime import timezone

import logging
import sys

logger_process_name = 'process_1'
logger_file_datetime = datetime.now(timezone.utc).date()

logger = logging.getLogger(__name__)

stdout_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')

stdout_log_handler = logging.StreamHandler(stream=sys.stdout)
stdout_log_handler.setLevel(logging.INFO)
stdout_log_handler.setFormatter(stdout_log_formatter)

file_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')

file_log_handler = logging.FileHandler(filename=f'{logger_process_name}_{logger_file_datetime}.log')
file_log_handler.setLevel(logging.DEBUG)
file_log_handler.setFormatter(file_log_formatter)

logger.setLevel(logging.DEBUG)
logger.addHandler(stdout_log_handler)
logger.addHandler(file_log_handler)

def main():
    logger.info('hello!')
    logger.info('goodbye!')

if __name__ == '__main__':
    main()

As you can see, most of this is just boilerplate code to initialize an instance of a logger object which writes both to stdout and a file, the filename for which is stamped with the current date. (In the UTC timezone.)
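As an aside, the split between the logger's level and the handlers' levels is what makes this dual output work: the logger's DEBUG level lets every record through, and each handler then filters on its own level. A minimal, self-contained sketch of just that mechanism (using in-memory streams instead of stdout and a real file, purely for illustration):

```python
import io
import logging

# A throwaway logger with two handlers at different levels, mirroring
# the stdout (INFO) / file (DEBUG) split in process_1.py.
demo = logging.getLogger('level_demo')
demo.setLevel(logging.DEBUG)  # the logger itself passes everything through

info_stream = io.StringIO()
info_handler = logging.StreamHandler(info_stream)
info_handler.setLevel(logging.INFO)    # like the stdout handler
demo.addHandler(info_handler)

debug_stream = io.StringIO()
debug_handler = logging.StreamHandler(debug_stream)
debug_handler.setLevel(logging.DEBUG)  # like the file handler
demo.addHandler(debug_handler)

demo.debug('detail')
demo.info('headline')

print(info_stream.getvalue().split())   # ['headline']
print(debug_stream.getvalue().split())  # ['detail', 'headline']
```

The DEBUG record reaches both handlers, but only the DEBUG-level handler emits it; the INFO record is emitted by both.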

I then wrote a second process.

# process_2.py

from datetime import datetime
# ...
# ... skip copy & paste of same lines above relating to the logger ...
# ...
logger.addHandler(file_log_handler)

def main():
    logger.info('hello from process_2!')
    logger.info('goodbye from process_2!')

if __name__ == '__main__':
    main()

Ah - duplicate code. An ideal candidate for a module, perhaps?

Let's try to write it.

# lib_logger.py

from datetime import datetime
from datetime import timezone

import logging
import sys

logger_process_name = 'process_1' # <--- !!! Wrong name !!!
logger_file_datetime = datetime.now(timezone.utc).date()

logger = logging.getLogger(__name__)

stdout_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')

stdout_log_handler = logging.StreamHandler(stream=sys.stdout)
stdout_log_handler.setLevel(logging.INFO)
stdout_log_handler.setFormatter(stdout_log_formatter)

file_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')

file_log_handler = logging.FileHandler(filename=f'{logger_process_name}_{logger_file_datetime}.log')
file_log_handler.setLevel(logging.DEBUG)
file_log_handler.setFormatter(file_log_formatter)

logger.setLevel(logging.DEBUG)
logger.addHandler(stdout_log_handler)
logger.addHandler(file_log_handler)

Note that the process name is wrong, if we want to use this from process_2.py.

# process_2.py

from lib_logger import logger

def main():
    logger.info('hello from process_2!') # <--- the wrong process name is printed here
    logger.info('goodbye from process_2!') # <- and here
    # it writes to the file `process_1_2024-08-22.log` not
    # `process_2_2024-08-22.log` as it should

if __name__ == '__main__':
    main()

At this point I became stuck and didn't manage to come up with a good solution.

Initially I thought it was fairly obvious how to solve this. Add a function to the logger module to initialize and return the logger instance, and add another function to register the name.

Here's how I tried to implement this:

# lib_logger.py

from datetime import datetime
from datetime import timezone

import logging
import sys

logger_process_name = None
logger = None

def set_logger_process_name(process_name: str) -> None:
    global logger_process_name
    logger_process_name = process_name

def get_logger() -> logging.Logger:
    global logger

    if logger is not None:
        return logger

    if logger_process_name is None:
        raise RuntimeError('process name not set, cannot create logger instance')

    logger_file_datetime = datetime.now(timezone.utc).date()

    logger = logging.getLogger(__name__)

    stdout_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')

    stdout_log_handler = logging.StreamHandler(stream=sys.stdout)
    stdout_log_handler.setLevel(logging.INFO)
    stdout_log_handler.setFormatter(stdout_log_formatter)

    file_log_formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s | %(process)d | %(message)s')

    file_log_handler = logging.FileHandler(filename=f'{logger_process_name}_{logger_file_datetime}.log')
    file_log_handler.setLevel(logging.DEBUG)
    file_log_handler.setFormatter(file_log_formatter)

    logger.setLevel(logging.DEBUG)
    logger.addHandler(stdout_log_handler)
    logger.addHandler(file_log_handler)

    return logger

However, actually using this from one of the processes, and from other modules, is a disaster. The ordering of import statements suddenly becomes important.

# process_1.py

from lib_logger import get_logger
from lib_logger import set_logger_process_name

# might defer process name definitions to another module
from lib_process_names import PROCESS_NAME_PROCESS_1

set_logger_process_name(PROCESS_NAME_PROCESS_1)
log = get_logger()

# must initialize the module, or at least call `set_logger_process_name`,
# before importing any further modules which use the logger module
# (i.e. which call `get_logger`).
# Why? Because `get_logger` cannot be called before
# `set_logger_process_name` has been called.
from example_module import example_function

Just to show what example_function might do:

# example_module.py

from lib_logger import get_logger

log = get_logger()

def example_function():
    log.info('started work in example_function...')

We could actually hack our way around this by having all functions call log = get_logger() before they start running, but this isn't a good solution. The log handle should be globally reachable within a module, each function shouldn't be responsible for obtaining a handle to a logger. That's not much improvement over passing around a logger instance as a function argument.

2 Answers


This sounds like a case where decorators would be very useful.

# lib_logger.py


import logging
import sys
from datetime import datetime, timezone
from functools import wraps
import inspect
import os

def create_logger(process_name: str) -> logging.LoggerAdapter:
    logger = logging.getLogger(process_name)
    if not logger.hasHandlers():
        logger.setLevel(logging.DEBUG)
        formatter = logging.Formatter('%(name)s: %(asctime)s | %(levelname)s | %(process_name)s | %(message)s')

        stdout_handler = logging.StreamHandler(sys.stdout)
        stdout_handler.setLevel(logging.INFO)
        stdout_handler.setFormatter(formatter)

        file_handler = logging.FileHandler(f'{process_name}_{datetime.now(timezone.utc).date()}.log')
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(formatter)

        logger.addHandler(stdout_handler)
        logger.addHandler(file_handler)

    # Add process_name to the logger's extra dictionary
    logger = logging.LoggerAdapter(logger, {"process_name": process_name})

    return logger

def logger(func=None, *, process_name=None):

    if func is None: # if decorator called with parentheses
        def decorator(func):
            return logger(func, process_name=process_name)
        return decorator
    else:
        if process_name is None:
            process_name = os.path.splitext(os.path.basename(inspect.getmodule(func).__file__))[0]
        log = create_logger(process_name)

        @wraps(func)  # preserve metadata of func
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        wrapper.log = log  # attach logger to the wrapper function
        return wrapper
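One detail worth flagging: %(process_name)s is not a standard LogRecord attribute, so the formatter above only resolves it because the LoggerAdapter injects it via its extra dict. A self-contained sketch of just that mechanism (the names adapter_demo and demo_proc are made up for illustration):

```python
import io
import logging

base = logging.getLogger('adapter_demo')
base.setLevel(logging.INFO)

buf = io.StringIO()
handler = logging.StreamHandler(buf)
# %(process_name)s is not a built-in LogRecord field; the adapter
# below supplies it through the record's `extra` mapping.
handler.setFormatter(logging.Formatter('%(process_name)s | %(message)s'))
base.addHandler(handler)

log = logging.LoggerAdapter(base, {'process_name': 'demo_proc'})
log.info('hello')

print(buf.getvalue())  # demo_proc | hello
```

Without the adapter, the same format string would raise an error when the first record is formatted, because the record has no process_name attribute.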

Then in process_1.py

# process_1.py

from lib_logger import logger

@logger
def main_task(param1, param2):
    log = main_task.log  # use attached logger within the decorated function
    log.info('Executing main_task...')
    return f"Processed {param1} and {param2}"

if __name__ == '__main__':
    result = main_task('value1', 'value2')
    print(result)

And process_2.py

from lib_logger import logger

@logger(process_name="custom_process_2")
def main_task(param1):
    log = main_task.log
    log.info("Executing main_task from process_2...")
    return f"ask completed with {param1}"

if __name__ == '__main__':
    result = main_task('valueX')
    print(result)

Here, just as an example, we pass "custom_process_2" manually. If it is left out, the decorator derives the name itself through inspection: "process_2" for a function decorated in process_2.py. (In that case, you would just write a bare @logger above the definition of main_task(param1).)
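The automatic name derivation boils down to stripping the directory and extension from the module's file path. Roughly this (the path below is made up for illustration):

```python
import os

# The same basename/splitext chain the decorator applies to
# inspect.getmodule(func).__file__ (example path is hypothetical).
path = '/home/user/project/process_2.py'
name = os.path.splitext(os.path.basename(path))[0]
print(name)  # process_2
```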

then run these files with python:

$ python process_1.py
$ python process_2.py

It prints:

process_1: 2024-08-22 21:42:09,340 | INFO | process_1 | Executing main_task...
Processed value1 and value2

# and then:
custom_process_2: 2024-08-22 21:42:32,668 | INFO | custom_process_2 | Executing main_task from process_2...
ask completed with valueX

The decorator takes care of generating the logger, using the file name of the module in which the decorated function is defined (determined via inspect.getmodule()), and attaches it to the decorated function. Within the decorated function we can then obtain it with

log = main_task.log 
# main_task should in each case be replaced by
# the decorated function's name

and from then on use log.info() etc. for logging.

The cool thing is that the decorator automatically pre-initializes the logger for every function it decorates. If you give the same process_name manually, those functions will use the same logger.

If the process_names differ, the messages end up in separate log files, with the process_name as part of the file name.
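This sharing behaviour is not magic in the decorator; it falls out of the standard library: logging.getLogger is a per-process registry keyed by name, so the same name always yields the same logger object. A quick self-contained check:

```python
import logging

# logging.getLogger keeps one logger per name, per process, so two
# decorated functions given the same process_name end up sharing one
# logger (and therefore one set of handlers and one log file).
a = logging.getLogger('custom_process_2')
b = logging.getLogger('custom_process_2')
c = logging.getLogger('process_1')

print(a is b)  # True  - same name, same logger object
print(a is c)  # False - different name, different logger
```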

After running all the commands, the folder will look like this:

.
├── __pycache__
│   └── lib_logger.cpython-310.pyc
├── custom_process_2_2024-08-22.log
├── lib_logger.py
├── process_1.py
├── process_1_2024-08-22.log
└── process_2.py

As you can see, there are two log files: custom_process_2_2024-08-22.log and process_1_2024-08-22.log.

Is this what you wanted?




A simpler answer: I personally would pass loggers explicitly, for the sake of referential transparency, a core principle of functional programming.

However, if you have to write hundreds/thousands of functions which need logging, then I understand that it is very tedious.

In that case, I would then use the decorator, which is also explicit to a certain degree.

2 Comments

The problem with passing them explicitly is the amount of refactoring this necessitates. (Both now and in the future as code evolves.) I like your other answer but I need some time to understand it. I also understand your point about functional programming. You are right, but the pragmatic argument is that for something which is intrinsically universal, a logger should be something which a global reference can always be obtained with ease. It's a bit like print() statements. You don't expect to have to pass a reference to stdout as an argument to all functions.
That's true. And yes, using decorators helps to separate the concerns; decorators are really nothing other than higher-order functions (they take a function as an argument, often along with a couple more arguments, and return a new decorated function).
