
Apologies if this question is similar to others posted on SO, but I have tried many of the answers given and could not achieve what I am attempting to do.

I have some code that calls an external module:

import trafilatura

# after obtaining article_html
text = trafilatura.extract(article_html, language='en')

This sometimes prints out a warning on the console, which comes from the following code in the trafilatura module:

# at the top of the file
LOGGER = logging.getLogger(__name__)

# in the method that I'm calling
LOGGER.warning('HTML lang detection failed')

I'd like to prevent this and other messages produced by the module from being printed directly to the console, and instead store them somewhere so that I can edit them and decide what to do with them. (Specifically, I want to save the messages in slightly modified form, but only under certain circumstances.) I am not using the logging library in my own code.

I have tried the following solution suggestions:

import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stderr(buf):  # I also tried redirect_stdout
    text = trafilatura.extract(article_html, language='en')

and

import io
import sys

buf = io.StringIO()
sysout = sys.stdout
syserr = sys.stderr
sys.stdout = sys.stderr = buf
try:
    text = trafilatura.extract(article_html, language='en')
finally:
    # restore the original streams even if extract() raises
    sys.stdout = sysout
    sys.stderr = syserr

However, in both cases buf remains empty and trafilatura still prints its logging messages to the console. When I test the redirects above with other calls (e.g. print("test")), they capture the output just fine, so apparently LOGGER.warning() from trafilatura is somehow not writing to stderr or stdout?
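One plausible explanation (an assumption, consistent with the answer below): a `logging.StreamHandler` created without a stream argument stores a reference to whatever `sys.stderr` was at construction time, so replacing `sys.stderr` afterwards does not affect it. The minimal sketch below reproduces the symptom; `demo_lib` is a hypothetical stand-in for trafilatura's logger, and the root handler stands in for whatever configured logging elsewhere.

```python
import contextlib
import io
import logging
import sys

# Stand-in for a handler that other code attached to the root logger
# (for example via logging.basicConfig()). StreamHandler() without an
# argument stores whatever sys.stderr is at construction time.
root_handler = logging.StreamHandler()
logging.getLogger().addHandler(root_handler)

# Hypothetical library logger, set up like trafilatura's: only a NullHandler.
lib_logger = logging.getLogger("demo_lib")
lib_logger.addHandler(logging.NullHandler())

buf = io.StringIO()
with contextlib.redirect_stderr(buf):
    # The record propagates to root_handler, which writes to the stream it
    # stored earlier (the original stderr), not to the redirected sys.stderr.
    lib_logger.warning("HTML lang detection failed")

print(repr(buf.getvalue()))  # ''
logging.getLogger().removeHandler(root_handler)  # undo the demo setup
```

The warning still appears on the console while `buf` stays empty, which matches the behaviour described above.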

I thought I could set a different output stream for trafilatura's LOGGER, but it only has a NullHandler, so I could neither figure out its stream target nor work out how to change it:

# from trafilatura's top-level __init__.py
logging.getLogger(__name__).addHandler(NullHandler())

Any ideas? Thanks in advance.

1 Answer

The idea here is to work within Python's standard logging library. Adding a NullHandler is actually the recommended practice for libraries that create a logger, because it prevents logging from falling back to printing on stderr when the application has not configured logging.
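To illustrate what the NullHandler prevents, here is a minimal sketch with two hypothetical loggers (`demo_bare` and `demo_polite`, not part of trafilatura): one without any handler and one with a NullHandler, both logging while stderr is redirected. `propagate = False` is set only to isolate them from any handlers already on the root logger, so the sketch behaves like an application with no logging configuration.

```python
import contextlib
import io
import logging

# Hypothetical logger with no handlers at all.
bare = logging.getLogger("demo_bare")
bare.propagate = False

# Hypothetical logger configured the way trafilatura's __init__.py does it.
polite = logging.getLogger("demo_polite")
polite.propagate = False
polite.addHandler(logging.NullHandler())

buf = io.StringIO()
with contextlib.redirect_stderr(buf):
    # No handler found anywhere: logging falls back to logging.lastResort,
    # which writes WARNING and above to the current sys.stderr.
    bare.warning("from bare")
    # The NullHandler counts as a handler, so the fallback is skipped and
    # the record is silently discarded.
    polite.warning("from polite")

print(repr(buf.getvalue()))  # 'from bare\n'
```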

What is likely happening is that those records propagate to the root logger, which had a handler attached somewhere else (e.g. by another library or a logging.basicConfig() call). You can stop that by getting the module's logger in your code and setting it not to propagate:

import logging

# "trafilatura" is the top-level package logger; loggers of its submodules
# (named "trafilatura.<submodule>") propagate through it.
logger = logging.getLogger("trafilatura")
logger.propagate = False
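A minimal end-to-end sketch of this fix combined with an in-memory handler, so the messages end up somewhere you can inspect. It uses the hypothetical name `demo_lib` in place of `"trafilatura"` so it runs without the library installed, and a root handler writing to a buffer stands in for the console handler assumed to exist elsewhere.

```python
import io
import logging

# Stand-in for the root handler that currently prints to the console.
console_buf = io.StringIO()
logging.getLogger().addHandler(logging.StreamHandler(console_buf))

# In real code: logging.getLogger("trafilatura").
lib_logger = logging.getLogger("demo_lib")
lib_logger.addHandler(logging.NullHandler())

# The fix from the answer: stop records from reaching the root logger...
lib_logger.propagate = False

# ...and attach our own handler that writes to an in-memory buffer instead.
captured = io.StringIO()
capture_handler = logging.StreamHandler(captured)
capture_handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s: %(message)s"))
lib_logger.addHandler(capture_handler)

# Simulates a call inside the library; the child logger propagates to demo_lib.
logging.getLogger("demo_lib.core").warning("HTML lang detection failed")

print(captured.getvalue(), end="")   # demo_lib.core: WARNING: HTML lang detection failed
print(repr(console_buf.getvalue()))  # '' (the root handler saw nothing)
```

Because the NullHandler already counts as a handler, turning propagation off without adding a handler of your own simply discards the messages; the last-resort stderr fallback is not used.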

1 Comment

Thanks! Setting propagate to false and adding another stream handler to the logger for further processing of the outputs achieves exactly what I need.
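If a stream handler is not flexible enough for editing the messages and keeping them only under certain conditions, a custom `logging.Handler` subclass is an alternative. The sketch below is hypothetical: `CollectingHandler`, `should_keep` and `rewrite` are illustrative names, and `demo_lib` again stands in for `"trafilatura"`.

```python
import logging


class CollectingHandler(logging.Handler):
    """Keeps (possibly edited) log messages in a list.

    should_keep(record) decides whether a record is stored at all;
    rewrite(record) turns it into the string that is stored.
    """

    def __init__(self, should_keep, rewrite):
        super().__init__()
        self.should_keep = should_keep
        self.rewrite = rewrite
        self.messages = []

    def emit(self, record):
        try:
            if self.should_keep(record):
                self.messages.append(self.rewrite(record))
        except Exception:
            # Same error policy as the standard library's handlers.
            self.handleError(record)


lib_logger = logging.getLogger("demo_lib")  # in real code: "trafilatura"
lib_logger.propagate = False
lib_logger.setLevel(logging.DEBUG)  # let INFO records reach the handler too

collector = CollectingHandler(
    should_keep=lambda r: r.levelno >= logging.WARNING,
    rewrite=lambda r: f"[{r.name}] {r.getMessage().upper()}",
)
lib_logger.addHandler(collector)

lib_logger.info("reaches the handler, but should_keep drops it")
lib_logger.warning("HTML lang detection failed")
print(collector.messages)  # ['[demo_lib] HTML LANG DETECTION FAILED']
```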
