Following the question here: How do I log from my Python Spark script, I have been struggling to get:

a) All output into a log file.
b) Writing out to a log file from pyspark.

For a) I use the following changes to the config file:

# Set everything to be logged to the file
log4j.rootCategory=ALL, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/home/xxx/spark-1.6.1/logging.log
log4j.appender.file.MaxFileSize=5000MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

This produces output. For b), I would like to add my own messages to the log from pyspark, but I cannot find any of them in the log file. Here is the code I am using:

import logging
import sys
logger = logging.getLogger('py4j')
#print(logger.handlers)
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
logger.info("TESTING.....")

I can find other output in the log file, but not "TESTING.....". I have also tried using the existing logger stream, but this does not work either:

import logging
logger = logging.getLogger('py4j')
logger.info("TESTING.....")

2 Answers

Works in my configuration:

log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)
LOGGER.info("Hello logger...")
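
For reuse, the same lookup can be wrapped in a small helper. This is only a sketch: `get_log4j_logger` is a name made up here, `sc` is assumed to be a live SparkContext, and `sc._jvm` is a private attribute that may behave differently across Spark versions. Messages sent this way go through log4j, so they should end up in whichever appenders the log4j config defines (the `file` appender in the question).

```python
def get_log4j_logger(sc, name):
    """Return a JVM-side log4j logger reached through the py4j gateway.

    Assumption: `sc` is a live SparkContext. `sc._jvm` is private API,
    so treat this as a sketch rather than a guaranteed interface.
    """
    log4j = sc._jvm.org.apache.log4j
    return log4j.LogManager.getLogger(name)


# Usage inside a pyspark script (needs a running SparkContext):
# logger = get_log4j_logger(sc, __name__)
# logger.info("TESTING.....")
```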

To get all output into a log file and to write to a log file from pyspark:

import sys
import logging

log = logging.getLogger(__name__)
# Without an explicit level the logger inherits WARNING and drops info() calls
log.setLevel(logging.INFO)

handler = logging.FileHandler("spam.log")
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
log.addHandler(handler)

# Route anything written to stderr/stdout through the logger
sys.stderr.write = log.error
sys.stdout.write = log.info

With the last two lines in place, every error and info message is written to "spam.log" in the current working directory, and nothing appears on the console/stdout.

To keep console output as well as the log file, remove those two lines.
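
If the goal is to have the same messages in both a file and the console, one option is to attach two handlers rather than patch `sys.stdout`. Below is a minimal sketch using only the standard `logging` module; `build_logger` and the logger name are made up for illustration. Note that this writes to its own file from the Python driver process and does not go through log4j.

```python
import logging
import sys


def build_logger(name, log_path):
    """Create a logger that writes INFO and above to a file and to stdout."""
    logger = logging.getLogger(name)
    # Set the level explicitly; otherwise the inherited WARNING level hides info()
    logger.setLevel(logging.INFO)

    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )

    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    return logger


# Usage:
# log = build_logger("my_app", "spam.log")
# log.info("TESTING.....")
```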

Happy Coding Cheers!!!
