Papermill prints everything on the console - python-3.x

I am working on adding new features to a project. Within the project I am logging stdout to a file because some components print information useful for debugging. I recently added a new feature that uses papermill to run a Jupyter notebook. The problem I am having is that papermill prints everything to the console even if I redirect stdout to a temporary variable.
Below you can see sample code:
import io
from contextlib import redirect_stdout

import papermill as pm

# path and params are defined elsewhere in the project
with io.StringIO() as buf, redirect_stdout(buf):
    pm.execute_notebook(
        path,
        path,
        progress_bar=False,
        stdout_file=buf,
        stderr_file=buf,
        parameters=dict(**params)
    )
    print("!!! redirected !!!")
print("!!! redirected !!!")
The first print statement successfully gets redirected to buf, while everything pm.execute_notebook prints goes to the console. The last print statement prints to the console as expected.

To solve the problem I had to change the handler and logging level of the logger.
To get the logger:
logger = logging.getLogger('papermill')
To change the logging level:
logger.setLevel('WARNING')
To remove the stream handler:
for handler in list(logger.handlers):
    if isinstance(handler, logging.StreamHandler):
        logger.removeHandler(handler)
Removing the stream handler and setting the right level solved my problem. Here is a link to the Python logging documentation: https://docs.python.org/3/library/logging.html
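Putting those pieces together, a minimal sketch of the whole approach might look like the following. It simply stitches together the snippets above; path and params are assumed to be defined as in the question, and WARNING is just the level that worked in this case:

import io
import logging
from contextlib import redirect_stdout

import papermill as pm

# Quiet papermill's logger before executing the notebook.
logger = logging.getLogger('papermill')
logger.setLevel('WARNING')
for handler in list(logger.handlers):
    if isinstance(handler, logging.StreamHandler):
        logger.removeHandler(handler)

with io.StringIO() as buf, redirect_stdout(buf):
    pm.execute_notebook(path, path, progress_bar=False,
                        stdout_file=buf, stderr_file=buf,
                        parameters=dict(**params))
    captured = buf.getvalue()  # everything written to stdout so far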

Related

How to initiate a logfile and change handler.suffix in a TimedRotatingFileHandler

I have a script that monitors a platform's chat and logs it to a file. Here is the code to set up my logger:
import logging
import logging.handlers as handlers
import time
from datetime import datetime

chatlogger = logging.getLogger("chatlog")
chatlogger.setLevel(logging.INFO)

logHandler = handlers.TimedRotatingFileHandler('chatlog_', when='midnight', interval=1, encoding='utf-8')
logHandler.setLevel(logging.INFO)
logHandler.suffix = "%Y%m%d.log"

chatlogger.addHandler(logHandler)
logHandler.doRollover()  # this line is needed if when='midnight', otherwise it does not create the proper file
This works: the chatlog_.yyyymmdd.log file gets created and it rolls over when it should. However, there are two small issues I'd like to address, or address differently.
The first is that the very first log file the script creates does not have the suffix; it is just 'chatlog_' and nothing else. I added the doRollover() call to correct this; is there a different or better way to handle initiating the logfile? The script will be run 24/7 (or as close to that as possible), being restarted with the machine.
The second issue is more of an aesthetic thing. The logHandler.suffix adds a '.' between the filename and the suffix. Is there something I can do to stop that from happening?
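As an aside, one possible way to influence the rotated filenames, including dropping that extra '.', is the namer hook that rotating handlers have had since Python 3.3. The sketch below assumes the same 'chatlog_' handler as above and only affects the names of rotated files, not the active 'chatlog_' file:

def drop_separator_dot(default_name):
    # default_name looks like '.../chatlog_.20201007.log'; remove the '.'
    # that TimedRotatingFileHandler inserts between base name and suffix.
    return default_name.replace('chatlog_.', 'chatlog_', 1)

logHandler.namer = drop_separator_dot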

Creating new files on Heroku while app is working?

I have a Python Telegram bot and I have deployed it on Heroku. The problem is that my program creates pickled files while it is working. I have one database which saves the required data, and pickles to save some nested classes which I have to use later at some stage. So these pickle files are one of the important parts of the program. I am using the dill module for pickling.
I was able to save these files locally, but I can't when running on Heroku. I'll share the logs below. It is not even reaching the pickling part, but giving an error when opening the file itself.
import dill

def saving_test(test_path, test_obj):
    try:
        save_test_logger.info('Saving test...')
        try:
            save_test_logger.info('opening file')
            test_file = open(test_path, 'wb')
        except Exception as exc:
            save_test_logger.exception('Error opening file')
            return 0
        dill.dump(test_obj, test_file)
        save_test_logger.debug(f'file saved in {test_path}')
        test_file.close()
        return 1
    except Exception as exc:
        save_test_logger.exception('saving error')
        test_file.close()
        return exc
saving 859ab1303bcd4a65805e364a989ac8ca
2020-10-07T20:53:18.064670+00:00 app[web.1]: Could not open file ./test_objs/859ab1303bcd4a65805e364a989ac8ca.pkl
I have also added logging to my program, but now I am confused about where I can see the original logs which are supposed to catch the exceptions.
This is my first time using Heroku and I am comparatively new to programming, so please help me identify the root cause of the problem.
I found the problem. Even though I had pushed everything, the directory test_objs wasn't there on Heroku. I just added two more lines of code to create the directory using the os module if it does not exist. That solved the problem. I am not deleting the question so that, in case someone gets stuck or confused in a similar situation, this question might be able to help them.
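A minimal sketch of what those two lines could look like (the ./test_objs path is taken from the log above; where exactly the call goes in the program is an assumption):

import os

# Make sure the directory for the pickle files exists before opening files in it.
os.makedirs('./test_objs', exist_ok=True)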

How do I pass through or wrap the print command (stdout) so that print also calls a function on every call?

I am trying to automate a long-running job, and I want to be able to upload all console output to another log, like CloudWatch Logs. For the most part this can be done by making and using a custom function instead of print. But there are functions in machine learning libraries, like Model.summary(), or progress bars while training, that output to stdout on their own.
I can get all console output at the very end, via an internal console log. But what I need is real-time uploading of stdout as it is written, by whatever writes it, so that one can check the progress by taking a look at the logs on CloudWatch instead of having to log into the machine and check the internal console logs.
Basically what I need is:
From: call_to_stdout -> Console(and probably other stuff)
To: call_to_stdout -> uploadLog() -> Console(and probably other stuff)
Pseudocode of what I need:
import sys

class stdout_PassThru:
    def __init__(self, in_old_stdout):
        self.old_stdout = in_old_stdout

    def write(self, msg):
        self.old_stdout.write(msg)
        uploadLogToCloudwatch(msg)

def uploadLogToCloudwatch(msg):
    # Botocore stuff to upload to CloudWatch
    ...

myPassThru = stdout_PassThru(sys.stdout)
sys.stdout = myPassThru
I've tried googling this, but the best I ever get is StringIO stuff, where I can capture stdout but cannot do anything with it until the function I called ends and I can insert code again. I would like to run my upload-log code every time stdout is used.
Is this even possible?
Please and thank you.
EDIT: Someone suggested redirecting output to a file. The problem is that that just streams/writes to the file as things are output. I need to call a function that does work on each call to stdout, which is not a stream. If stdout outputs every time it flushes itself, then having the function called then would be good too.
I solved my problem; the answer was sort of hidden in some other answers.
The initial problem I had with this solution is that when it is tested within a Jupyter Notebook, the sys.stdout = myClass(sys.stdout) line causes Jupyter to... wait? I'm not sure why, but it never finishes processing the cell.
But when I put it into a Python file and ran it with python test.py, it ran perfectly and as expected.
This allows me to, in a sense, pass through calls to print, while executing my own function on every call to print.
import sys

def addLog(message):
    # my boto function to upload CloudWatch logs
    ...

class sendToLog:
    def __init__(self, stream):
        self.stream = stream

    def write(self, o):
        self.stream.write(o)
        addLog(o)
        self.stream.flush()

    def writelines(self, o):
        self.stream.writelines(o)
        addLog(o)
        self.stream.flush()

    def __getattr__(self, attr):
        return getattr(self.stream, attr)

sys.stdout = sendToLog(sys.stdout)
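One small note, as an assumption rather than something from the original answer: if the wrapper ever needs to be undone (for example while experimenting in a notebook), the interpreter's original stream can be restored afterwards:

sys.stdout = sys.__stdout__  # put back the original stdout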

OpenCV(imread) operation stuck in elastic beanstalk

I'm trying to read a PNG file and output the NumPy matrix of the image in the terminal, using the imread function of OpenCV on the server, like this:
import os

import cv2
from flask import Flask

application = Flask(__name__)

@application.route('/readImage', methods=['POST'])
def handleHTTPPostRequest():
    imagePath = f'{os.getcwd()}/input.png'
    print('image path is', imagePath)
    print(cv2.__version__)
    im = cv2.imread(imagePath, cv2.IMREAD_COLOR)
    print(im)
    return 'success'
This gives the expected output on my local machine (Ubuntu 18.04) no matter how many times I execute it. I moved this to Elastic Beanstalk (CentOS) with the necessary setup. The request runs fine (gives proper logs along with 'success') the very first time I make a POST call.
But when I make the POST call a second time, it only outputs the first two logs (image path and cv2 version) and is stuck there for a while. After some time, it shows this error:
End of script output before headers: application.py
I have added one more line just before cv2.imread, just to make sure that the file exists:
print('does the file exist', os.path.isfile(imagePath))
This returns True every time. I have restarted the server multiple times; it looks like it only works the very first time, and cv2.imread() is stuck after the first POST call. What am I missing?
When you print from a request handler, Flask tries to do something sensible, but print really isn't what you want to be doing, as it risks throwing the HTTP request/response bookkeeping off.
A fully-supported way of getting diagnostic info out of a handler is to use the logging module. It will require a small bit of configuration. See http://flask.pocoo.org/docs/1.0/logging/
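For illustration, a minimal sketch of that kind of configuration applied to the question's handler might look like the following (the format string and log level are arbitrary choices, not taken from the answer):

import logging
import os

import cv2
from flask import Flask

application = Flask(__name__)

# Configure logging once, instead of relying on print() inside request handlers.
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(name)s: %(message)s')

@application.route('/readImage', methods=['POST'])
def handleHTTPPostRequest():
    imagePath = f'{os.getcwd()}/input.png'
    application.logger.info('image path is %s', imagePath)
    application.logger.info('OpenCV version %s', cv2.__version__)
    im = cv2.imread(imagePath, cv2.IMREAD_COLOR)
    application.logger.info('image loaded: %s', im is not None)
    return 'success'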
To anyone facing this issue, I have found a solution. Add this to your .ebextensions config file:
container_commands:
  AddGlobalWSGIGroupAccess:
    command: "if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' ../wsgi.conf ; then echo 'WSGIApplicationGroup %{GLOBAL}' >> ../wsgi.conf; fi;"
Saikiran's final solution worked for me. I was getting this issue when I tried calling methods from the opencv-python library. I'm running Ubuntu 18.04 locally and it works fine there. However, like in Saikiran's original post, when deployed to Elastic Beanstalk the first request works and then the second one does not. For my EB environment, I'm using a Python 3.6-based Amazon Linux server.

Handling logs and writing to a file in Python?

I have a module named acms, and inside it there are a number of Python files. main.py has calls to the other Python files. I have added logs in those files, which are displayed on the console, but I also want to write these logs to a file called all.log. I tried setting log levels and a logger in a file called log.py but didn't get the expected format. Since I am new to Python, I am having difficulty handling logs.
Use the logging module and use logger = logging.getLogger(__name__). Then it will use the correct logger with the options that you have set up.
See the thinkpad-scripts project for its logging. Also the logging cookbook has a section for logging to multiple locations.
We use the following to log to the console and the syslog:
import logging
import logging.handlers
import os

kwargs = {}
dev_log = '/dev/log'
if os.path.exists(dev_log):
    kwargs['address'] = dev_log

syslog = logging.handlers.SysLogHandler(**kwargs)
syslog.setLevel(logging.DEBUG)

# syslog_format is a format string defined elsewhere in the project
formatter = logging.Formatter(syslog_format)
syslog.setFormatter(formatter)

logging.getLogger('').addHandler(syslog)
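For the all.log requirement from the question, a minimal sketch of logging to both the console and a file could look like this (the format string is an arbitrary example, and all.log is created in the working directory):

import logging

root = logging.getLogger('')
root.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s')

# Everything that reaches the root logger goes to the console...
console = logging.StreamHandler()
console.setFormatter(formatter)
root.addHandler(console)

# ...and is also appended to all.log.
file_handler = logging.FileHandler('all.log')
file_handler.setFormatter(formatter)
root.addHandler(file_handler)

# Individual modules then simply do:
#     logger = logging.getLogger(__name__)
#     logger.info('some message')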
