Discover source of stdout output - python-3.x

I'm using a third-party lib that is printing \n a couple of hundred times to stdout - it appears to be a bug in their logging.
What is the best approach to discover which line of code is producing this output (I'm hoping it's Python and not an external library)?
Some ideas:
How can I change the display of the \n character to something recognizable like A, so that if I step through with a debugger I can see when the output occurs?
Can I monkey-patch a low-level function to produce some output or trigger a debug break every time characters are sent to stdout?

You could try to take over stdout:
class StdoutFilter:
    def __init__(self, realStdout):
        self.realStdout = realStdout

    def write(self, text):
        if text == '\n':  # or some more complicated condition
            raise Exception("Newline alert!")
        self.realStdout.write(text)

    def flush(self):  # print() may call flush(), so delegate it too
        self.realStdout.flush()

import sys
sys.stdout = StdoutFilter(sys.stdout)
# import the 3rd party library and use it
In this filter class you can add a counter and raise the exception only after more than N consecutive newlines have been printed. The stack trace of the exception will reveal the source of the offending print.
Or you can put a breakpoint there, or change the \n to something else.
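For instance, here is a minimal sketch of the counter variant; the threshold of 100 and the traceback.print_stack() call (printing the stack instead of raising, so output keeps flowing) are my own illustrative choices:
import sys
import traceback

class StdoutFilter:
    def __init__(self, realStdout, threshold=100):
        self.realStdout = realStdout
        self.threshold = threshold  # N consecutive newlines before we complain
        self.newline_run = 0

    def write(self, text):
        if text == '\n':
            self.newline_run += 1
            if self.newline_run > self.threshold:
                # show who is printing, instead of killing the program
                traceback.print_stack(file=self.realStdout)
                self.newline_run = 0
        else:
            self.newline_run = 0
        self.realStdout.write(text)

    def flush(self):
        self.realStdout.flush()

sys.stdout = StdoutFilter(sys.stdout)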

Related

How can I use python logging to log bytes to one handler, and unicode to another?

I have a complex (but fairly standard) log setup in my python3 application. A single log function is used, and then various handlers distribute those logs to different places, such as console output, a log file, and a hardware serial port. This had been working fine in python2, but I'm converting to python3 and starting to hit an issue. The console and file output handlers are expecting unicode, but the StreamHandler that handles the serial output requires bytes encoding. I don't see any way to tell python to convert to bytes just for a single log handler.
Here is some partial code for reference:
serial_handler = logging.StreamHandler(serial)
serial_handler.setLevel(logging.DEBUG)
serial_handler.setFormatter(serial_formatter)
self._log.addHandler(serial_handler)
fh = logging.FileHandler(file_name)
fh.setLevel(logging.DEBUG)
fh.setFormatter(formatter)
self._log.addHandler(fh)
Then self._log.debug("Hello, world!") could be called. It sends unicode to the FileHandler, which is correct, but also to the StreamHandler, which will fail because the message needs to be encoded to bytes after the logging.Formatter has run, and I don't know how to make that happen.
Note that serial in this case is an instance of the pySerial class; its .write() function expects bytes.
I ended up writing a custom Handler which seems to be working. But I'm still curious to see proposed alternatives.
from logging import StreamHandler

class SerialHandler(StreamHandler):
    def __init__(self, serial):
        StreamHandler.__init__(self, serial)

    def emit(self, record):
        try:
            msg = self.format(record)
            stream = self.stream
            stream.write(bytes(msg + self.terminator, 'utf-8'))
            self.flush()
        except RecursionError:
            raise
        except Exception:
            self.handleError(record)
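For context, here is a minimal sketch of wiring this handler up; the port name '/dev/ttyUSB0', the baud rate, and the format string are placeholders of mine, and pySerial's serial.Serial is assumed:
import logging
import serial  # pySerial

port = serial.Serial('/dev/ttyUSB0', baudrate=115200)  # placeholder port settings

log = logging.getLogger('app')
log.setLevel(logging.DEBUG)

serial_handler = SerialHandler(port)
serial_handler.setLevel(logging.DEBUG)
serial_handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
log.addHandler(serial_handler)

log.debug("Hello, world!")  # arrives on the serial port encoded as UTF-8 bytes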

How do I pass through or wrap the print command (stdout) so that print also calls a function on every call?

I am trying to automate a long-running job, and I want to be able to upload all console output to another log, such as CloudWatch Logs. For the most part this can be done by making and using a custom function instead of print. But there are machine-learning functions like Model.summary(), and progress bars during training, that write to stdout on their own.
I can get all console output at the very end via an internal console log. But what I need is real-time uploading of stdout as it is written, by whomever, so that one can check the progress by looking at the logs on CloudWatch instead of having to log into the machine and check the internal console logs.
Basically what I need is:
From: call_to_stdout -> Console(and probably other stuff)
To: call_to_stdout -> uploadLog() -> Console(and probably other stuff)
Pseudocode of what I need:
import sys

class stdout_PassThru:
    def __init__(self, in_old_stdout):
        self.old_stdout = in_old_stdout

    def write(self, msg):
        self.old_stdout.write(msg)
        uploadLogToCloudwatch(msg)

def uploadLogToCloudwatch(msg):
    # Botocore stuff to upload to Cloudwatch
    ...

myPassThru = stdout_PassThru(sys.stdout)
sys.stdout = myPassThru
I've tried googling this, but the best I ever get is StringIO stuff, where I can capture stdout but cannot do anything with it until the function I called ends and I can insert code again. I would like to run my upload-log code every time stdout is used.
Is this even possible?
Please and thank you.
EDIT: Someone suggested redirecting output to a file. The problem is that that just streams/writes to the file as things are output. I need to call a function that does work on each call to stdout, which a plain file redirect doesn't give me. If stdout only produces output when it flushes itself, then having the function called at flush time would be good too.
I solved my problem. It was sort of hidden in some other answers.
The initial problem I had with this solution is that when it is tested within a Jupyter Notebook, the sys.stdout = myClass(sys.stdout) causes Jupyter to... wait? I'm not sure, but it never finishes processing the cell.
But when I put it into a Python file and ran it with python test.py, it ran perfectly and as expected.
This lets me pass through calls to print while executing my own function on every call to print.
import sys

def addLog(message):
    # my boto function to upload CloudWatch logs
    ...

class sendToLog:
    def __init__(self, stream):
        self.stream = stream

    def write(self, o):
        self.stream.write(o)
        addLog(o)
        self.stream.flush()

    def writelines(self, o):
        self.stream.writelines(o)
        addLog(o)
        self.stream.flush()

    def __getattr__(self, attr):
        return getattr(self.stream, attr)

sys.stdout = sendToLog(sys.stdout)
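As a quick sanity check (my own illustration, with sys.stdout already wrapped by the block above and addLog stubbed out to collect messages in a list):
captured = []

def addLog(message):  # stub standing in for the boto upload
    captured.append(message)

print("training epoch 1/10")  # goes to the console AND into captured
# note: print() calls write() twice - once for the text, once for '\n'
assert "training epoch 1/10" in captured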

How to redirect the stdout of a multiprocessing.Process

I'm using Python 3.7.4 and I have created two functions: the first one executes a callable using multiprocessing.Process, and the second one just prints "Hello World". Everything seems to work fine until I try redirecting stdout; doing so prevents me from getting any printed values during the process execution. I have simplified the example as far as possible, and this is the current code demonstrating the problem.
These are my functions:
import io
import multiprocessing
from contextlib import redirect_stdout

def call_function(func: callable):
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=lambda: queue.put(func()))
    process.start()
    while True:
        if not queue.empty():
            return queue.get()

def print_hello_world():
    print("Hello World")
This works:
call_function(print_hello_world)
The previous code works and successfully prints "Hello World"
This does not work:
with redirect_stdout(io.StringIO()) as out:
    call_function(print_hello_world)
print(out.getvalue())
With the previous code I do not get anything printed in the console.
Any suggestion would be very much appreciated. I have been able to narrow the problem down to this point, and I think it is related to the process ending after the io.StringIO() is already closed, but I have no idea how to test my hypothesis, and even less how to implement a solution.
This is the workaround I found. It seems that if I use a file instead of a StringIO object I can get things to work (note the file must be opened in "w+" mode and rewound before reading back):
with open("./tmp_stdout.txt", "w+") as tmp_stdout_file:
    with redirect_stdout(tmp_stdout_file):
        call_function(print_hello_world)
    tmp_stdout_file.seek(0)  # rewind so the captured output can be read back
    stdout_str = ""
    for line in tmp_stdout_file.readlines():
        stdout_str += line
stdout_str = stdout_str.strip()
print(stdout_str)  # This variable holds the captured stdout of the process
Another thing that might be important to know: the multiprocessing machinery buffers stdout, meaning the prints only get displayed after the function has executed or failed. To solve this you can force stdout to flush when needed within the function that is being called - in this case, inside print_hello_world (I actually had to do this for a daemon process that needed to be terminated if it ran for more than a specified time):
sys.stdout.flush()  # This forces the buffered stdout to be written out
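An alternative sketch of my own (not from the answer above): perform the redirect inside the child process and ship the captured text back through the queue, so nothing depends on the parent's sys.stdout at all:
import io
import multiprocessing
from contextlib import redirect_stdout

def _captured_target(queue, func):
    # runs in the child process, so the redirect happens where the print happens
    buf = io.StringIO()
    with redirect_stdout(buf):
        result = func()
    queue.put((result, buf.getvalue()))

def call_function_captured(func):
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=_captured_target, args=(queue, func))
    process.start()
    result, captured = queue.get()  # blocks until the child puts a value
    process.join()
    return result, captured

if __name__ == "__main__":
    result, captured = call_function_captured(print_hello_world)
    print(captured, end="")  # Hello World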

Output data from subprocess command line by line

I am trying to read a large data file (= millions of rows, in a very specific format) using a pre-built (in C) routine. I then want to yield the results, line by line, via a generator function.
I can read the file OK, but where as just running:
<command> <filename>
directly in Linux will print the results line by line as it finds them, I've had no luck trying to replicate this within my generator function. It seems to output the entire lot as a single string that I need to split on newlines, and of course everything then needs reading before I can yield line 1.
This code will read the file, no problem:
import subprocess
import config

file_cmd = '<command> <filename>'
for rec in subprocess.check_output([file_cmd], shell=True).decode(config.ENCODING).split('\n'):
    yield rec
(ENCODING is set in config.py to iso-8859-1 - it's a Swedish site)
The code I have works, in that it gives me the data, but in doing so, it tries to hold the whole lot in memory. I have larger files than this to process which are likely to blow the available memory, so this isn't an option.
I've played around with bufsize on Popen, but not had any success (and also, I can't decode or split after the Popen, though I guess the fact I need to split right now is actually my problem!).
I think I have this working now, so will answer my own question in the event somebody else is looking for this later ...
import shlex
import subprocess

proc = subprocess.Popen(shlex.split(file_cmd), stdout=subprocess.PIPE)
while True:
    output = proc.stdout.readline()
    if output == b'' and proc.poll() is not None:
        break
    if output:
        yield output.decode(config.ENCODING).strip()
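For completeness, a hedged alternative sketch: iterating over proc.stdout also reads line by line, and passing encoding to Popen puts the pipe in text mode so there is no manual decode (the encoding value here is taken from the question's config):
import shlex
import subprocess

def read_lines(file_cmd):
    proc = subprocess.Popen(
        shlex.split(file_cmd),
        stdout=subprocess.PIPE,
        encoding='iso-8859-1',  # config.ENCODING in the question
    )
    for line in proc.stdout:  # blocks per line, never holds the whole output
        yield line.rstrip('\n')
    proc.wait()  # reap the child once the stream is exhausted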

Overriding file.write in python 3

I'm aware of the SO post How do I override file.write() in Python 3? but after looking it over and trying whats suggested I'm still stuck.
I want to override the file.write method in Python 3 so that I can "REDACT" certain words (usernames, passwords, etc.).
I found a great example of overriding the print and general stdout and stderr http://code.activestate.com/recipes/119404/
The issue is that it doesn't work for file.write. How can I override file.write?
My code for redacting when printing is:
def write(self, text):
    for word in self.redacted_list:
        text = text.replace(word, "REDACTED")
    self.origOut.write(text)
    return text
thanks
From the self.origOut.write(text) I assume you are trying to write an in-between class that pretends to be a file but provides a different .write() method.
I don't see any problems in the code you posted (assuming it's a method of a class you use). Possibly you wrote a class but forgot to create instances of it?
Did you try to write something like this?
class IAmNotARealFile:
    def __init__(self, real_file):
        self.origOut = real_file
        self.redacted_list = ["SECRET"]  # your list of words to redact

    def __getattr__(self, attr_name):  # provide everything a file has
        return getattr(self.origOut, attr_name)

    def write(self, text):  # your write(), filled in from the question
        for word in self.redacted_list:
            text = text.replace(word, "REDACTED")
        self.origOut.write(text)
        return text

with open('test.txt', 'w') as f:
    f = IAmNotARealFile(f)  # did you forget this?
    f.write('some text SECRET blah SECRET')  # calls IAmNotARealFile.write with your extra code

with open('test.txt') as f:
    f = IAmNotARealFile(f)
    print(f.read())  # this "falls through" to the actual file object
You will also probably want to return self.origOut.write() in your own .write(), if you don't have a specific reason not to.
Note that if you rewrite open() to directly return IAmNotARealFile (be sure to call the original builtin inside, or the function will recurse into itself):
import builtins

def open(*args, **kwargs):
    return IAmNotARealFile(builtins.open(*args, **kwargs))
you will have to manually supply (some) "magic methods" because
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup.
--docs for .__getattribute__(), but it also applies to .__getattr__()
Why?
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
-- On special ("magic") method lookup [code style and emphasis mine]
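A small illustration of that bypass, as a sketch of my own: explicit attribute access on the wrapper reaches the real file through __getattr__, but implicit special-method invocation looks only at the class, so e.g. iter() fails unless __iter__ is defined on IAmNotARealFile itself:
import builtins

with builtins.open('test.txt') as real:
    f = IAmNotARealFile(real)
    print(callable(f.__iter__))  # True: explicit lookup is routed via __getattr__
    try:
        iter(f)                  # implicit special-method lookup skips __getattr__
    except TypeError as err:
        print(err)               # 'IAmNotARealFile' object is not iterable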
