How do I change the line endings used by Pexpect output?

The output returned from pexpect.run() includes \r\n at the end of every line. Printing to the terminal using print(returnVal.decode()) correctly prints one line for each line returned, and when I examine the output I see that the byte string contains \r\n. But when I log that string to a file I get double returns in the log file. I'm on a Mac using Python 3.7. Is there a way to set the preferred newline when writing the output? I am using Python's logging module and its info() method to write the string. Output looks like this:
total 80

-rw-r--r-- 1 xxxx admin 1048 Nov 12 00:41 Constants.py

-rw-r--r-- 1 xxxx admin 5830 Nov 12 13:33 file1.py

-rw-r--r-- 1 xxxx admin 2255 Nov 12 00:51 file2.py
When it should look like:
total 80
-rw-r--r-- 1 xxxx admin 1048 Nov 12 00:41 Constants.py
-rw-r--r-- 1 xxxx admin 5830 Nov 12 13:33 file1.py
-rw-r--r-- 1 xxxx admin 2255 Nov 12 00:51 file2.py
Here is a simplified version of my original Logger class:
class Logger():
    def __init__(self, path):
        msgFormat = '%(asctime)s.%(msecs)d\t%(message)s'
        dateFormat = '%m/%d/%Y %H:%M:%S'
        logging.basicConfig(format=msgFormat, datefmt=dateFormat, filename=path, level=logging.INFO)

    def Log(self, theStr):
        logging.info(str(theStr))
The string being returned from Pexpect looks something like:
Line1\r\nLine2

Depending on how you log the output, it is advisable to format the newlines before sending the string to the logger. However, if you must override the newline behavior of the logging module's FileHandler, as an experiment you can monkey patch its _open method, since the functionality isn't available by default.
I used the source code of Python 3.8 to get the definition of the _open function.
import logging

def custom_open(self):
    """
    Monkey patched _open function of class logging.FileHandler (Python 3.8)
    """
    return open(self.baseFilename, self.mode, encoding=self.encoding, newline='')

logging.FileHandler._open = custom_open

if __name__ == "__main__":
    pexpect_return = "Output\nTest"
    my_log = logging.getLogger("test_logger")
    my_log.setLevel(logging.INFO)
    my_log.addHandler(logging.FileHandler("test.log"))
    my_log.info(pexpect_return)
How it works
Python's logging module has a class FileHandler, which uses a method _open to create the file object it writes and appends to on disk. Its default implementation, as of version 3.8, does not accept the newline parameter, so it uses the platform's default newline handling.
Monkey patching is when you replace or update a method/function of an imported class while the program is running. The line logging.FileHandler._open = custom_open tells Python to replace the _open method of the FileHandler class with the custom_open function above. Later, when my_log.addHandler(logging.FileHandler("test.log")) runs, the new custom_open method is used to open the file with the newline parameter.
You can further confirm that the new method is used to open the file by adding a suffix to the file name like this:
return open(self.baseFilename+"__Monkey_Patched", self.mode, encoding=self.encoding, newline='')
If you now run the demo code, the file name will be "test.log__Monkey_Patched".
This code, however, will not replace any newline characters that you pass to the logger as part of the string to log. You need to process those beforehand.
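For completeness, here is a minimal sketch of that preprocessing (the helper name log_pexpect_output is mine, not part of the original answer; it assumes the bytes come straight from pexpect.run()):

import logging

def log_pexpect_output(raw_bytes):
    # pexpect.run() returns bytes whose lines end in \r\n; decode first,
    # then let splitlines() strip every line-ending variant before
    # re-joining with plain \n so the log file gets single newlines.
    text = raw_bytes.decode()
    logging.info('\n'.join(text.splitlines()))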

Related

Why does a custom log level not show the correct filename (%(filename)s) in the log file in Python?

I added a new log level, verbose, to my logger module. It is working, but the filename that gets logged along with it is not the correct one: it always shows the logger module's own name instead of the module from which the log call was triggered.
Below is the logger module code:
class _CustomRotatingFileHandler(logging.handlers.RotatingFileHandler):
    def __init__(self, *args, **kwargs):
        prev_umask = os.umask(0o000)
        super().__init__(*args, **kwargs)
        os.umask(prev_umask)

    def doRollover(self):
        prev_umask = os.umask(0o000)
        super().doRollover()
        os.umask(prev_umask)

def setup_logging():
    reference = 'my_project'
    date_fmt = '%(asctime)s %(levelname)s %(filename)s:%(lineno)d' \
               ' : %(message)s'
    # Defining a custom log level for verbose log level
    # selected a value between logging.INFO and logging.DEBUG
    log_level_verbose = 15
    logging.addLevelName(log_level_verbose, 'VERBOSE')
    logger = logging.getLogger(reference)
    logger.setLevel(logging.DEBUG)
    # For custom log level verbose
    log_func = lambda msg, *args, **kwargs: logger.log(log_level_verbose, msg, *args, **kwargs)  # noqa: E731
    setattr(logger, 'verbose', log_func)
    file_handler = _CustomRotatingFileHandler(
        log_file, maxBytes=10485760, backupCount=3
    )
    file_log_format = logging.Formatter(date_fmt, "%b %e %H:%M:%S")
    file_handler.setFormatter(file_log_format)
    file_handler.setLevel(log_level_verbose)
    console_handler = logging.StreamHandler(sys.stdout)
    console_log_format = logging.Formatter("%(message)s")
    console_handler.setFormatter(console_log_format)
    console_handler.setLevel(logging.INFO)
    logger.handlers = [file_handler, console_handler]
    return logger
The verbose level created here is meant to log messages only to the log file and not to the console.
The flow is: main.py creates the logger object by calling setup_logging, and this logger object is passed to a function in test_module.py for execution and logging.
The log file content is something like below, where the VERBOSE message is logged only in the file (as expected) but the filename logged along with it is wrong; it should be test_module.py:
Feb 4 16:54:31 INFO main.py:151 : ----------------------
Feb 4 16:54:31 INFO main.py:152 : Executing command for 'task1'
Feb 4 16:54:31 INFO main.py:153 : ----------------------
Feb 4 16:54:31 INFO test_module.py:34 : Executing command 'ls -ltr'
Feb 4 16:54:31 VERBOSE log.py:88 : Some verbose log message
Feb 4 16:54:32 INFO test_module.py:37 : Result: total 24
-rwxr-xr-x. 1 root root 20606 Jan 13 13:31 main.py
Feb 4 16:54:32 INFO main.py:160 : Completed
Currently for the VERBOSE log, I'm getting the log message as below,
Feb 4 16:54:31 VERBOSE log.py:88 : Some verbose log message
The expected log message with correct filename should be as below,
Feb 4 16:54:31 VERBOSE test_module.py:35 : Some verbose log message
Finally I got the answer. Here is the line of code that was modified:
# For custom log level verbose
log_func = lambda msg, *args, **kwargs: logger._log(log_level_verbose, msg, args, **kwargs) # noqa: E731
I changed logger.log to logger._log, and the argument *args to args.
I found the precise reason behind this in one of the comments on an answer to How to add a custom loglevel to Python's logging facility by @rivy. Below is the explanation:
Using _log() instead of log() is needed to avoid introducing an extra level in the call stack. If log() is used, the introduction of the extra stack frame causes several LogRecord attributes (funcName, lineno, filename, pathname, ...) to point at the debug function instead of the actual caller.
And this is exactly the reason why the filename was not working correctly with logger.log.
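As an aside (my addition, not part of the original answer): on Python 3.8 and later you can stay on the public API by passing stacklevel, which tells the logger how many extra frames to skip when locating the caller, so the lambda's frame no longer distorts %(filename)s:

# Assumes Python >= 3.8; stacklevel=2 skips the lambda's own frame
log_func = lambda msg, *args, **kwargs: logger.log(
    log_level_verbose, msg, *args, stacklevel=2, **kwargs
)  # noqa: E731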

SED style Multi address in Python?

I have an app that parses multiple Cisco show tech files. These files contain the output of multiple router commands in a structured way; let me show you a snippet of a show tech output:
`show clock`
20:20:50.771 UTC Wed Sep 07 2022
Time source is NTP
`show callhome`
callhome disabled
Callhome Information:
<SNIPPET>
`show module`
Mod Ports Module-Type Model Status
--- ----- ------------------------------------- --------------------- ---------
1 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
2 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
3 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
4 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
21 0 Fabric Module N9K-C9504-FM-R ok
22 0 Fabric Module N9K-C9504-FM-R ok
23 0 Fabric Module N9K-C9504-FM-R ok
<SNIPPET>
My app currently uses both sed and Python scripts to parse these files. I use sed to scan the show tech file looking for a specific command's output; once I find it, I stop sed. This way I don't need to read the whole file (these files can get very big). This is a snippet of my sed script:
sed -E -n '/`show running-config`|`show running`|`show running config`/{
p
:loop
n
p
/`show/q
b loop
}' $1/$file
As you can see, I am using a multi-address range in sed. My question, specifically, is: how can I achieve something similar in Python? I have tried multiple combinations of the DOTALL and MULTILINE flags, but I can't get the result I'm expecting. For example, I can get a match for the command I'm looking for, but the Python regex won't stop until the end of the file after the first match.
I am looking for something like this
sed -n '/`show clock`/,/`show/p'
I would like the regex match to stop parsing the file and print the results immediately after seeing `show again. Hope that makes sense, and thank you all for reading and for your help.
You can use nested loops.
import re

def process_file(filename):
    with open(filename) as f:
        for line in f:
            if re.search(r'`show running-config`|`show running`|`show running config`', line):
                print(line)
                for line1 in f:
                    print(line1)
                    if re.search(r'`show', line1):
                        return
The inner for loop will start from the next line after the one processed by the outer loop.
You can also do it with a single loop using a flag variable.
import re

def process_file(filename):
    in_show = False
    with open(filename) as f:
        for line in f:
            if re.search(r'`show running-config`|`show running`|`show running config`', line):
                in_show = True
                print(line)
                continue  # don't treat the opening `show line as the terminator
            if in_show:
                print(line)
                if re.search(r'`show', line):
                    return
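If the file is small enough to read into memory, another option closer to the sed one-liner is a non-greedy match with re.DOTALL; .*? stops at the first following `show marker instead of running to the end of the file. A sketch (my addition; the command pattern is condensed from the question's alternatives):

import re

def extract_section(filename):
    with open(filename) as f:
        text = f.read()
    # Matches `show running-config`, `show running` or `show running config`,
    # then consumes as little as possible up to (excluding) the next `show.
    m = re.search(r'`show running(?:-config| config)?`.*?(?=`show)', text, re.DOTALL)
    return m.group(0) if m else None

Note this reads the whole file, which the question was trying to avoid, so the loop-based versions above remain preferable for very large files.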

How to write Chinese characters to file based on unicode code point in Python3

I am trying to write Chinese characters to a CSV file based on their Unicode code points found in a text file in unicode.org/Public/zipped/13.0.0/Unihan.zip. For instance, one example character is U+9109.
In the example below I can get the correct output by hard-coding the value (line 8), but I keep getting it wrong with every permutation I've tried for generating the bytes from the code point (lines 14-16).
I'm running this in Python 3.8.3 on a Debian-based Linux distro.
Minimal working (broken) example:
 1  #!/usr/bin/env python3
 2
 3  def main():
 4
 5      output = open("test.csv", "wb")
 6
 7      # Hardcoded values work just fine
 8      output.write('\u9109'.encode("utf-8"))
 9
10      # Comma separation
11      output.write(','.encode("utf-8"))
12
13      # Problem is here
14      codepoint = '9109'
15      u_str = '\\' + 'u' + codepoint
16      output.write(u_str.encode("utf-8"))
17
18      # End with newline
19      output.write('\n'.encode("utf-8"))
20
21      output.close()
22
23  if __name__ == "__main__":
24      main()
Executing and viewing results:
example $
example $./test.py
example $
example $cat test.csv
鄉,\u9109
example $
The expected output would look like this (Chinese character occurring on both sides of the comma):
example $
example $./test.py
example $cat test.csv
鄉,鄉
example $
In Python 3, chr converts an integer code point to the corresponding character. Your code could use:
output.write(chr(0x9109).encode("utf-8"))
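Since the Unihan file gives the code point as text rather than as an integer, the missing step in the question's lines 14-16 is parsing that string as base 16 before calling chr. A small sketch (my addition, reusing the question's variable names):

codepoint = '9109'                  # as read from the Unihan data
char = chr(int(codepoint, 16))      # int('9109', 16) == 0x9109
output.write(char.encode("utf-8"))  # writes 鄉, same as the hardcoded line 8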
But if you specify the encoding when you open the file, instead of using binary mode, you don't have to manually encode everything. print to a file handles newlines for you as well:
with open("test.txt", 'w', encoding='utf-8') as output:
    for i in range(0x4e00, 0x4e10):
        print(f'U+{i:04X} {chr(i)}', file=output)
Output:
U+4E00 一
U+4E01 丁
U+4E02 丂
U+4E03 七
U+4E04 丄
U+4E05 丅
U+4E06 丆
U+4E07 万
U+4E08 丈
U+4E09 三
U+4E0A 上
U+4E0B 下
U+4E0C 丌
U+4E0D 不
U+4E0E 与
U+4E0F 丏

Python Path .is_file() evaluates symlink as a file

In my Python3 program, I take a bunch of paths and do things based on what they are. When I evaluate the following symlinks (snippet):
lrwxrwxrwx 1 513 513 5 Aug 19 10:56 console -> ttyS0
lrwxrwxrwx 1 513 513 11 Aug 19 10:56 core -> /proc/kcore
lrwxrwxrwx 1 513 513 13 Aug 19 10:56 fd -> /proc/self/fd
the results are:
symlink console -> ttyS0
file core -> /proc/kcore
symlink console -> ttyS0
It evaluates core as if it were a file (vs a symlink). What is the best way for me to evaluate it as a symlink rather than a file? Code below:
#!/usr/bin/python3
import sys
import os
from pathlib import Path

def filetype(filein):
    print(filein)
    if Path(filein).is_file():
        return "file"
    if Path(filein).is_symlink():
        return "symlink"
    else:
        return "doesn't match anything"

if __name__ == "__main__":
    file = sys.argv[1]
    print(str(file))
    print(filetype(file))
The result of is_file is intended to answer the question "if I open this name, will I open a file?". For a symlink, the answer is "yes" if the link's target is a file, hence the return value.
If you want to know if the name is a symlink, ask is_symlink.
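In terms of the question's filetype function, that means testing is_symlink before is_file; a minimal sketch of the reordering (same function name as the question):

from pathlib import Path

def filetype(filein):
    p = Path(filein)
    if p.is_symlink():   # checked first: does not follow the link
        return "symlink"
    if p.is_file():      # follows symlinks, so it must come second
        return "file"
    return "doesn't match anything"

With this ordering, core -> /proc/kcore reports "symlink" instead of "file".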

Python-Celery Log

I use Flask, Python 3.x and Celery 4 (8 workers in total).
I want to create log files with RotatingFileHandler, splitting them when the file size limit is reached.
It works fine for the first log file (it includes all workers' logs, PoolWorker-1 through PoolWorker-8):
-rw-rw-r-- 1 sj sj 1048530 Sep 18 10:01 celery_20170918.log (all workers' logs)
But when the file size limit is exceeded, the workers write their logs to separate files:
-rw-rw-r-- 1 sj sj 223125 Sep 18 10:47 celery_20170918.log (all workers' logs except PoolWorker-2, -5 and -6)
-rw-rw-r-- 1 sj sj 43785 Sep 18 10:47 celery_20170918.log.1 (only PoolWorker-2 logs)
-rw-rw-r-- 1 sj sj 46095 Sep 18 10:47 celery_20170918.log.2 (only PoolWorker-5 logs)
-rw-rw-r-- 1 sj sj 45990 Sep 18 10:47 celery_20170918.log.3 (only PoolWorker-6 logs)
-rw-rw-r-- 1 sj sj 1048530 Sep 18 10:01 celery_20170918.log.4 (the log file created first was renamed to this)
I don't know what rule governs this, and the files don't contain any duplicated logs!
My celery logger is as below.
tasks.py
logger = get_task_logger('tasks')
logger.setLevel("INFO")

filename = './log/celery/celery_task.log'
formatter = Formatter('%(levelname)s-%(asctime)s %(processName)s %(funcName)s():%(lineno)d %(message)s')

# File-size-based rotation
fileMaxByte = 1024 * 1024 * 1  # 1MB
fileHandler = logging.handlers.RotatingFileHandler(filename, maxBytes=fileMaxByte, backupCount=100)
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)

@celery.task(...options...)
def test_call(self):
    logger.info("LOG TEST")
test.py
if __name__ == '__main__':
    test_call.apply_async()
What's wrong?
RotatingFileHandler does not maintain atomicity across multiple processes when it rolls the log file over.
In a multi-process environment, process A sees that maxBytes has been reached on c.log, renames the file to c.log.1, and then writes some log lines to a newly created c.log.
But at the same time, another process B may still be holding a handle to the old c.log. Because the size check is based on the end offset of the file handle, B also sees that maxBytes has been reached and wants to roll the file over on its own. Rollover is done by renaming the file on disk, so B tries to rename c.log to c.log.1; since c.log.1 already exists, the file becomes c.log.2 instead.
As there are more processes, c.log.3 gets created, and so on and so forth.
This issue is best addressed by using an external logging mechanism, or you could write your own atomic rotating file handler.
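One standard-library way to get that "external logging mechanism" is to funnel every worker's records through a queue to a single listener, so only one process ever rotates the file. A minimal sketch (my addition, not Celery-specific; hook worker_configurer and the listener into your process startup as appropriate):

import logging
import logging.handlers
import multiprocessing

log_queue = multiprocessing.Queue(-1)

def worker_configurer():
    # Each worker process logs only to the shared queue.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(log_queue))
    root.setLevel(logging.INFO)

def start_listener():
    # A single listener owns the rotating file, so rollover
    # happens in exactly one process and stays atomic.
    file_handler = logging.handlers.RotatingFileHandler(
        'celery_task.log', maxBytes=1024 * 1024, backupCount=100)
    file_handler.setFormatter(logging.Formatter(
        '%(levelname)s-%(asctime)s %(processName)s %(message)s'))
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()
    return listener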
