python argparse ignore other options when a specific option is used - python-3.x

I am writing a python program that I want to have a command line interface that behaves in a particular way
The command line interface should accept the following invocations:
my_prog test.svg foo
my_prog --font=Sans test.svg foo
(it will generate an svg with the word foo written in the specified or default font)
Now I want to be able to also have this command accept the following invocation...
my_prog --list-fonts
which will list all of the valid options to --font as determined by the fonts available on the system.
I am using argparse, and I have something like this:
parser = argparse.ArgumentParser()
parser.add_argument('output_file')
parser.add_argument('text')
parser.add_argument('--font', help='list options with --list-fonts')
parser.add_argument('--list-fonts', action='store_true')
args = parser.parse_args()
however this does not make the --list-fonts option behave as I would like as the two positional arguments are still required.
I have also tried using subparsers, but these still need a workaround to prevent the other options being required every time.
How do I get the desired behaviour with argparse.

argparse allows you to define arbitrary actions to take when encountering an argument, based on the action keyword argument to add_argument (see the docs)
You can define an action to list your fonts and then abort argument parsing, which will avoid checking for the other required arguments.
this could look like this:
class ListFonts(argparse.Action):
def __call__(self, parser, namespace, values, option_string):
print("list of fonts here")
parser.exit() # exits the program with no more arg parsing and checking
Then you can add it to your argument like so:
parser.add_argument('--list-fonts', nargs=0, action=ListFonts)
Note nargs=0 has been added so that this argument doesn't require a value (the code in the question achieved this with action='store_true')
This solution has a side-effect of enabling the invocations like the following to also list the fonts and exits without running the main program:
my_prog --font Sans test.svg text --list-fonts
This is likely not a problem as it's not a typical use case, especially if the help text explains this behaviour.
If defining a new class for each such option feels too heavyweight, or perhaps you have more than one option that has this behaviour, then you could consider having a function that implements the desired action for each argument and then have a kind of factory function that returns a class that wraps the function. A complete example of this is shown below.
def list_fonts():
print("list of fonts here")
def override(func):
""" returns an argparse action that stops parsing and calls a function
whenever a particular argument is encountered. The program is then exited """
class OverrideAction(argparse.Action):
def __call__(self, parser, namespace, values, option_string):
func()
parser.exit()
return OverrideAction
parser = argparse.ArgumentParser()
parser.add_argument('output_file')
parser.add_argument('text')
parser.add_argument('--font', help='list options with --list-fonts')
parser.add_argument('--list-fonts', nargs=0, action=override(list_fonts),
help='list the font options then stop, don\'t generate output')
args = parser.parse_args()

Related

Skip processing fenced code blocks when processing Markdown files line by line

I'm a very inexperienced Python coder so it's quite possible that I'm approaching this particular problem in completely the wrong way but I'd appreciate any suggestions/help.
I have a Python script that goes through a Markdown file line by line and rewrites [[wikilinks]] as standard Markdown [wikilink](wikilink) style links. I'm doing this using two regexes in one function as shown below:
def modify_links(file_obj):
"""
Function will parse file contents (opened in utf-8 mode) and modify standalone [[wikilinks]] and in-line
[[wikilinks]](wikilinks) into traditional Markdown link syntax.
:param file_obj: Path to file
:return: List object containing modified text. Newlines will be returned as '\n' strings.
"""
file = file_obj
linelist = []
logging.debug("Going to open file %s for processing now.", file)
try:
with open(file, encoding="utf8") as infile:
for line in infile:
linelist.append(re.sub(r"(\[\[)((?<=\[\[).*(?=\]\]))(\]\])(?!\()", r"[\2](\2.md)", line))
# Finds references that are in style [[foo]] only by excluding links in style [[foo]](bar).
# Capture group $2 returns just foo
linelist_final = [re.sub(r"(\[\[)((?<=\[\[)\d+(?=\]\]))(\]\])(\()((?!=\().*(?=\)))(\))",
r"[\2](\2 \5.md)", line) for line in linelist]
# Finds only references in style [[foo]](bar). Capture group $2 returns foo and capture group $5
# returns bar
except EnvironmentError:
logging.exception("Unable to open file %s for reading", file)
logging.debug("Finished processing file %s", file)
return linelist_final
This works fine for most Markdown files. However, I can occasionally get a Markdown file that has [[wikilinks]] within fenced code blocks such as the following:
# Reference
Here is a reference to “the Reactome Project” using smart quotes.
Here is an image: ![](./images/Screenshot.png)
[[201802150808]](Product discovery)
```
[[201802150808 Product Prioritization]]
def foo():
print("bar")
```
In the above case I should skip processing the [[201802150808 Product Prioritization]] inside the fenced code block. I have a regex that identifies the fenced code block correctly namely:
(?<=```)(.*?)(?=```)
However, since the existing function is running line by line, I have not been able to figure out a way to skip the entire section in the for loop. How do I go about doing this?
You need to use a full Markdown parser to be able to cover all of the edge cases. Of course, most Markdown parsers convert Markdown directly to HTML. However, a few will use a two step process where step one converts the raw text to an Abstract Syntax Tree (AST) and step two renders the AST to the output format. It is not uncommon to find a Markdown renderer (outputs Markdown) which can replace the default HTML renderer.
You would simply need to modify either the parser step (using a plugin to add support for the wikilink syntax) or modify the AST directly. Then pass the AST to a Markdown renderer, which will give you a nicely formatted and normalized Markdown document. If you are looking for a Python solution, mistunePandoc Filters might be a good place to start.
But why go through all that when a few well crafted regular expressions can be run on the source text? Because Markdown parsing is complicated. I know, it seems easy at first. After all Markdown is easy to read for a human (which was one of its defining design goals). However, parsing is actually very complicated with parts of the parser reliant on previous steps.
For example, in addition to fenced code blocks, what about indented code blocks? But you can't just check for indentation at the beginning of a line, because a single line of a nested list could look identical to an indented code block. You want to skip the code block, but not the paragraph nested in a list. And what if your wikilink is broken across two lines? Generally when parsing inline markup, Markdown parsers will treat a single line break no different than a space. The point of all of this is that before you can start parsing inline elements, the entire document needs to first be parsed into its various block-level elements. Only then can you step through those and parse inline elements like links.
I'm sure there are other edge cases I haven't thought of. The only way to cover them all is to use a full-fledged Markdown parser.
I was able to create a reasonably complete solution to this problem by making a few changes to my original function, namely:
Replace the python re built-in with the regex module available on PyPi.
Change the function to read the entire file into a single variable instead of reading it line by line.
The revised function is as follows:
import regex
def modify_links(file_obj):
"""
Function will parse file contents (opened in utf-8 mode) and modify standalone [[wikilinks]] and in-line
[[wikilinks]](wikilinks) into traditional Markdown link syntax.
:param file_obj: Path to file
:return: String containing modified text. Newlines will be returned as '\\n' in the string.
"""
file = file_obj
try:
with open(file, encoding="utf8") as infile:
line = infile.read()
# Read the entire file as a single string
linelist = regex.sub(r"(?V1)"
r"(?s)```.*?```(*SKIP)(*FAIL)(?-s)|(?s)`.*?`(*SKIP)(*FAIL)(?-s)"
# Ignore fenced & inline code blocks. V1 engine allows in-line flags so
# we enable newline matching only here.
r"|(\ {4}|\t).*(*SKIP)(*FAIL)"
# Ignore code blocks beginning with 4 spaces/1 tab
r"|(\[\[(.*)\]\](?!\s\(|\())", r"[\3](\3.md)", line)
# Finds references that are in style [[foo]] only by excluding links in style [[foo]](bar) or
# [[foo]] (bar). Capture group $3 returns just foo
linelist_final = regex.sub(r"(?V1)"
r"(?s)```.*?```(*SKIP)(*FAIL)(?-s)|(?s)`.*?`(*SKIP)(*FAIL)(?-s)"
r"|(\ {4}|\t).*(*SKIP)(*FAIL)"
# Refer comments above for this portion.
r"|(\[\[(\d+)\]\](\s\(|\()(.*)(?=\))\))", r"[\3](\3 \5.md)", linelist)
# Finds only references in style [[123]](bar) or [[123]] (bar). Capture group $3 returns 123 and capture
# group $5 returns bar
except EnvironmentError:
logging.exception("Unable to open file %s for reading", file)
return linelist_final
The above function handles [[wikilinks]] in inline code blocks, fenced code blocks and code blocks indented with 4 spaces. There is currently one false positive scenario where it ignores a valid [[wiklink]] which is when the link appears on the 3rd level or deeper of a Markdown list, i.e.:
* Level 1
* Level 2
* [[wikilink]] #Not recognized
* [[wikilink]] #Not recognized.
However my documents do not have wikilinks nested at that level in lists so it's not a problem for me.

Automate python argparse script with pre-generated inputs

I have a program which takes folder paths and other inputs through the command line with argparse. I want this script to run automatically on a server, but I also want to keep its argparse functionality in case I want to run the script manually. Is there a way to have the script use pre-generated inputs from a file but also retain its flag based input system with argparse? Here is my current implementation:
parser = argparse.ArgumentParser(description='runs batch workflow on root directory')
parser.add_argument("--root", type=str, default='./', help="the path to the root directory
to process")
parser.add_argument("--data", type=str, default='MS', help="The type of data to calculate ")
args = parser.parse_args()
root_dir = args.root
option = args.data
I'm pretty new to this stuff, and reading the argparse documentation and This stack overflow question is not really what I want, if possible I would like to keep the root and data flags, and not just replace them with an input file or stdin.
If using argparse, the default keyword argument is a good, standard way to approach the problem; embed the default behavior of the program in the script source, not an external configuration file. However, if you have multiple configuration files that you want to deploy differently, the approach you mentioned (pre-generated from an input) is desirable.
argparse to dictionary
The argparse namespace can be converted to a dictionary. This is convenient as we can make a function that accepts a dictionary, or keyword arguments, and have it process the program with a convenient function signature. Also, file parsers can just as easily load dictionaries and interact with the same function. The python json module is used as an example. Of course, others can be used.
Example Python
def main(arg1=None, arg2=None, arg3=None):
print(f"{arg1}, {arg2}, {arg3}")
if __name__ == "__main__":
import sys
import json
import argparse
# script called with nothing -- load default
if len(sys.argv) == 1:
with open("default.json", "r") as dfp:
conf = json.load(dfp)
main(**conf)
else: # parse arguments
parser = argparse.ArgumentParser()
parser.add_argument('-a1', dest='arg1', metavar='arg1', type=str)
parser.add_argument('-a2', dest='arg2', metavar='arg2', type=str)
parser.add_argument('-a3', dest='arg3', metavar='arg3', type=str)
args = parser.parse_args()
conf = vars(args)
main(**conf)
default.json
{
"arg1" : "str1",
"arg2" : "str2",
"arg3" : "str3"
}
Using Fire
The python Fire module can be used more conveniently as well. It has multiple modes that the file can be interacted with minimal effort. The github repo is available here.

Overriding file.write in python 3

I'm aware of the SO post How do I override file.write() in Python 3? but after looking it over and trying whats suggested I'm still stuck.
I want to override the file.write method in Python 3 so that I can "REDACT" certain words (Usernames, Passwords...etc).
I found a great example of overriding the print and general stdout and stderr http://code.activestate.com/recipes/119404/
The issue is that it doesn't work for file.write. How can I override the file.write?
My code for redacting when printing is:
def write(self, text):
for word in self.redacted_list:
text = text.replace(word, "REDACTED")
self.origOut.write(text)
return text
thanks
From the self.origOut.write(text) I assume you are trying to write an in-between-class that pretends to be a file but provides a different .write() method.
I don't see any problems in the code you posted (assuming it's a method of a class you use). Possibly you wrote a class but forgot to create instances of it?
Did you try to write something like this?:
class IAmNoARealFile:
def __init__(self, real_file):
self.origOut = real_file
def __getattr__(self, attr_name): # provide everything a file has
return getattr(self.origOut, attr_name)
def write(self, ...):
...
with open('test.txt', 'w') as f:
f = IAmNotARealFile(f) # did you forget this?
f.write('some text SECRET blah SECRET') # calls IAMNotARealFile.write with your extra code
with open('test.txt') as f:
f = IAmNotARealFile(f)
print(f.read()) # this "falls through" to the actual file object
you will also probably want to return self.origOut.write() in your own .write(), if you don't have a specific reason not to.
Note that if you rewrite open() to directly return IAMNotARealFile:
def open(*args, **kwargs):
return IAMNotARealFile(open(*args, **kwargs))
you will have to manually supply (some) "magic methods" because
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup.
--docs for .__getattribute__(), but it also applies to .__getattr__()
Why?
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
-- On special ("magic") method lookup [code style and emphasis mine]

Writing a custom builder that executes external command and python function

I'm looking to write a custom SCons Builder that:
Executes an external command to produce foo.temp
Then executes a python function to manipulate foo.temp and produce the final output file
I've referred to the two following sections, but I'm not sure the correct way to "glue" them together.
18.1. Writing Builders That Execute External Commands
18.4. Builders That Execute Python Functions
I know that Command accepts a list of actions to take. But how do I properly handle that intermediate file? Ideally the intermediate file would be invisible to the user -- the entire Builder would appear to operate atomically.
Here's what I've come up with that seems to be working. However the .bin file isn't being deleted automatically.
from SCons.Action import Action
from SCons.Util import is_List
from SCons.Script import Delete
_objcopy_builder = Builder(
action = 'objcopy -O binary $SOURCE $TARGET',
suffix = '.bin',
single_source = 1
)
def _add_header(target, source, env):
source = str(source[0])
target = str(target[0])
with open(source, 'rb') as src:
with open(target, 'wn') as tgt:
tgt.write('MODULE\x00\x00')
tgt.write(src.read())
return 0
_addheader_builder = Builder(
action = _add_header,
single_source = 1
)
def Elf2Mod(env, target, source, *args, **kw):
def check_one(x, what):
if not is_List(x):
x = [x]
if len(x) != 1:
raise StopError('Only one {0} allowed'.format(what))
return x
target = check_one(target, 'target')
source = check_one(source, 'source')
# objcopy a binary file
binfile = _objcopy_builder.__call__(env, source=source, **kw)
# write the module header
_addheader_builder.__call__(env, target=target, source=binfile, **kw)
# delete the intermediate binary file
# TODO: Not working
Delete(binfile)
return target
def generate(env):
"""Add Builders and construction variables to the Environment."""
env.AddMethod(Elf2Mod, 'Elf2Mod')
print 'Added Elf2Mod to env {0}'.format(env)
def exists(env):
return True
This can indeed be done with the Command builder, by specifying a list of actions, as follows:
Command('foo.temp', 'foo.in',
['your_external_action',
your_python_function])
Notice that foo.in is the source, and you should name it accordingly. But if foo.temp is internal as you mention, then this approach probably isnt the best approach.
Another way, which I feel is much more flexible, would be to use Custom Builder with a Generator and/or Emitter.
The Generator is a Python function where you do the actual work, which in your case would be calling the external command, and also call the Python function.
An Emitter allows you to have a fine-tuned control over the sources and targets. I used a Builder with a Emitter (and Generator) once to do C++ and Java code-generation with Thrift input IDL files. I had to read and process the Thrift input file to know exactly what files would be code-generated (which are the actual targets), and the Emitter is the best/only way to do something like this. If your particular use-case isnt so complicated, you can skip the Emitter and just list your sources/targets in the call to the builder. But if you want foo.temp to be transparent to the end-user, then you'll need an Emitter.
When using a Custom Builder with a Generator and Emitter, the Emitter will be called every time by SCons to calculate the sources and dependencies to know if the Generator needs to be called. The Generator will only be called if one of the targets is considered older with respect to the sources.
There are numerous examples showing how to use a Generator and Emitter in a Custom Builder, so I wont list the code here, but let me know if you need help with the syntax, etc.

Discover source of stdout output

I'm using a third party lib that is printing \n a couple of hundred times to stdout - it appears to be a bug in their logging.
What is the best approach to discover what line of code is producing this output (I'm hoping it's python and not an external library)?
Some ideas:
How can I change the display of the \n character to something recognizable like A so that if I step through with a debugger, I can see when the output occurs?
Can I monkey-patch a low level function to cause some output or a debug-break everytime characters are sent to std-out?
You could try to overtake stdout:
class StdoutFilter:
def __init__(self, realStdout):
self.realStdout = realStdout
def write(self, text):
if text == '\n': # or some more complicated condition
raise Exception("Newline alert!")
self.realStdout.write(text)
import sys
sys.stdout = StdoutFilter(sys.stdout)
# import the 3rd party library and use it
In this filter class you can add a counter and raise the exception only after more than N subsequent newlines were printed. The stack trace of the exception will reveal the source of the malicious print.
Or you can put a break point there, or change the \n to something else.

Resources