Automate python argparse script with pre-generated inputs - python-3.x

I have a program which takes folder paths and other inputs through the command line with argparse. I want this script to run automatically on a server, but I also want to keep its argparse functionality in case I want to run the script manually. Is there a way to have the script use pre-generated inputs from a file while retaining its flag-based input system with argparse? Here is my current implementation:
import argparse

parser = argparse.ArgumentParser(description='runs batch workflow on root directory')
parser.add_argument("--root", type=str, default='./', help="the path to the root directory to process")
parser.add_argument("--data", type=str, default='MS', help="the type of data to calculate")
args = parser.parse_args()
root_dir = args.root
option = args.data
I'm pretty new to this stuff. Reading the argparse documentation and this Stack Overflow question didn't give me quite what I want: if possible I would like to keep the root and data flags, and not just replace them with an input file or stdin.

If using argparse, the default keyword argument is a good, standard way to approach the problem: it embeds the default behavior of the program in the script source rather than in an external configuration file. However, if you have multiple configurations that you want to deploy differently, the approach you mentioned (pre-generated inputs from a file) is desirable.
argparse to dictionary
The argparse namespace can be converted to a dictionary with vars(). This is convenient because we can write a function that accepts keyword arguments and drives the whole program through one clean function signature. A file parser can just as easily produce a dictionary and feed the same function. The Python json module is used as an example; of course, others can be used.
Example Python
def main(arg1=None, arg2=None, arg3=None):
    print(f"{arg1}, {arg2}, {arg3}")

if __name__ == "__main__":
    import sys
    import json
    import argparse

    # script called with nothing -- load defaults from file
    if len(sys.argv) == 1:
        with open("default.json", "r") as dfp:
            conf = json.load(dfp)
        main(**conf)
    else:  # parse arguments
        parser = argparse.ArgumentParser()
        parser.add_argument('-a1', dest='arg1', metavar='arg1', type=str)
        parser.add_argument('-a2', dest='arg2', metavar='arg2', type=str)
        parser.add_argument('-a3', dest='arg3', metavar='arg3', type=str)
        args = parser.parse_args()
        conf = vars(args)
        main(**conf)
default.json
{
    "arg1": "str1",
    "arg2": "str2",
    "arg3": "str3"
}
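With this in place, both entry points behave the same way. Assuming the script above is saved as prog.py (a hypothetical name) next to default.json:

python prog.py
# str1, str2, str3
python prog.py -a1 foo -a2 bar -a3 baz
# foo, bar, baz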
Using Fire
The Python Fire module can be used for this as well, with minimal effort: it offers several modes of interacting with a function or object from the command line. The GitHub repo is available at https://github.com/google/python-fire.
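For instance, a minimal sketch of the same interface with Fire (assuming fire is installed via pip install fire; main is the same function as above):

import fire

def main(arg1=None, arg2=None, arg3=None):
    print(f"{arg1}, {arg2}, {arg3}")

if __name__ == "__main__":
    # Fire exposes main's parameters as command-line flags,
    # e.g. python prog.py --arg1 foo --arg2 bar --arg3 baz
    fire.Fire(main)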

Related

Python3 Unable to store stdout to variable

My case is a little bit specific. I'm trying to run a Python program from another Python program for testing purposes. The case is as follows:
# file1.py
print("Hello world")

# file1.test.py
import io
import sys
import os
import unittest

EXPECTED_OUTPUT = "Hello world"

class TestHello(unittest.TestCase):
    def test_hello(self):
        sio = io.StringIO()
        sys.stdout = sio
        os.system("python3 path/to/file1.py")
        sys.stdout = sys.__stdout__
        print("captured value:", sio.getvalue())
        self.assertEqual(sio.getvalue(), EXPECTED_OUTPUT)

if __name__ == "__main__":
    unittest.main()
But nothing ends up in the sio variable. This approach and similar ones are suggested online, but they don't seem to work for me. My Python version is 3.8.10, but it doesn't really matter if this works better in some other version; I can switch to that.
Note: I know that if I were using an importable object this might be easier, but right now I need to know how to catch the output of another file.
Thanks!
stdout redirection does not work like this: assigning to sys.stdout only changes the stdout variable inside your Python process. By using os.system you are running another process, and that process reuses the same terminal pseudo-files your parent process is using.
If you want to capture the output of a subprocess, the way to do it is to use the subprocess module's calls, which allow you to redirect the subprocess output: https://docs.python.org/3/library/subprocess.html
Also, the subprocess won't be able to use a StringIO object from the parent process (it is not an O.S.-level object, just an in-process Python object with a write method). The docs above include instructions about using the special object subprocess.PIPE, which allows for in-memory communication, or you can just pass an ordinary filesystem file, which you can read afterwards.
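A minimal sketch of the test rewritten this way, keeping the path/to/file1.py placeholder from the question (subprocess.run with capture_output=True collects the child's stdout for you):

import subprocess
import unittest

EXPECTED_OUTPUT = "Hello world"

class TestHello(unittest.TestCase):
    def test_hello(self):
        # run the child process and capture its stdout in the result object
        result = subprocess.run(
            ["python3", "path/to/file1.py"],
            capture_output=True,
            text=True,  # decode bytes to str
        )
        # print() appends a newline, so strip it before comparing
        self.assertEqual(result.stdout.strip(), EXPECTED_OUTPUT)

if __name__ == "__main__":
    unittest.main()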

python argparse ignore other options when a specific option is used

I am writing a python program that I want to have a command line interface that behaves in a particular way.
The command line interface should accept the following invocations:
my_prog test.svg foo
my_prog --font=Sans test.svg foo
(it will generate an svg with the word foo written in the specified or default font)
Now I want to be able to also have this command accept the following invocation...
my_prog --list-fonts
which will list all of the valid options to --font as determined by the fonts available on the system.
I am using argparse, and I have something like this:
parser = argparse.ArgumentParser()
parser.add_argument('output_file')
parser.add_argument('text')
parser.add_argument('--font', help='list options with --list-fonts')
parser.add_argument('--list-fonts', action='store_true')
args = parser.parse_args()
however this does not make the --list-fonts option behave as I would like, as the two positional arguments are still required.
I have also tried using subparsers, but these still need a workaround to prevent the other options being required every time.
How do I get the desired behaviour with argparse?
argparse allows you to define arbitrary actions to take when encountering an argument, via the action keyword argument to add_argument (see the docs).
You can define an action that lists your fonts and then aborts argument parsing, which avoids checking for the other required arguments.
This could look like this:
class ListFonts(argparse.Action):
    def __call__(self, parser, namespace, values, option_string):
        print("list of fonts here")
        parser.exit()  # exits the program with no more arg parsing and checking
Then you can add it to your argument like so:
parser.add_argument('--list-fonts', nargs=0, action=ListFonts)
Note that nargs=0 has been added so that this argument doesn't require a value (the code in the question achieved this with action='store_true').
This solution has the side effect that invocations like the following also list the fonts and exit without running the main program:
my_prog --font Sans test.svg text --list-fonts
This is likely not a problem as it's not a typical use case, especially if the help text explains this behaviour.
If defining a new class for each such option feels too heavyweight, or you have more than one option with this behaviour, you could write a function that implements the desired action for each argument, plus a kind of factory function that returns a class wrapping it. A complete example of this is shown below.
import argparse

def list_fonts():
    print("list of fonts here")

def override(func):
    """Returns an argparse action that stops parsing and calls a function
    whenever a particular argument is encountered. The program is then exited."""
    class OverrideAction(argparse.Action):
        def __call__(self, parser, namespace, values, option_string):
            func()
            parser.exit()
    return OverrideAction

parser = argparse.ArgumentParser()
parser.add_argument('output_file')
parser.add_argument('text')
parser.add_argument('--font', help='list options with --list-fonts')
parser.add_argument('--list-fonts', nargs=0, action=override(list_fonts),
                    help='list the font options then stop, don\'t generate output')
args = parser.parse_args()

debugging a python script taking input from sys.stdin in pycharm

I want to debug a small python script that takes input from stdin and sends it to stdout. Used like this:
filter.py < in.txt > out.txt
There does not seem to be a way to configure PyCharm debugging to pipe input from my test data file.
This question has been asked before, and the answer has been, basically "you can't--rewrite the script to read from a file."
I modified the code to take a file, more or less doubling the code size, with this:
import argparse

if __name__ == '__main__':
    cmd_parser = argparse.ArgumentParser()
    cmd_parser.add_argument('path', nargs='?', default='/dev/stdin')
    args = cmd_parser.parse_args()
    with open(args.path) as f:
        filter(f)

where filter() now takes a file object open for reading as a parameter. This permits backward compatibility, so it can be used as above, while I am also able to invoke it under the debugger with input from a file.
I consider this an ugly solution. Is there a cleaner alternative? Perhaps something that leaves the ugliness in a separate file?
If you want something simpler, you can forgo argparse entirely and just use the sys.argv list to get the first argument.
import sys

if len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        filter(f)
else:
    filter(sys.stdin)
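Another option that keeps the script tiny is the standard library's fileinput module, which reads the files named in sys.argv[1:] and falls back to stdin when none are given. A sketch, assuming filter() accepts any iterable of lines:

import fileinput

# fileinput.input() iterates over lines from the files listed on the
# command line, or from stdin if no filenames were passed
with fileinput.input() as f:
    filter(f)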

Multiprocessing with subprocess.call for an entire directory of files

I have a Python3 script that uses subprocess.call to run a program on about 2,300 input files in a directory; there are two output files for each input file, and I have these two outputs going into two different directories. I would like to learn how to multiprocess my script so several files can be processed at the same time. I have been reading on the multiprocessing library in Python but it might be too advanced for me to understand. Below is the script if the experts have any input. Thanks so much!
Script:
import os
import subprocess
import argparse

parser = argparse.ArgumentParser(description="This script aligns DNA sequences in files in a given directory.")
parser.add_argument('--root', default="/shared/testing_macse/", help="PATH to the input directory containing CDS orthogroup files.")
parser.add_argument('--align_NT_dir', default="/shared/testing_macse/NT_aligned/", help="PATH to the output directory for NT aligned CDS orthogroup files.")
parser.add_argument('--align_AA_dir', default="/shared/testing_macse/AA_aligned/", help="PATH to the output directory for AA aligned CDS orthogroup files.")
args = parser.parse_args()

def runMACSE(input_file, NT_output_file, AA_output_file):
    MACSE_command = "java -jar ~/bin/MACSE/macse_v1.01b.jar "
    MACSE_command += "-prog alignSequences "
    MACSE_command += "-seq {0} -out_NT {1} -out_AA {2}".format(input_file, NT_output_file, AA_output_file)
    # print(MACSE_command)
    subprocess.call(MACSE_command, shell=True)

Orig_file_dir = args.root
NT_align_file_dir = args.align_NT_dir
AA_align_file_dir = args.align_AA_dir

try:
    os.makedirs(NT_align_file_dir)
    os.makedirs(AA_align_file_dir)
except FileExistsError as e:
    print(e)

for currentFile in os.listdir(args.root):
    if currentFile.endswith(".fa"):
        runMACSE(args.root + currentFile,
                 args.align_NT_dir + currentFile[:-3] + "_NT_aligned.fa",
                 args.align_AA_dir + currentFile[:-3] + "_AA_aligned.fa")
Subprocess functions run any command-line executable in a separate process; here you are running java. Multiprocessing runs Python code in separate processes, just as threading runs Python code in separate threads, and the API for the two is intentionally similar. So multiprocessing is not a substitute for non-Python subprocess calls.
It would be a waste of processes to use multiple Python processes just to initiate multiple java processes. You could just as well use multiple threads to make the subprocess calls, as sketched below, or use asyncio.
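A minimal sketch of the thread-pool approach, reusing runMACSE, args, and the .fa loop from the question's script (concurrent.futures is in the standard library; the worker count of 10 is an arbitrary choice):

from concurrent.futures import ThreadPoolExecutor

# each thread just blocks on its subprocess.call, so threads are cheap here
with ThreadPoolExecutor(max_workers=10) as pool:
    for currentFile in os.listdir(args.root):
        if currentFile.endswith(".fa"):
            pool.submit(runMACSE,
                        args.root + currentFile,
                        args.align_NT_dir + currentFile[:-3] + "_NT_aligned.fa",
                        args.align_AA_dir + currentFile[:-3] + "_AA_aligned.fa")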
Or make your own scheduler. Wrap your for-if in a generator function:

def fa_file(path):
    for currentFile in os.listdir(path):
        if currentFile.endswith(".fa"):
            yield currentFile

fafiles = fa_file(args.root)
Make an array of, say, 10 Popen objects. Sleep for some appropriate interval. Upon waking, loop through the array and replace finished subprocesses (.poll() returns something other than None) for as long as next(fafiles) returns something.
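A sketch of that scheduler, reusing the fa_file generator and the argument names from the question (the pool size of 10 and the 5-second interval are arbitrary choices):

import time
import subprocess

def start_one(name):
    # launch MACSE on one input file; same command string as runMACSE above
    cmd = ("java -jar ~/bin/MACSE/macse_v1.01b.jar -prog alignSequences "
           "-seq {0} -out_NT {1} -out_AA {2}").format(
               args.root + name,
               args.align_NT_dir + name[:-3] + "_NT_aligned.fa",
               args.align_AA_dir + name[:-3] + "_AA_aligned.fa")
    return subprocess.Popen(cmd, shell=True)

fafiles = fa_file(args.root)
# fill the pool with the first 10 jobs
pool = [start_one(f) for f, _ in zip(fafiles, range(10))]
while pool:
    time.sleep(5)  # some appropriate interval
    still_running = []
    for proc in pool:
        if proc.poll() is None:      # still going
            still_running.append(proc)
        else:                        # finished: start the next file, if any
            nxt = next(fafiles, None)
            if nxt is not None:
                still_running.append(start_one(nxt))
    pool = still_running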
EDIT: If you did the per-file processing in Python code that calls compiled C code (pillow, for instance), then you could use multiprocessing and a Queue loaded with the files to process.
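A minimal sketch of that variant; multiprocessing.Pool implements the queue-of-files pattern for you (its map distributes items to worker processes via an internal queue). Here process_one is a hypothetical stand-in for real per-file Python work:

import multiprocessing

def process_one(name):
    # placeholder for real per-file work done in Python
    return name.upper()

if __name__ == "__main__":
    files = ["a.fa", "b.fa", "c.fa"]
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(process_one, files)
    print(results)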

optional parameter without a switch with Pythons argparse

(Please feel free to edit the title to make it easier to understand.)
I want to call (in bash) a Python script in these two ways without any error:
./arg.py
./arg.py TEST
It means that the parameter (here with the value TEST) should be optional.
With argparse I only know a way to create optional parameters when they have a switch (like --name).
Is there a way to fix that?
#!/usr/bin/env python3
import sys
import argparse
parser = argparse.ArgumentParser(description=__file__)
# must have
#parser.add_argument('name', metavar='NAME', type=str)
# optional BUT with a switch I don't want
#parser.add_argument('--name', metavar='NAME', type=str)
# store all arguments in objects/variables of the local namespace
locals().update(vars(parser.parse_args()))
print(name)
sys.exit()
I think all you need is nargs='?'.
parser = argparse.ArgumentParser(description=__file__)
parser.add_argument('name', nargs='?', default='mydefault')
args = parser.parse_args()
I'd expect args to be either:
Namespace(name='mydefault')
Namespace(name='TEST')
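A quick check confirms this; parse_args accepts an explicit argv list, which makes it easy to exercise both invocations without touching the shell:

import argparse

parser = argparse.ArgumentParser(description=__file__)
parser.add_argument('name', nargs='?', default='mydefault')

print(parser.parse_args([]))        # Namespace(name='mydefault')
print(parser.parse_args(['TEST']))  # Namespace(name='TEST')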
