Writing a custom builder that executes an external command and a Python function - scons

I'm looking to write a custom SCons Builder that:
Executes an external command to produce foo.temp
Then executes a python function to manipulate foo.temp and produce the final output file
I've referred to the following two sections, but I'm not sure of the correct way to "glue" them together.
18.1. Writing Builders That Execute External Commands
18.4. Builders That Execute Python Functions
I know that Command accepts a list of actions to take. But how do I properly handle that intermediate file? Ideally the intermediate file would be invisible to the user -- the entire Builder would appear to operate atomically.
Here's what I've come up with that seems to be working. However the .bin file isn't being deleted automatically.
from SCons.Action import Action
from SCons.Builder import Builder
from SCons.Errors import StopError
from SCons.Util import is_List
from SCons.Script import Delete

_objcopy_builder = Builder(
    action='objcopy -O binary $SOURCE $TARGET',
    suffix='.bin',
    single_source=1
)

def _add_header(target, source, env):
    source = str(source[0])
    target = str(target[0])
    with open(source, 'rb') as src:
        with open(target, 'wb') as tgt:
            tgt.write(b'MODULE\x00\x00')
            tgt.write(src.read())
    return 0

_addheader_builder = Builder(
    action=_add_header,
    single_source=1
)

def Elf2Mod(env, target, source, *args, **kw):
    def check_one(x, what):
        if not is_List(x):
            x = [x]
        if len(x) != 1:
            raise StopError('Only one {0} allowed'.format(what))
        return x
    target = check_one(target, 'target')
    source = check_one(source, 'source')
    # objcopy a binary file
    binfile = _objcopy_builder.__call__(env, source=source, **kw)
    # write the module header
    _addheader_builder.__call__(env, target=target, source=binfile, **kw)
    # delete the intermediate binary file
    # TODO: Not working
    Delete(binfile)
    return target

def generate(env):
    """Add Builders and construction variables to the Environment."""
    env.AddMethod(Elf2Mod, 'Elf2Mod')
    print('Added Elf2Mod to env {0}'.format(env))

def exists(env):
    return True

This can indeed be done with the Command builder, by specifying a list of actions, as follows:
Command('foo.temp', 'foo.in',
        ['your_external_action',
         your_python_function])
Notice that foo.in is the source, and you should name it accordingly. But if foo.temp is internal as you mention, then this probably isn't the best approach.
Another way, which I feel is much more flexible, would be to use a custom Builder with a Generator and/or Emitter.
The Generator is a Python function where you do the actual work, which in your case would be calling the external command and then calling the Python function.
An Emitter allows you to have fine-tuned control over the sources and targets. I used a Builder with an Emitter (and Generator) once to do C++ and Java code generation from Thrift input IDL files. I had to read and process the Thrift input file to know exactly which files would be code-generated (those are the actual targets), and the Emitter is the best/only way to do something like that. If your particular use case isn't so complicated, you can skip the Emitter and just list your sources/targets in the call to the builder. But if you want foo.temp to be transparent to the end user, then you'll need an Emitter.
When using a custom Builder with a Generator and Emitter, the Emitter will be called every time by SCons to calculate the sources and dependencies, so it knows whether the Generator needs to be called. The Generator will only be called if one of the targets is out of date with respect to the sources.
There are numerous examples showing how to use a Generator and Emitter in a custom Builder, so I won't reproduce them here, but let me know if you need help with the syntax, etc.
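For the simple two-step case in the question, though, the generator/emitter machinery may be more than you need: a single Builder can take a list of actions that runs the external command into a hidden ${TARGET}.bin, runs the Python function, and then deletes the intermediate file. Note that Delete() only returns an action object; called on its own (as in the question's Elf2Mod) it does nothing, but placed in an action list it is executed like any other action. The following is an untested sketch along those lines, reusing the objcopy command line and MODULE header from the question:

from SCons.Action import Action
from SCons.Builder import Builder
from SCons.Script import Delete

def _add_header(target, source, env):
    # Prepend the module header to the intermediate ${TARGET}.bin
    # produced by the objcopy action below.
    binfile = str(target[0]) + '.bin'
    with open(binfile, 'rb') as src, open(str(target[0]), 'wb') as tgt:
        tgt.write(b'MODULE\x00\x00')
        tgt.write(src.read())
    return 0

_elf2mod_builder = Builder(
    action=[
        Action('objcopy -O binary $SOURCE ${TARGET}.bin'),
        Action(_add_header),
        Delete('${TARGET}.bin'),   # executed as part of the action list
    ],
    single_source=1,
)

def generate(env):
    env['BUILDERS']['Elf2Mod'] = _elf2mod_builder

def exists(env):
    return True

Because the intermediate .bin never appears as a node in the dependency graph, a call like env.Elf2Mod('foo.mod', 'foo.elf') (the file names here are just examples) looks atomic from the user's point of view.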

Related

Overriding file.write in python 3

I'm aware of the SO post How do I override file.write() in Python 3?, but after looking it over and trying what's suggested I'm still stuck.
I want to override the file.write method in Python 3 so that I can "REDACT" certain words (usernames, passwords, etc.).
I found a great example of overriding print and the general stdout and stderr: http://code.activestate.com/recipes/119404/
The issue is that it doesn't work for file.write. How can I override file.write?
My code for redacting when printing is:
def write(self, text):
    for word in self.redacted_list:
        text = text.replace(word, "REDACTED")
    self.origOut.write(text)
    return text
thanks
From the self.origOut.write(text) I assume you are trying to write an in-between class that pretends to be a file but provides a different .write() method.
I don't see any problems in the code you posted (assuming it's a method of a class you use). Possibly you wrote a class but forgot to create instances of it?
Did you try to write something like this?:
class IAmNotARealFile:
    def __init__(self, real_file):
        self.origOut = real_file

    def __getattr__(self, attr_name):  # provide everything a file has
        return getattr(self.origOut, attr_name)

    def write(self, text):
        ...  # your redacting code from the question goes here

with open('test.txt', 'w') as f:
    f = IAmNotARealFile(f)  # did you forget this?
    f.write('some text SECRET blah SECRET')  # calls IAmNotARealFile.write with your extra code

with open('test.txt') as f:
    f = IAmNotARealFile(f)
    print(f.read())  # this "falls through" to the actual file object
You will also probably want to return self.origOut.write() from your own .write(), if you don't have a specific reason not to.
Note that if you rewrite open() to directly return an IAmNotARealFile:
import builtins

def open(*args, **kwargs):
    return IAmNotARealFile(builtins.open(*args, **kwargs))
you will have to manually supply (some) "magic methods" because
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup.
--docs for .__getattribute__(), but it also applies to .__getattr__()
Why?
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
-- On special ("magic") method lookup [code style and emphasis mine]
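For example, if you want the wrapper to be usable directly in a with statement, __enter__ and __exit__ have to be defined on the wrapper class itself, since the with statement looks them up on the type and never goes through __getattr__. A small sketch, reusing the names from above, with redacted_list passed in just for illustration:

class IAmNotARealFile:
    def __init__(self, real_file, redacted_list=()):
        self.origOut = real_file
        self.redacted_list = redacted_list

    def __getattr__(self, attr_name):  # non-special attributes fall through
        return getattr(self.origOut, attr_name)

    # Special methods are looked up on the class, so spell out the ones you need.
    def __enter__(self):
        self.origOut.__enter__()
        return self  # keep handing out the wrapper, not the raw file

    def __exit__(self, exc_type, exc_value, traceback):
        return self.origOut.__exit__(exc_type, exc_value, traceback)

    def write(self, text):
        for word in self.redacted_list:
            text = text.replace(word, "REDACTED")
        return self.origOut.write(text)

with IAmNotARealFile(open('test.txt', 'w'), ['SECRET']) as f:
    f.write('some text SECRET blah')  # writes 'some text REDACTED blah'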

python argparse ignore other options when a specific option is used

I am writing a Python program, and I want it to have a command line interface that behaves in a particular way.
The command line interface should accept the following invocations:
my_prog test.svg foo
my_prog --font=Sans test.svg foo
(it will generate an svg with the word foo written in the specified or default font)
Now I want to be able to also have this command accept the following invocation...
my_prog --list-fonts
which will list all of the valid options to --font as determined by the fonts available on the system.
I am using argparse, and I have something like this:
parser = argparse.ArgumentParser()
parser.add_argument('output_file')
parser.add_argument('text')
parser.add_argument('--font', help='list options with --list-fonts')
parser.add_argument('--list-fonts', action='store_true')
args = parser.parse_args()
However, this does not make the --list-fonts option behave as I would like, since the two positional arguments are still required.
I have also tried using subparsers, but these still need a workaround to prevent the other options being required every time.
How do I get the desired behaviour with argparse?
argparse allows you to define arbitrary actions to take when encountering an argument, based on the action keyword argument to add_argument (see the docs).
You can define an action that lists your fonts and then aborts argument parsing, which avoids checking for the other required arguments.
This could look like the following:
class ListFonts(argparse.Action):
    def __call__(self, parser, namespace, values, option_string):
        print("list of fonts here")
        parser.exit()  # exits the program with no more arg parsing and checking
Then you can add it to your argument like so:
parser.add_argument('--list-fonts', nargs=0, action=ListFonts)
Note nargs=0 has been added so that this argument doesn't require a value (the code in the question achieved this with action='store_true')
This solution has the side effect of enabling invocations like the following to also list the fonts and exit without running the main program:
my_prog --font Sans test.svg text --list-fonts
This is likely not a problem as it's not a typical use case, especially if the help text explains this behaviour.
If defining a new class for each such option feels too heavyweight, or perhaps you have more than one option that has this behaviour, then you could consider having a function that implements the desired action for each argument and then have a kind of factory function that returns a class that wraps the function. A complete example of this is shown below.
def list_fonts():
    print("list of fonts here")

def override(func):
    """Returns an argparse action that stops parsing and calls a function
    whenever a particular argument is encountered. The program is then exited."""
    class OverrideAction(argparse.Action):
        def __call__(self, parser, namespace, values, option_string):
            func()
            parser.exit()
    return OverrideAction

parser = argparse.ArgumentParser()
parser.add_argument('output_file')
parser.add_argument('text')
parser.add_argument('--font', help='list options with --list-fonts')
parser.add_argument('--list-fonts', nargs=0, action=override(list_fonts),
                    help='list the font options then stop, don\'t generate output')
args = parser.parse_args()
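To illustrate why the factory pays off, here is a hypothetical second option (the --list-sizes flag and list_sizes function are invented for this example) that reuses override() without defining another Action subclass; it would go before the parse_args() call:

def list_sizes():
    print("list of font sizes here")  # placeholder for the real listing

parser.add_argument('--list-sizes', nargs=0, action=override(list_sizes),
                    help='list the size options then stop')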

Inheritance and Pandas

I am trying to create a file writer based on Pandas' ExcelWriter. I proceeded as I usually do with classes in Python (3) with inheritance:
import pandas as pd
class Writer(pd.ExcelWriter):
    def __init__(self, fname, engine='openpyxl'):
        pd.ExcelWriter.__init__(self, fname, engine=engine)
        self.newvar = 0
However, when I try to use it, I cannot access newvar:
test = Writer('test.xlsx')
test.newvar
returns:
AttributeError: '_XlsxWriter' object has no attribute 'newvar'
And when I check the type of test, it returns:
pandas.io.excel._XlsxWriter
I don't understand what I am missing since I used this kind of inheritance in many other cases. Any idea would be appreciated!
This is because pandas.ExcelWriter.__new__ returns a different class than itself (actually it is an abc.ABCMeta). The class is chosen based on the extension of the file path and the engine which is used - you could observe that when you checked the type of the newly created instance. That means the __init__ method of whatever class is returned gets called. You can think of ExcelWriter as some kind of proxy for the specific writers for each format and engine (though it also defines the API which such a writer must provide).
In order to make your writer available (for the given engine), you need to register it.
But before you can do that you need to make your class compatible by following the instructions which you'll find via help(pandas.ExcelWriter). For the sake of completeness I cite them here:
# Defining an ExcelWriter implementation (see abstract methods for more...)
# - Mandatory
#   - ``write_cells(self, cells, sheet_name=None, startrow=0, startcol=0)``
#     --> called to write additional DataFrames to disk
#   - ``supported_extensions`` (tuple of supported extensions), used to
#     check that engine supports the given extension.
#   - ``engine`` - string that gives the engine name. Necessary to
#     instantiate class directly and bypass ``ExcelWriterMeta`` engine
#     lookup.
#   - ``save(self)`` --> called to save file to disk
# - Mostly mandatory (i.e. should at least exist)
#   - book, cur_sheet, path
# - Optional:
#   - ``__init__(self, path, engine=None, **kwargs)`` --> always called
#     with path as first argument.
So with that in mind we can extend your class:
class Writer(pd.ExcelWriter):
    engine = 'openpyxl'
    supported_extensions = ('xlsx',)

    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
        # Implement something useful here.
        pass

    def save(self):
        # Implement something useful here.
        pass

    def __init__(self, fname, engine='openpyxl', **kwargs):
        super().__init__(fname, engine=engine, **kwargs)
Now you can use pd.io.excel.register_writer(Writer) to register the writer. But you need to make sure the engine which you've specified matches your version of openpyxl. You can check the process of how a specific writer is chosen here; the writers which are currently registered for each version can be checked via print(pd.io.excel._writers).
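A minimal usage sketch of that registration step, assuming the Writer class above actually implements write_cells and save (the file name and data here are arbitrary):

import pandas as pd

pd.io.excel.register_writer(Writer)  # the 'openpyxl' engine is now served by Writer

df = pd.DataFrame({'a': [1, 2, 3]})
writer = Writer('test.xlsx')
df.to_excel(writer)
writer.save()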
As a side note: You can also subclass one of the already available specific writers and reuse their write_cells and save methods for example (however you'll need to register your writer also in that case):
_Openpyxl1Writer
_Openpyxl20Writer
_Openpyxl22Writer
_XlwtWriter
_XlsxWriter

Using python 3.x how can I pass a Tree object from ete3 to DendroPy without writing to file

I'm using the ete3 package in python to build phylogenetic trees from data I've generated with a stochastic model and it works well. I have previously written these trees to newick format and then used another script, with the package Dendropy, to read these trees and do some analysis of them. Both of these scripts work fine.
I am now trying to do a large amount of this sort of data processing and want to write a single script in which I skip the file writing. Both classes are called Tree, so I got around this by importing the dendropy one like:
from dendropy import Tree as DTree
and the ete3 method like:
from ete3 import Tree
which seems to be ok.
The question I have is how to pass the object from one package to the other. I have a loop in which I first build the tree object using the ete3 methods, and I call it 't'. My plan was then to use the Tree.write method in ete3 to pass the tree object to Dendropy using the 'get' method, skipping the actual outfile bit, like this:
treePass = t.write(format = 1)
DendroTree = DTree.get(treePass, schema = 'newick')
but this gives the error:
DendroTree = DTree.get(treePass)
TypeError: get() takes 1 positional argument but 2 were given
Any thoughts are welcome.
DTree.get() only takes self as a positional argument; the rest is given through keyword arguments. This basically means you cannot pass treePass to DTree.get() as a positional argument.
I haven't used either of those libs, but I have found a way to import data into a dendropy tree here.
tree = DTree.get(data="((A,B),(C,D));", schema="newick")
Which means you'd have to get your tree from ete3 in this format. It doesn't seem that unusual for a tree, and after a bit more looking there seems to be a supported format in ete3, which you can read about here. I believe it's number 9.
So in the end I'd try this:
from dendropy import Tree as DTree
from ete3 import Tree
# do your Tree generation magic here
DendroTree = DTree.get(data=t.write(format=9), schema='newick')
Edit:
As I'm reading more and more, I believe that any of these formats should be readable, so basically all you have to add to your example is the data keyword: DendroTree = DTree.get(data=treePass, schema='newick')

How to structure a python3/tkinter project

I'm developing a small application using tkinter and PAGE 4.7 for UI design.
I designed my interface and generated the Python source code. I got two files:
gm_ui_support.py: declares the tk variables
gm_ui.py: declares the widgets for the UI
I'm wondering how these files are supposed to be used. One of my goals is to be able to change the UI as many times as I need, regenerating these files, so any code I put inside either of them will be overwritten each time.
So, my question is:
Where do I have to put my own code? Do I have to extend gm_ui_support? Do I have to create a third class? Or do I put it directly in gm_ui_support?
Due to the lack of answers, I'm going to explain my solution:
It seems that it is not possible to keep both files unmodified, so I edit gm_ui_support.py (declaration of tk variables and event callbacks). Each time a regeneration affects gm_ui_support.py I copy the changes over manually.
To minimize the changes to gm_ui_support.py I create a new file called gm_control.py, where I keep a state dict with all variables (logical and visual) and all the available actions.
Changes to gm_ui_support.py:
I create a generic function (sync_control) that fills my tk variables from that dict
At initialization time it creates my control class and invokes sync_control (to get the default values defined in the control)
In each callback I extract the parameters from the event and invoke the logical action on the control class (which changes the state dict), then call sync_control to show the changes.
Sample:
gm_ui_support.py
def sync_control():
    for k in current_gm_control.state:
        gv = 'var_' + k
        if gv in globals():
            #print('********** found ' + gv)
            if type(current_gm_control.state[k]) is list:
                full = "("
                for v in current_gm_control.state[k]:
                    if len(full) > 1:
                        full = full + ','
                    full = full + "'" + v + "'"
                full = full + ")"
                eval("%s.set(%s)" % (gv, full))
            else:
                eval("%s.set('%s')" % (gv, current_gm_control.state[k]))
        else:
            pass

def set_Tk_var():
    global current_gm_control
    current_gm_control = gm_control.GM_Control()
    global var_username
    var_username = StringVar()
    ...
    sync_control()
    ...

def on_select_project(event):
    w = event.widget
    index = int(w.curselection()[0])
    value = w.get(index)
    current_gm_control.select_project(value)
    sync_control()
...
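For completeness, a rough sketch of what the corresponding gm_control.py could look like under this scheme; the GM_Control name and the select_project action come from the code above, while the state keys are placeholders:

gm_control.py (sketch)

class GM_Control:
    def __init__(self):
        # One dict holds every logical/visual variable; the keys must match
        # the tk variable names in gm_ui_support.py minus the 'var_' prefix.
        self.state = {
            'username': '',
            'projects': [],            # list values are rendered as tuples
            'selected_project': '',
        }

    def select_project(self, value):
        # Logical action: only touch the state dict here; sync_control() in
        # gm_ui_support.py pushes the result back into the tk variables.
        self.state['selected_project'] = value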
