scons: How to deal with dynamic targets?

I'm trying to automate converting a PDF to a PNG file with SCons. The tool I use for the conversion is convert from ImageMagick.
Here's the raw command line:
convert input.pdf temp/temp.png
convert temp/*.png -append output.png
The first command generates one PNG file for each page of the PDF file, so the target of the first command is a dynamic list of files.
Here's the SConstruct file I'm working on:
convert = Builder(action=[
Delete("${TARGET.dir}"),
Mkdir("${TARGET.dir}"),
"convert $SOURCE $TARGET"])
combine = Builder(action="convert $SOURCE -append $TARGET")
env = Environment(BUILDERS={"Convert": convert, "Combine": combine})
pdf = env.PDF("input.tex")
pngs = env.Convert("temp/temp.png", pdf) # I don't know how to specify target in this line
png = env.Combine('output.png', pngs)
Default(png)
The line pngs = env.Convert("temp/temp.png", pdf) is actually wrong, because the target is multiple files and I don't know how many until env.Convert has run, so the final output.png only contains the first page of the PDF file.
Any hint is appreciated.
UPDATE:
I just found that I can use the command convert input.pdf -append output.png to avoid the two-step conversion.
Still, I'm curious how to handle the scenario where the list of intermediate temporary files is unknown beforehand and requires a dynamic target list.

If you want to know how to handle the original (convert and combine) situation you proposed, I would suggest creating a builder with a SCons Emitter. The emitter allows you to modify the list of source and target files. This works nicely for generated files that don't exist yet in a clean build.
As you mentioned, the convert step will generate multiple targets; the trick is that you need to be able to "calculate" those targets in the emitter based on the source. For example, I recently created a wsdl2java builder and was able to do some simple WSDL parsing in the emitter to calculate all of the target Java files to be generated (the source being the WSDL file).
Here is the general idea of what the build scripts should look like:
env = Environment()

def convert_emitter(target, source, env):
    # SCons calls emitters as (target, source, env); both are lists of nodes.
    # In this case, the target list will be empty, and you need
    # to calculate all of the generated targets based on the
    # source PDF file. You will need to open the source file
    # with standard Python code. All of the targets will be
    # removed when cleaned (scons -c).
    target = []  # fill in accordingly
    return (target, source)

# Optionally, you could supply a function for the action,
# which would take the same (target, source, env) arguments as the emitter
convert = env.Builder(emitter=convert_emitter,
                      action=[
                          Delete("temp"),
                          Mkdir("temp"),
                          # $TARGET would only name the first page, so give
                          # ImageMagick the base name and let it number the pages
                          "convert $SOURCE temp/temp.png"])
env.Append(BUILDERS={'Convert': convert})

# $SOURCES (not $SOURCE) passes every page image to convert
combine = env.Builder(action="convert $SOURCES -append $TARGET")
env.Append(BUILDERS={'Combine': combine})

pdf = env.PDF('input.tex')
# You can omit the target in this call, as it will be filled in by the emitter
pngs = env.Convert(source=pdf)
png = env.Combine(target='output.png', source=pngs)
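A minimal sketch of how the `target = []  # fill in accordingly` line could be filled in, under several assumptions: the PDF must already exist when SCons reads the build scripts (the next answer explains why a generated PDF is a problem), the page count comes from a crude regex over the raw PDF bytes that will miss pages stored in compressed object streams, and the file names assume ImageMagick's usual behaviour for `convert input.pdf temp/temp.png` (temp-0.png, temp-1.png, ... for several pages, plain temp.png for a single page). `count_pdf_pages` and `page_targets` are hypothetical helpers, not SCons APIs; only `env.File` is real SCons.

```python
import re

# Matches "/Type /Page" objects but not the "/Type /Pages" page-tree node.
PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_pdf_pages(pdf_bytes: bytes) -> int:
    """Crude page count: the number of uncompressed /Type /Page objects."""
    return len(PAGE_OBJECT.findall(pdf_bytes))


def page_targets(pdf_bytes: bytes, out_dir: str = "temp", stem: str = "temp") -> list:
    """Predict the files `convert input.pdf temp/temp.png` should write.

    Assumption: ImageMagick numbers multi-page output temp-0.png, temp-1.png, ...
    and writes a single page to temp.png without a number.
    """
    pages = count_pdf_pages(pdf_bytes)
    if pages == 1:
        return [f"{out_dir}/{stem}.png"]
    return [f"{out_dir}/{stem}-{i}.png" for i in range(pages)]


def convert_emitter(target, source, env):
    """Emitter body: read the (already existing) PDF and emit one node per page."""
    with open(str(source[0]), "rb") as pdf:
        names = page_targets(pdf.read())
    return [env.File(name) for name in names], source
```

Because the emitted list is in page order, `$SOURCES` in the Combine action also receives the pages in order.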

Depending on what qualifies as "dynamic" for you, I believe the correct answer is: not possible.
As long as the source from which you would like to "dynamically" compute a target set is present when SCons is run, Brady's solution should work fine. However, if the source in question is itself the target of some other command, it will not work. This is a fundamental limitation of SCons: it assumes that the set of build targets can be determined statically from the base set of input (non-intermediate) sources. It computes the build/target/dependency graph in one pass, then executes it in the next. It cannot run through some known portion of the build graph, stop to introspect some intermediate targets to dynamically compute the rest of the graph, and then continue. I'd frankly love to have this ability in the work that I do with SCons, but I'm afraid this is just a fundamental limitation.
The best you can do is set the build up so that on the first run, it stops at the construction of the PDF (if no PDF target exists when the build script is executed). Once the PDF has been built, you can rerun the build and set things up so the rest of the build steps execute based on the PDF built in the last run. This works more or less decently, except for one problem: if the PDF changes (producing some new pages, for instance), you'll actually have to rerun the build twice to capture the changes, since any page counts (etc.) will be based on the old version of the PDF.
I'd love for someone to prove me wrong here, but such is the way of things.

Looking at this, there's no requirement for the individual temp/*.png files to be kept. If there were, you shouldn't be putting them in a temp directory, and in any case you'd have to do quite a bit of work to determine which pages to generate.
So it looks more sensible to do this as one step, with something like this:
png = env.Convert('output.png', 'input.pdf')
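where the Convert builder is defined with an action list like this:
convert = Builder(action=[
    Delete('temp'),
    Mkdir('temp'),
    # one PNG per page: temp/page-0.png, temp/page-1.png, ...
    'convert $SOURCE temp/page-%d.png',
    # stack all pages vertically into the single target
    'convert temp/page-*.png -append $TARGET',
    Delete('temp')])
env = Environment(BUILDERS={'Convert': convert})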
Though frankly, you might do better writing the whole thing as a single callable script to make sure you get the page order right.
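The page-order concern comes from shell globbing: a pattern such as temp/*png lists page-10 before page-2. A minimal Python sketch of ordering the pages numerically instead, assuming files named like page-0.png, page-1.png, ...; `page_index`, `sorted_pages` and `append_command` are hypothetical helpers, not part of SCons or ImageMagick.

```python
import re


def page_index(path: str) -> int:
    """Return N from a file name such as temp/page-N.png (assumed naming)."""
    match = re.search(r"-(\d+)\.png$", path)
    if match is None:
        raise ValueError(f"not a numbered page file: {path}")
    return int(match.group(1))


def sorted_pages(paths):
    """Order page images by page number instead of alphabetically."""
    return sorted(paths, key=page_index)


def append_command(paths, output):
    """Argument list for `convert <pages in order> -append <output>`."""
    return ["convert", *sorted_pages(paths), "-append", output]
```

Inside a Python action one could then run something like `subprocess.run(append_command(glob.glob("temp/page-*.png"), str(target[0])), check=True)`, which passes the pages in the right order without relying on the shell.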

Related

SCons Ignore function not working

I have some log files generated after each file is compiled.
I am making SCons aware of these files by using an emitter attached to the builder that I'm using to compile that file.
Unfortunately, because I delete the empty log files after each build, SCons recompiles the source files, since the log files are missing.
I would like to ignore these 'side effect' files using the SCons Ignore function.
In my emitter I am doing something like this:
def compiler_emitter(target, source, env):
    target.append(env.File(source[0].name.split('.')[0] + env['ERRSUFFIX']))
    env.Ignore(source[0], target[1])
    return target, source
As a note I always pass only one file to my builder.
In my case the Ignore function is not working.
What is the best approach to solve this problem the 'SCons way'?

Try using env.SideEffect() instead of Ignore:
SideEffect(side_effect, target), env.SideEffect(side_effect, target)
Declares side_effect as a side effect of building target. Both side_effect and target can be a list, a file name, or a node. A side effect is a target file that is created or updated as a side effect of building other targets. For example, a Windows PDB file is created as a side effect of building the .obj files for a static library, and various log files are created or updated as side effects of various TeX commands. If a target is a side effect of multiple build commands, scons will ensure that only one set of commands is executed at a time. Consequently, you only need to use this method for side-effect targets that are built as a result of multiple build commands.
Because multiple build commands may update the same side effect file, by default the side_effect target is not automatically removed when the target is removed by the -c option. (Note, however, that the side_effect might be removed as part of cleaning the directory in which it lives.) If you want to make sure the side_effect is cleaned whenever a specific target is cleaned, you must specify this explicitly with the Clean or env.Clean function.
http://scons.org/doc/production/HTML/scons-man.html
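A hedged sketch of the emitter from the question rewritten around env.SideEffect(), assuming ERRSUFFIX is set in the environment as in the question. Calling env.SideEffect() and env.Clean() from inside an emitter mirrors how the question already calls env.Ignore() there; treat that placement as an assumption to check against your SCons version. The log is no longer appended to the target list, and os.path.splitext replaces split('.')[0] so a name like foo.test.c keeps its full stem.

```python
import os


def compiler_emitter(target, source, env):
    """Declare the per-source log file as a side effect instead of a target."""
    base, _ext = os.path.splitext(source[0].name)  # "foo.c" -> "foo"
    log = env.File(base + env['ERRSUFFIX'])
    # The log is not a target, so deleting it should not trigger a rebuild.
    env.SideEffect(log, target)
    # Side effects are not removed by `scons -c` unless requested explicitly.
    env.Clean(target, log)
    return target, source
```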

How do I write a SCons script with hard-to-predict dynamic sources?

I'm trying to set up a build system involving a code generator. The exact files generated are unknown until after the generator is run, but I'd like to be able to run further build steps by pattern matching (run some program on all files with some extension). Is this possible?
Some of the answers here involving code generation seem to assume that the output is known or a listing of generated files is created. This isn't impossible in my case, but I'd like to avoid it since it makes things more complicated.
https://bitbucket.org/scons/scons/wiki/DynamicSourceGenerator seems to indicate that it's possible to add additional targets during Builder actions, but while I could get the build to run and list the generated files, any build steps introduced don't run.
https://bitbucket.org/scons/scons/wiki/NonDeterministicDependencies uses Scanners to add build steps. I put a glob(...) in a scanner, and it succeeds in detecting the generated files, but the files are inexplicably deleted before it actually runs the dependent step.
Is this use case possible? And why is SCons deleting my generated files?
A toy example
source (the file referenced in SConscript)
An example generator that creates 3 files (not easily known to the build system) and puts them in the folder passed as its argument:
echo "echo 1" > $1/gen1.txt
echo "echo 2" > $1/gen2.txt
echo "echo 3" > $1/gen3.txt
SConstruct
Just sets up a variant_dir
SConscript('SConscript', variant_dir='build')
SConscript
The goal is for it to:
"Compile" the generator (in this toy example, this just copies a file called 'source' and adds execute permissions)
Run the "compiled" generator ('source' is a script that generates files)
Perform some operation on each of those generated files by extension. This example just runs the "compile" copy operation on them (for simplicity).
env = Environment()
env.Append(BUILDERS = {'ExampleCompiler' :
Builder(action=[Copy('$TARGET', '$SOURCE'),
Chmod('$TARGET', 0o755)])})
generator = env.ExampleCompiler('generator', 'source')
env.Append(BUILDERS = {'GeneratorRun' :
Builder(action=[Mkdir('$TARGET'),
'$SOURCE $TARGET'])})
generated_dir = env.GeneratorRun(Dir('generated'), generator)
Everything's fine up to here, where all the targets are explicitly known to the build system ahead of time.
Attempting to use this block of code to glob over the generated files causes SCons to delete (!!) the generated files:
for generated in generated_dir[0].glob('*.txt'):
    generated_run = env.ExampleCompiler(generated.abspath + '.sh', generated)
Attempting to use an action to update the build tree results in additional actions not being run:
def generated_scanner(target, source, env):
    for generated in source[0].glob('*.txt'):
        print("scanned " + generated.abspath)
        generated_target = env.ExampleCompiler(generated.abspath + '.sh', generated)
        Alias('TopLevelAlias', generated_target)
env.Append(BUILDERS = {'GeneratedOperation' :
    Builder(action=[generated_scanner])})
dummy = env.GeneratedOperation(generated_dir[0].File('#dummy'), generated_dir)
Alias('TopLevelAlias', dummy)
The Alias operations are suggested in the dynamic source generator guide above, but don't seem to do anything. The prints do execute, which indicates that the action gets run.

Applying a build step to all files with a given extension is possible with SCons. For C/C++ files this is the preferred scheme, for example:
env = Environment()
env.Program('main', Glob('*.cpp'))
The main task of SCons, as a build system, is to do the minimum amount of work such that all your targets are up-to-date. This makes things complicated for the use case you've described above, because it's not clear how you can reach a "stable" situation where no generated files are added and all targets are built.
You're probably better off using a simple Python script directly. I really don't see why SCons (or any other build system, for that matter) is essential in this case.
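For the toy example, such a script could look like the sketch below. It is hypothetical: run_generator_then_compile and compile_file are illustrative names, and the "compile" step just mirrors the question's ExampleCompiler (copy, then mark executable) using shutil and os.chmod. Because it globs only after the generator has finished, not knowing the file list in advance is no longer a problem.

```python
import glob
import os
import shutil
import subprocess
import sys


def compile_file(src: str, dst: str) -> None:
    """Toy 'compile' step from the question: copy the file, then make it executable."""
    shutil.copyfile(src, dst)
    os.chmod(dst, 0o755)


def run_generator_then_compile(generator_cmd, out_dir: str) -> list:
    """Run the generator into out_dir, then process whatever *.txt files it produced."""
    os.makedirs(out_dir, exist_ok=True)
    # Like the toy generator, the command receives the output folder as its last argument.
    subprocess.run([*generator_cmd, out_dir], check=True)
    produced = []
    # Globbing after the generator has run means the file list need not be known up front.
    for txt in sorted(glob.glob(os.path.join(out_dir, "*.txt"))):
        compile_file(txt, txt + ".sh")
        produced.append(txt + ".sh")
    return produced


if __name__ == "__main__":
    # Hypothetical usage: python build.py ./generator build/generated
    print(run_generator_then_compile(sys.argv[1:-1], sys.argv[-1]))
```

The trade-off is that this script rebuilds everything on every run; it gives up SCons's up-to-date checks in exchange for not having to predict the generated files.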
Edit:
At some point you have to tell SCons about the created files (*.txt in your example above), and to track all dependencies properly, the list of *.txt files has to be complete. This is the task of the Emitter in SCons, which is responsible for returning the list of resulting target and source files for a Builder call. Note that these files don't have to exist physically during the "parse" phase of SCons. Please also have a look at my answer to "Scons: create late targets", which goes into some more detail.
Once you have a proper Emitter in place (see also https://bitbucket.org/scons/scons/wiki/ToolsForFools, "Using Emitters"), you should be able to use the Glob('*.txt') call, which will detect and track your created files automatically.
Finally, on our "Talks and Slides" page (https://bitbucket.org/scons/scons/wiki/TalksAndSlides) you can find my talk from PyCon FR 2014, "Why SCons is Not Slow", which briefly explains how SCons works internally. This might help you understand this problem better and come up with a full solution.

How does Scons compute the build signature?

I keep different versions of one project in different directories. (This does make sense in this project. Sadly.) As there are only minor differences between the versions, I hope I can speed all builds after the first one by using a common cache directory for all builds.
Unfortunately, I had to realise that, when building an object file from the same sources in different directories, SCons 2.3.3 stores the result in different locations in the cache. (The location is equal to the build signature, I assume.) The same sources are recompiled for each and every directory. So why does SCons determine different build signatures although
the compile commands are identical and
the sources and the include files are the same (identical output of the preprocessor phase, gcc -E ...)
I'm using the decider "MD5-timestamp"
Even the resulting object files are identical!
For a trivial example (helloworld from the SCons documentation) re-using the cache works. Though in the big project I'm working on, it does not. Maybe the "SCons build environment" influences the build signature, even if it does not have any effect on the compile command?
Are there any debug options that could help besides --cache-debug=-? Which method of SCons determines the build signature?
The folders look somewhat like this:
<basedir1>/
SConstruct
src/something.cpp …
include/header.hpp …
<basedir2>/
SConstruct
src/something.cpp …
include/header.hpp …
/SharedCache/
0/ 1/ 2/ … F/
I check out the project in both basedir1 and basedir2 and call scons --build-cache-dir=/SharedCache in both of them. (EDIT: --build-cache-dir is a custom option, implemented in the SConstruct file of this project. It maps to env.CacheDir('/SharedCache').)
EDIT2: Before I realized this problem, I did some tests to evaluate the effects of using --cache-implicit or SCons 2.4.0.

This is the code of the method get_cachedir_bsig() from the file src/engine/SCons/Node/FS.py:
def get_cachedir_bsig(self):
    """
    Return the signature for a cached file, including
    its children.
    It adds the path of the cached file to the cache signature,
    because multiple targets built by the same action will all
    have the same build signature, and we have to differentiate
    them somehow.
    """
    try:
        return self.cachesig
    except AttributeError:
        pass
    # Collect signatures for all children
    children = self.children()
    sigs = [n.get_cachedir_csig() for n in children]
    # Append this node's signature...
    sigs.append(self.get_contents_sig())
    # ...and its path
    sigs.append(self.get_internal_path())
    # Merge this all into a single signature
    result = self.cachesig = SCons.Util.MD5collect(sigs)
    return result
It shows how the path of the cached file is included in the "cache build signature", which explains the behaviour you see. For the sake of completeness, here is also the code of the get_cachedir_csig() method from the same FS.py file:
def get_cachedir_csig(self):
    """
    Fetch a Node's content signature for purposes of computing
    another Node's cachesig.
    This is a wrapper around the normal get_csig() method that handles
    the somewhat obscure case of using CacheDir with the -n option.
    Any files that don't exist would normally be "built" by fetching
    them from the cache, but the normal get_csig() method will try
    to open up the local file, which doesn't exist because the -n
    option meant we didn't actually pull the file from cachedir.
    But since the file *does* actually exist in the cachedir, we
    can use its contents for the csig.
    """
    try:
        return self.cachedir_csig
    except AttributeError:
        pass
    cachedir, cachefile = self.get_build_env().get_CacheDir().cachepath(self)
    if not self.exists() and cachefile and os.path.exists(cachefile):
        self.cachedir_csig = SCons.Util.MD5filesignature(cachefile, \
            SCons.Node.FS.File.md5_chunksize * 1024)
    else:
        self.cachedir_csig = self.get_csig()
    return self.cachedir_csig
where the content signatures of the children (read from the cache file when a child doesn't exist locally) are hashed into the final build signature.
EDIT: The "cache build signature" as computed above, is then used to build the "cache path". Like this, all files/targets can get mapped to a unique "cache path" by which they can get referenced and found in (= retrieved from) the cache. As the comments above explain, the relative path of each file (starting from the top-level folder of your SConstruct) is a part of this "cache path". So, if you have the same source/target (foo.c->foo.obj) in different directories, they will have different "cache paths" and get built independent of each other.
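To see why including the path matters, here is a stand-alone sketch modeled loosely on get_cachedir_bsig() as quoted above. md5_collect is an assumed stand-in for SCons.Util.MD5collect (taken here to pass a single signature through and otherwise hash the comma-joined list); it is not the real SCons implementation, only enough to show that identical child signatures combined with different internal paths yield different cache keys.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def md5_collect(sigs):
    """Assumed stand-in for SCons.Util.MD5collect: one signature passes through,
    several are hashed together."""
    if len(sigs) == 1:
        return sigs[0]
    return md5_hex(", ".join(sigs))


def cachedir_bsig(child_csigs, contents_sig, internal_path):
    """Mirror of get_cachedir_bsig(): children's csigs, own signature, then path."""
    return md5_collect(list(child_csigs) + [contents_sig, internal_path])
```

With the same children and the same own signature, only the internal_path argument differs between two builds, and that alone is enough to change the resulting key.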
If you truly want to share sources between different projects, you may want to have a look at the Repository() method instead; note that the CacheDir functionality is intended more for sharing the same sources between different developers. Repository() lets you mount (blend in) another source tree into your current project...

How to diff files/folders in Gradle?

I need to write a script in Gradle that takes two folders as input.
Both folders contain <1000 files (mostly images) with reasonably similar structure.
The output should be a list of files that changed and what kind of difference it is (added file/deleted file/changed file).
Edit: Here's a sample script (https://gist.github.com/igormukhin/71d780c4274336eeb297). The only problem is that it compares files by timestamp.

I have recently coded up something similar to what you are asking for: DirectoryDifferenceCollector; however, it actually compares the contents of the files (as a hash) and not the timestamp. I would be willing to update it to accept a configurable strategy if that would suit your needs, or you can just use the concepts involved.
Basically it scans both directories and determines the missing files in both A and B and then it also determines which files are common to both directories, but have different content.
The results are collected in a DirectoryDifference object with the respective file paths for each category.
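The same idea (compare by content hash rather than by timestamp) as a short Python sketch. It is not the DirectoryDifferenceCollector code, which isn't shown here; diff_dirs and the choice of SHA-256 are illustrative assumptions. A Gradle task could do the equivalent in Groovy with the same three set operations.

```python
import hashlib
import os


def file_digest(path: str) -> str:
    """SHA-256 of a file's contents, read in chunks so large images are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def relative_files(root: str) -> set:
    """All file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def diff_dirs(a: str, b: str) -> dict:
    """Classify files as added (only in b), deleted (only in a) or changed."""
    in_a, in_b = relative_files(a), relative_files(b)
    changed = {p for p in in_a & in_b
               if file_digest(os.path.join(a, p)) != file_digest(os.path.join(b, p))}
    return {"added": sorted(in_b - in_a),
            "deleted": sorted(in_a - in_b),
            "changed": sorted(changed)}
```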

I'd be tempted to use diff:
def process = 'diff x y'.execute()
You can then access the output of the command as text:
println process.err.text
println process.in.text
And get the exit status via:
int status = process.waitFor()
Many common operating systems come with diff installed, but Windows typically does not.

save MATLAB code file along with results in one folder?

I'm processing a data set and running into a problem: although I xlswrite all the relevant output variables to a big timestamped Excel file, I don't save the code that actually generated that result. So if I try to recreate a certain set of results, I can't do it without relying on memory (which is obviously not a good plan). I'd like to know whether there are commands that will help me save the m-files used to generate the output Excel file, as well as the Excel file itself, in a folder I can name and timestamp, so I don't have to do this manually.
In my perfect world I would run the master code file that calls 4 or 5 other function m-files, and then all those m-files would be saved along with the Excel output to a folder named results_YYYYMMDDTIME. Does this functionality exist? I can't seem to find it.

There's no such functionality built in.
You could build a dependency tree of your main function by using depfun with mfilename.
depfun(mfilename()) will return a list of all functions/m-files that are called by the currently executing m-file.
This list will include all files that ship with MATLAB as built-ins; you might want to remove those (and only record the MATLAB version in your Excel sheet).
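As a sketch:
your_folder = ['results_' datestr(now, 'yyyymmddTHHMMSS')];
mkdir(your_folder);
% get all files the current m-file depends on
dependencies = depfun(mfilename());
for k = 1:numel(dependencies)
    % skip files that ship with MATLAB
    if ~strncmp(dependencies{k}, matlabroot, numel(matlabroot))
        copyfile(dependencies{k}, your_folder);
    end
end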
As a "long term" solution, you might want to check whether a version control system like Subversion or Mercurial (or one of many others) would be applicable in your case.
In larger projects, this is the preferred way to record the version of source code used to produce a certain result.
