Multi-input, multi-output compilers with Shake

I'm experimenting with using Shake to build Java code, and am a bit stuck because of the unusual nature of the javac compiler. In general for each module of a large project, the compiler is invoked with all of the source files for that module as input, and produces all of the output files in one pass. Subsequently we typically take the .class files produced by the compiler and assemble them into a JAR (basically just a ZIP).
For example, a typical Java module project is arranged as follows:
a src directory that contains multiple .java files, some of them nested many levels deep in a tree.
a bin directory that contains the output from the compiler. Typically this output follows the same directory structure and filenames, with .class substituted for each .java file, but the mapping is not necessarily one-to-one: a single .java file can produce zero to many .class files!
The rules I would like to define in Shake are therefore as follows:
1) If any file under src is newer than any file under bin then erase all contents of bin and recreate with:
javac -d bin <recursive list of .java files under src>
I know this rule seems excessive, but without invoking the compiler we cannot know the extent of changes in output resulting from even a small change in a single input file.
2) if any file under bin is newer than module.jar then recreate module.jar with:
jar cf module.jar -C bin .
Many thanks!
PS Responses in the vein "just use Ant/Maven/Gradle/" will not be appreciated! I know those tools offer Java compilation out-of-the-box, but they are much harder to compose and aggregate. This is why I want to experiment with a Haskell/Shake-based tool.

Writing rules which produce multiple outputs whose names cannot be statically determined can be a bit tricky. The usual approach is to find an output whose name is statically known and always need that, or if none exists, create a fake file to use as the static output (as per ghc-make, the .result file). In your case you have module.jar as the ultimate output, so I would write:
"module.jar" *> \out -> do
    javas <- getDirectoryFiles "" ["src//*.java"]
    need javas
    liftIO $ removeFiles "" ["bin//*"]
    liftIO $ createDirectory "bin"
    () <- cmd "javac -d bin" javas
    classes <- getDirectoryFiles "" ["bin//*.class"]
    need classes
    cmd "jar cf" [out] "-C bin ."
There is no advantage to splitting it up into two rules, since you never depend on the .class files (and can't really, since they are unpredictable in name), and if any source file changes then you will always rebuild module.jar anyway. This rule has all the dependencies you mention, plus if you add/rename/delete any .java or .class file then it will automatically recompile, as the getDirectoryFiles call is tracked.
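To make the "fake file as static output" idea concrete outside of Shake, here is a minimal Python sketch of the question's rule 1: rebuild when the stamp file is missing or older than any source. The function names and the stamp path are hypothetical, and this is not how Shake tracks changes internally (Shake keeps its own record of dependencies); it only illustrates the idea.

```python
import os
from pathlib import Path


def java_sources(src_dir):
    """Recursively list .java files under src_dir, sorted for stable output."""
    return sorted(str(p) for p in Path(src_dir).rglob("*.java"))


def is_stale(sources, stamp):
    """True if the stamp file is missing or older than any source file."""
    stamp = Path(stamp)
    if not stamp.exists():
        return True
    stamp_time = stamp.stat().st_mtime
    return any(os.path.getmtime(s) > stamp_time for s in sources)
```

A build script could call is_stale(java_sources("src"), "bin/.result"), run javac when it returns True, and then touch the stamp file once the compiler succeeds.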

Related

Clang recursive include path

I have a problem when including dependency folder as this isn't looking for headers recursively.
FOLDER STRUCTURE:
- main.cpp
- dependency
  - sub1
    - header1.h
  - sub2
    - header2.h
  - root-header.h
main.cpp
#include "root-header.h"
#include "header1.h"
#include "header2.h"
int main() {
}
Command:
clang main.cpp -I"dependency"
Error:
fatal error: 'header1.h' file not found
The command only finds headers located directly inside the dependency folder, one level deep. How can I make clang look up headers recursively in all subdirectories of the dependency folder? Is there a compiler argument I need to add?
Thanks

The ISO/IEC 9899:2011 standard in section §6.10.2 explains the expected behavior of clang and other compilers:
# include <h-char-sequence> new-line
searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
You can extend the defined places by adding additional directories with the -I option, but a compiler should not search sub-directories.
You can work around this limitation in the spec by using make to compile a list of additional -I locations to add to your clang command. This is covered in @DanBonachea's answer.
Instead, I'd advise you to change the includes to be compliant with the specification:
#include "sub1/header1.h"
#include "sub2/header2.h"

The conventional solutions are one of the following:
1. Change the include directives in the source code
This solution compiles with clang++ -Idependency main.cpp but modifies #include directives to include headers by subdirectory, eg:
#include "sub1/header1.h"
#include "sub2/header2.h"
This is obviously a modification to the code, so usually only makes sense if sub1 and sub2 are meaningful within the larger structure of the software (e.g. package names that are always the same). Or...
2. Use shell tools to traverse the directory and build the include path
This solution uses find to inject subdirectories on the include path, eg:
$ clang++ `find ./dependency -type d -exec echo -I'{}' \;` main.cpp
which scans to identify the subdirectories and adds them to the preprocessor include path.
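If the build is driven from a script rather than an interactive shell, the same traversal can be sketched in Python. This is a rough equivalent of the find command above, not a drop-in replacement; include_flags and build_command are hypothetical helper names.

```python
import os


def include_flags(root):
    """Return one -I flag per directory under root (root itself included)."""
    flags = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        flags.append("-I" + dirpath)
    return flags


def build_command(root, source, compiler="clang++"):
    """Assemble the compiler invocation as an argument list."""
    return [compiler, *include_flags(root), source]
```

The resulting list can be passed to subprocess.run, which avoids shell quoting problems with directory names that contain spaces.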
Discussion
Both of these approaches should work with few changes with basically any C/C++ compiler on UNIX (incl Linux, macOS, WSL, etc).
Note the second approach above will involve some additional filesystem churn on every compilation, which might be noticeable if the number of subdirectories is very large. To be fair this cost is fundamental to that use case, and even if built-in support for recursive include existed in the compiler frontend, it would still need to perform a similarly expensive recursive directory traversal on every compilation to find all the files.
3. Amortize directory traversal
However we can improve upon the second solution if we assume all the headers that will be included from this directory structure have unique names. This is a reasonable assumption, because otherwise the unqualified #include directives inside the source files will be ambiguous, leading to orthogonal problems. With this assumption in hand, we can create a cache to amortize the cost of the dependency directory traversal as follows:
$ mkdir allheaders ; cd allheaders
$ find ../dependency -type f -exec ln -s '{}' . \;
Then compilation simply becomes:
$ clang++ -Iallheaders main.cpp
Or, if you additionally want to support a mix of option 1 and option 3 #include directives, then:
$ clang++ -Idependency -Iallheaders main.cpp
This approach could greatly accelerate compilation, because the preprocessor only needs to open one user directory and open the files by basename. The fact that the directory may contain a large number of headers (with some fraction potentially unused) should not significantly degrade performance, thanks to how filesystems work.
If we further assume the file names in the dependency directory change infrequently or never, then we only need to execute the directory traversal step once, and can amortize that cost against repeated compilation using the allheaders cache directory.
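The unique-name assumption is easy to violate silently, so it may be worth checking it while building the cache. Below is a minimal Python sketch of the same symlink step that refuses to continue when two files share a basename. build_header_cache is a hypothetical name, and the sketch assumes the cache directory lives outside the dependency tree, as allheaders does above.

```python
import os


def build_header_cache(dependency_dir, cache_dir):
    """Symlink every file under dependency_dir into the flat cache_dir.

    Raises ValueError when two files share a basename, because the flat
    directory could then hold only one of them.
    """
    os.makedirs(cache_dir, exist_ok=True)
    seen = {}
    for dirpath, dirnames, filenames in os.walk(dependency_dir):
        dirnames.sort()
        for name in sorted(filenames):
            target = os.path.abspath(os.path.join(dirpath, name))
            if name in seen:
                raise ValueError(
                    f"duplicate header name {name!r}: {seen[name]} and {target}"
                )
            seen[name] = target
            link = os.path.join(cache_dir, name)
            if os.path.lexists(link):
                os.remove(link)  # replace a stale link from an earlier run
            os.symlink(target, link)
    return seen
```

Rerunning it after headers are added or renamed refreshes the links; links to deleted headers are not cleaned up in this sketch.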

SConstruct 101—moving on from Makefiles

Like make, scons has a large number of predefined variables and rules. (Try scons | wc on an SConstruct containing env = Environment(); print(env.Dump()) to see how extensive the set is.)
But suppose we aren't after the wizardry of presets but rather want to do something a lot more primitive—simulating launching a few instructions from the (bash, etc) command line?
Also suppose we're quite happy with the default Decider('MD5'). What is the translation of the one-source-one-target:
out/turquoise.xyz: out/chartreuse.xyz
	chartreuse_to_turquoise $< $@
of the two-source-one-target:
out/purple.xyz: out/lilac.xyz out/salmon.xyz
	gen_purple $< $@
and of:
run_this:
	python prog.py
which we would run on-demand by typing make run_this?
What does the SConstruct for these elementary constructs look like?

All the answers you're looking for are in the user's guide (and manpage).
First, assuming you don't need SCons to scan the input files for other files they include, you can use Command().
(See info here: https://scons.org/doc/production/HTML/scons-user.html#chap-builders-commands)
Then you'll want an alias to specify a non-file command-line target.
(See here: https://scons.org/doc/production/HTML/scons-user.html#chap-alias)
Putting those two together yields:
env=Environment()
# one source, one target
env.Command('out/turquoise.xyz', 'out/chartreuse.xyz', 'chartreuse_to_turquoise $SOURCE $TARGET')
# Two source, one target
env.Command('out/purple.xyz',['out/lilac.xyz','out/salmon.xyz'], 'gen_purple $SOURCES $TARGET')
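# And your .PHONY-style make target, which is actually not great for reproducibility and determining when it should be rerun, because you do not specify any sources or targets
# Alias(alias, source, action): there are no sources here, so the command goes in as the action
run_this = env.Alias('run_this', [], 'python prog.py')
# Run the action every time the alias is requested
env.AlwaysBuild(run_this)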
Note: SCons does NOT propagate your shell environment variables. So if you depend on (for example) a non-system path in your PATH, you'll need to specify it explicitly, for example in env['ENV']['PATH']. For more details, read through the user's guide, manpage and FAQ.
https://scons.org/doc/production/HTML/scons-user.html
https://scons.org/doc/production/HTML/scons-man.html
https://scons.org/faq.html
And you can reach the community directly via our discord server, IRC channel, or users mailing list

How to exclude files without an extension in Sublime Text 3?

I know there is an easy way to do it for known extensions, by adding a pattern like "*.jpg" to the excluded patterns in preferences, but how can I do it for binary files without any extension at all?
For example, my compiled C files are named just "name", not "name.o" etc., so is there any trick to exclude them?

To exclude a file with no extension, you must manually add the exact filename of each file that you want excluded.
Using your example, to exclude a file called name, add "name" to your file_exclude_patterns list like this:
"file_exclude_patterns": ["*.pyc", "*.pyo", "*.exe", ..., "name"],
Since you mention compiled C files, you can avoid doing this for every executable you build by using one of the following approaches, or some combination or variation of them.
1) Consistently use the same executable file name, for example run, regardless of what you are compiling.
gcc example.c -o run
"file_exclude_patterns": ["*.pyc", "*.pyo", "*.exe", ..., "run"],
2) Choose a consistent prefix for the executable file names, for example run_.
gcc example.c -o run_example
gcc program.c -o run_program
"file_exclude_patterns": ["*.pyc", "*.pyo", "*.exe", ..., "run_*"],
3) Choose a file extension for your executables and use that consistently.
gcc example.c -o example.out
"file_exclude_patterns": ["*.pyc", "*.pyo", "*.exe", ..., "*.out"],

Sorry to update an old question. As of today, we can instead specify "file_include_patterns": ["*.*"] in a project preference to exclude files without an extension from the sidebar.
"file_include_patterns" is an array of glob strings, referred to by ST as file patterns. However, ST file patterns support only two matching operators, * and ?. According to the documentation, file patterns specified under "file_include_patterns" are
Patterns of files to include from the folder. Anything not matching these patterns will be excluded. This is checked before "file_exclude_patterns".
One can further set "index_include_patterns": ["*.*"] to prevent ST from indexing symbols from files without an extension. Differences between "file_include_patterns" and "index_exclude_patterns" can be found in @OdatNurd's answer.
An example project configuration file my-project.sublime-project that prevents ST from showing files without an extension in the side bar or indexing symbols from them may look like
{
    "folders":
    [
        {
            "file_include_patterns": ["*.*"],
            "index_include_patterns": ["*.*"],
            "path": "/path/to/your/my-project-folder"
        }
    ]
}
where "/path/to/your/my-project-folder" needs to be set accordingly.
This setting prevents ST from showing or indexing files without an extension under "/path/to/your/my-project-folder" or its sub-folders.
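To get a feel for what a "*.*" include pattern keeps and hides, here is a small Python sketch using the standard fnmatch module. It is only an approximation of ST's matcher, not ST's actual implementation: fnmatch also understands [seq] character classes, which ST file patterns do not. visible_names is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def visible_names(names, include_patterns):
    """Keep the names that match at least one include pattern."""
    return [
        name
        for name in names
        if any(fnmatchcase(name, pattern) for pattern in include_patterns)
    ]


names = ["name", "main.c", "Makefile", ".gitignore", "run_example"]
print(visible_names(names, ["*.*"]))
```

Under this approximation, extensionless text files such as Makefile are hidden along with the compiled binaries, while dotfiles such as .gitignore still match because they contain a dot.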

CMake recursively add all source files inside all subdirectories of a directory to the executable?

I have a pretty big file structure for a project which I need to convert into a multiplatform cmake project. Now it seems that cmake requires every single cpp file to be added individually to the executable. Is there a script that automates this, one that walks through the file structure and adds every source file automatically? The project will probably get a lot more source files, and I probably won't be able to add every single one manually.

You could use execute_process() with a cmake -P script that uses globbing to recursively scan for source files and writes the result to a file that your CMakeLists.txt includes, i.e. something like:
"CMakeLists.txt":
execute_process(COMMAND ${CMAKE_COMMAND}
                -D "RDIR=${CMAKE_CURRENT_SOURCE_DIR}"
                -P "scansources.cmake"
                WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}")
include("sources.cmake")
add_executable(myexe ${sources})
"scansources.cmake" (generates "sources.cmake"):
file(GLOB_RECURSE sourcelist
                  *.c
                  *.cc
                  *.cpp
                  *.cxx)
string(REGEX REPLACE "${RDIR}/" "" relative_sources "${sourcelist}")
string(REPLACE ";" "\n" sources_string "${relative_sources}")
set(sources_string "set(sources\n${sources_string})")
file(WRITE sources.cmake "${sources_string}")
The reason why this works is because execute_process() occurs at configure time.
You could, of course, generate sources.cmake via some other tool or IDE then you wouldn't need scansources.cmake or execute_process().
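As one example of "some other tool", here is a minimal Python sketch that writes an equivalent sources.cmake. The function name write_sources_cmake is hypothetical, the suffix list simply mirrors the glob patterns above, and each path is quoted so that file names containing spaces stay a single list element.

```python
from pathlib import Path

# Same extensions as the GLOB_RECURSE patterns in scansources.cmake
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".cxx"}


def write_sources_cmake(root, out_name="sources.cmake"):
    """Write set(sources ...) listing all source files under root."""
    root = Path(root)
    sources = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES
    )
    quoted = "\n".join(f'"{s}"' for s in sources)
    (root / out_name).write_text("set(sources\n" + quoted + ")\n")
    return sources
```

Run it from the project root before configuring; CMakeLists.txt then only needs the include("sources.cmake") and add_executable() lines.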

what triggers scons to build files when I have a custom builder?

I'm going nuts trying to control when files are built in scons. I have a very simple example build tree (see below), with a Poem builder that just takes a .txt file and converts it to lower case in a corresponding .eectxt file.
In my SConstruct and SConscript files, I declare dependencies of 3 .txt files.
But I can't figure out what's putting these into the default build!
sconstest/
    SConstruct
    tiger.txt
    src/
        SConscript
        hope.txt
        jabberwocky.txt
where the *.txt files are poems and my SConstruct and SConscript look like this:
SConstruct:
env = Environment();
def eecummings(target, source, env):
    if (len(target) == 1 and len(source) == 1):
        with open(str(source[0]), 'r') as fin:
            with open(str(target[0]), 'w') as fout:
                for line in fin:
                    fout.write(line.lower());
    return None
env['BUILDERS']['Poem'] = Builder(action=eecummings, suffix='.eectxt', src_suffix='.txt');
Export('env');
poems = SConscript('src/SConscript');
tigerPoem = env.Poem('tiger.txt');
src/SConscript:
Import('env');
input = ['jabberwocky.txt', 'hope.txt'];
output = [env.Poem(x) for x in input];
Return('output');
What I want to do is declare the dependency of the .eectxt files on the corresponding .txt files, but not cause them to be built unless I explicitly put them into the Default() build in the SConstruct file, or request them explicitly at the command line.
How can I do this?

By default, a directory depends on all files and/or targets which reside in it.
So running:
scons
will build all targets under the current directory.

I figured out how to do what I want, but I still don't understand why I need to do it this way. I'll accept the first decent answer that explains it.
Here's what works, if I add the following to the root SConstruct file:
env.Ignore('.', tigerPoem);
env.Ignore('src', poems);
env.Alias('poems', [tigerPoem]+poems);
This ignores the 3 poems from the default target, and then adds them as targets aliased to "poems", so if I run scons it builds nothing, but if I run scons poems it builds the files.
Why does this work? Why does calling env.Poem(...) add something to the default targets?
