SCONS Timestamp mechanism

SCONS Timestamp mechanism - scons

I was looking at Scons source code but cant seem to pinpoint where it is calculating for timestamp (didnt have trouble finding MD5 calculation).
And the manual page just refers as timestamp and does not go in depth of what it actual is. Maybe it is obvious for some but I am still unclear what this exactly means.
Timestamp of what?
Is the following the way Scons use for timestamp consistency?
time.ctime(os.path.getmtime(file))
basically checking for when a file is modified?
And then compare this against what at run time?

If you have ever worked with Make, the concept should be familar. Basically it compares the modification time of the source with the target, and if the source is newer, it should rebuild the target. There is also some file signature information that SCons stores internally in the .sconsign.dblite file, that I dont believe can be accessed programatically.
As can be seen in the SCons Decider() function docs, the behaviour can be configured to be one of the following (copied from the SCons man page):
timestamp-newer (This is the behavior of the classic Make utility, and make can be used a synonym for timestamp-newer)
timestamp-match
MD5
MD5-timestamp

Related

Can SCons be used to construct targets from indeterminately named source files?

I have a directory with multiple source files of indeterminate name. The only thing I know is the file extension. I want to take each source file, and build a single target from each. The method I'm currently using is to determine the name of each source using a for loop:
targets = []
for file in listdir('.'):
if file.endswith('.xdm'):
targets += env.m4(source=file)
The advantage of doing it progrmatically like this is that the SConscript doesn't have to be maintained by the developers as they add new sources. The problem is that the targets are no longer cleaned because of something to do with dependencies that I don't entirely understand.
So my question is is there a more appropriate way to do this, using in-built SCons functionality, without relying on more traditional flow control, or should I just ensure that each of my sources is determined and list them individually in the SConscript?

Instead of fiddling with listdir I would simply use the Glob() method, as provided by SCons itself:
for file in Glob("*.xdm"):
env.m4(source=file)
This (like the example from your question) is a perfectly fine approach, since it uses the fact that SConscripts are actually Python scripts. The Glob() approach has the advantage of also finding *.xdm files that don't exist on the harddrive yet, but may get created as part of the build process later.
I wonder about the problems that you mentioned, regarding cleaning of the targets. The Q&A linked in your question above seems unrelated to me. If you experience actual "cleaning" problems with one of the approaches above, please post a separate question together with the full verbatim input and output. If it should turn out that this doesn't work out-of-the-box, I'd consider it to be a bug.

ELF, Build-ID, is there a utility to recompute it?

I came across this useful feature in ELF binaries -- Build ID. "It ... is (normally) the SHA1 hash over all code sections in the ELF image." One can read it with GNU utility:
$ readelf -n /bin/bash
...
Displaying notes found at file offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 54967822da027467f21e65a1eac7576dec7dd821
And I wonder if there is an easy way to recompute Build ID yourself? To check if it isn't corrupted etc.

So, I've got an answer from Mark. Since it is an up to date info, I post it here. But basically you guys are right. Indeed there is no tool for computing Build-ID, and the intentions of Build-ID are not (1) identification of the file contents, and not even (2) identification of the executable (code) part of it, but it is for (3) capturing "semantic meaning" of a build, which is the hard bit for formalization. (Numbers are for self-reference.)
Quote from the email:
-- "Is there a user tool recomputing the build-id from the file itself, to
check if it's not corrupted/compromised somehow etc?"
If you have time, maybe you could post an answer there?
Sorry, I don't have a stackoverflow account.
But the answer is: No, there is no such tool because the precise way a
build-id is calculated isn't specified. It just has to be universally
unique. Even the precise length of the build-id isn't specified. There
are various ways using different hashing algorithms a build-id could be
calculated to get a universally unique value. And not all data might
(still be) in the ELF file to recalculate it even if you knew how it was
created originally.
Apparently, the intentions of Build-ID changed
since the Fedora Feature page was written about
it.
And people's opinions diverge on what it is now.
Maybe in your answer you could include status of Build-ID and what it is
now as well?
I think things weren't very precisely formulated. If a tool changes the
build that creates the ELF file so that it isn't a "semantically
identical" binary anymore then it should get a new (recalculated)
build-id. But if a tool changes something about the file that still
results in a "semantically identical" binary then the build-id stays the
same.
What isn't precisely defined is what "semantically identical binary"
means. The intention is that it captures everything that a build was
made from. So if the source files used to generate a binary are
different then you expect different build-ids, even if the binary code
produced might happen to be the same.
This is why when calculating the build-id of a file through a hash
algorithm you use not just the (allocated) code sections, but also the
debuginfo sections (which will contain references to the source file
names).
But if you then for example strip the debuginfo out (and put it into a
separate file) then that doesn't change the build-id (the file was still
created from the same build).
This is also why, even if you knew the precise hashing algorithm used to
calculate the build-id, you might not be able to recalculate the
build-id. Because you might be missing some of the original data used in
the hashing algorithm to calculate the build-id.
Feel free to share this answer with others.
Cheers,
Mark
Also, for people interested in debuginfo (linux performance & tracing, anyone?), he mentioned a couple projects for managing them on Fedora:
https://fedoraproject.org/wiki/Changes/ParallelInstallableDebuginfo
https://fedoraproject.org/wiki/Changes/SubpackageAndSourceDebuginfo

The build ID is not a hash of the program, but rather a unique identifier for the build, and is to be considered just a "unique blob" — at least at some point it used to be defined as a hash of timestamp and absolute file path, but that's not a guarantee of stability either.

I wonder if there is an easy way to recompute Build ID yourself?
No, there isn't, by design.
The page you linked to itself links to the original description of what build-id is and what it's usable for. That pages says:
But I'd like to specify it explicitly as being a unique identifier good
only for matching, not any kind of checksum that can be verified against
the contents.
(There are external general means for content verification, and I don't
think debuginfo association needs to do that.)
Additional complications are: the linker can take any of:
--build-id
--build-id=sha1
--build-id=md5
--build-id=0xhexstring
So the build id is not necessarily an sha1 sum to begin with.

Using the GHC API to do a "dry run" of code compilation

I'm working on a fairly simple text-editor for Haskell, and I'd like to be able to highlight static errors in code when the user hits "check."
Is there a way to use the GHC-API to do a "dry-run" of compiling a haskell file without actually compiling it? I'd like to be able to take a string and do all the checks of normal compilation, but without the output. The GHC-API would be ideal because then I wouldn't have to parse command-line output from GHC to highlight errors and such.
In addition, is it possible to do this check on a string, instead of on a file? (If not, I can just write it to a temp file, which isn't terribly efficient, but would work).
If this is possible, could you provide or point me to an example how how to do this?
This question ask the same thing, but it is from three years ago, at which time the answer was "GHC-API is new and there isn't good documentation yet." So my hope is that the status has changed.
EDIT: the "dry-run" restriction is because I'm doing this in a web-based setting where compilation happens server side, so I'd like to avoid unnecessary disk reads/write every time the user hits "check". The executable would just get thrown away anyways, until they had a version ready to run.

Just to move this to an answer, this already exists as ghc-mod, here's the homepage. This already has frontends for Emacs, Sublime, and Vim so if you need examples of how to use it, there are plenty. In essence ghc-mod is just what you want, a wrapper around the GHC API designed for editors.

scons: Unnecessarily rebuilds files during the first time-stamp only build

I am doing a timestamp-only build to bulk convert image files. Many of the converted image files already exist, but I like to make sure that they are all checked through each time.
How come SCons requires a database file (.sconsign.dblite) that it uses for MD5 hash data when it's instructed (via env.Decider("timestamp-newer")) to only deal with timestamps? It shouldn't need to keep a database between builds for timestamps because all the information is associated with the files themselves.
If the dblite database doesn't exist SCons reconverts all the images regardless of whether their timestamps imply they need to be rebuilt or not. The title is an example message I get when the dblite database does not exist.
If anyone can explain this I'd really appreciate it. I love the functional programming with Python, but SCons itself is not quite doing it for me at the moment.

Using "timestamp-newer", SCons actually stores the timestamp info. You can see why here:
Using Time Stamps to Decide If a File Has Changed
Try using "timestamp-match" instead.

I finally got this sorted. Brady was right about how to use SCons, but I a few days ago I eventually worked out you can also control exactly what you want built by just controlling what build commands are issued in the first place. In my case I ignored any image files for which the target file already exists using os.path.exists().
Sounds simple, but it is a conceptual difference between SCons and make, because make does not save its state between builds in the way SCons does.

Yes, I'm trying to work out the same thing, but I'm doing bulk conversion of video files which takes several days if done unnecessarily. I've already done most of it.
So I want a way to tell SCons, "For files that exist now, store their existing timestamps/MD5s, and don't rebuild unless that changes in future."
Will report back if I find a way...

I think your question is really about why there's a .sconsign.dblite when you set the decider to just check timestamp.
One reason is that it allows SCons to keep track of the method used to produce each target. If that changes, even if the timestamp doesn't, it should rebuild the affected targets.
Have you tried building a single file, and then using the sconsign utility to examine the contents of the .sconsign.dblite file?

Verifying two different build architectures (one a re-write of the other) are functionally equivalent

I'm re-writing a build that produces a number of things (shared/static libraries, jars, executables, etc). The question came up whether there's a way to verify that the results are functionally equivalent without doing a full top-to-bottom test of the resulting software.
However, that is proving to be more difficult to do than I anticipated.
As an example, I expected that the md5 of two objects produced from the same source (sun studio C++ compiler) and command-line parameters would have the same md5 hash, but that isn't the case. I can build the file, rename it, build again, and they have different hashes.
With that said ... is there a way do a quick check to verify that two files produced from separate build architectures of the same source tree (eg, two shared objects) are functionally equivalent?
edit I am sorry, I neglected to mention this is for a debug build ... when debugging flags aren't used the binaries are identical, but they've been using debugging flags by default for so many years their stuff breaks when you remove the debugging flags (part of the reason I'm re-writing the build is to take that particular 'feature' out of the build so we can get some proper testing going)

Windows DLLs have a link timestamp (TimeDateStamp) as part of PE image.
Looking at linker options, I don't see an option to suppress that. So re-linking a DLL (or an EXE) will always produce a different binary.
You could write a tool to zero out these timestamps (always at a fixed offset from file start), and compare MD5s afterwards. But you'll likely discover lots of other differences as well. In particular, any program that uses __DATE__ or __TIME__ builtins will give you trouble.
We've had to work quite hard to achieve bit-identical rebuilds (using GNU toolchain). It's possible (at least for open-source tools, on Linux), but not easy (as you've discovered).

I forgot about this question; I'm revisiting so I can give the answer I came up with.
objcopy can be used to produce a new binary file in different formats. It's been a few years since I worked on this, so the specifics escape me, but here's what I recall:
objcopy can strip various things out (debug info, symbol information, etc), but even after stripping stuff out I was still seeing different hashes between objects.
In the end I found I could convert it from ELF to other formats. I ended up dumping it to another format (I think I chose SREC) that consistently provided the same MD5 for objects built at different times with identical source/flags.
I'm betting I could have done this a better way with objcopy (or perhaps another binutils tool), but it was good enough to satisfy our concerns.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string