Writing a Script to match the architecture of system and software - linux

I am trying to write a script that will cross-check two things:
The architecture for which the setup file was intended (32- or 64-bit)
The architecture of the system.
The second part is quite easy: it can be determined with commands like lscpu and then extracting the relevant line with a combination of grep and awk or sed. The first part, however, is proving to be the complicated one. I tried using the file command, but its output is very irregular, which makes it difficult to extract a specific column from it. I also tried objdump, though it is not traditionally used for things like this; as expected, due to its limitations, it does not recognize most file types.
The rest of the script is dead simple: I compare the two values and proceed with my intended tasks. I would like your help with point 1 above.
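For what it's worth, here is a minimal sketch of one way to handle point 1, assuming the setup file is an ELF binary and that matching on the free-form output of file(1) is acceptable (the path and the 32/64 mapping are illustrative, not definitive):

#!/bin/sh
setup=./setup.bin
sys_arch=$(uname -m)                                  # e.g. x86_64 or i686

# file -b prints e.g. "ELF 64-bit LSB executable, x86-64, ..."
case "$(file -b "$setup")" in
    *"ELF 64-bit"*) file_bits=64 ;;
    *"ELF 32-bit"*) file_bits=32 ;;
    *) echo "unrecognized file type" >&2; exit 1 ;;
esac

case "$sys_arch" in
    x86_64|amd64|aarch64) sys_bits=64 ;;
    *)                    sys_bits=32 ;;
esac

if [ "$file_bits" -eq "$sys_bits" ]; then
    echo "architectures match (${sys_bits}-bit)"
else
    echo "mismatch: ${file_bits}-bit file on a ${sys_bits}-bit system" >&2
fi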

Related

Why does Spreadsheet::XLSX parse dates inconsistently on different machines?

I created a Perl script that reads information from an XLSX sheet. Because it worked well on one machine and not on another, I included a short debug section:
$sheetdate = ($sheet->{Cells}[0][$sheet->{MaxCol}])->value();
print "value: $sheetdate\n";
$sheetdate = ($sheet->{Cells}[0][$sheet->{MaxCol}])->get_format();
print "getformat: $sheetdate\n";
On one machine it printed:
value: 2016-01-18
getformat: yyyy-mm-dd
While on the other:
value: 1-18-16
getformat: m-d-yy
Same script, same worksheet, different results. I believe that something in the environment makes the difference, but I do not know what exactly.
Any hints?
"Same script, same worksheet, different results. I believe that something in the environment makes the difference, but I don not know what exactly."
You sort-of indicate here yourself that you're not really seeking the solution to a perl or XLSX problem so much as some assistance with troubleshooting your environment.
Without access to the environment its difficult to offer a solution per se, but I can say this - you need to;
1) Re-arrange things so that you do get the same result from both environments;
2) Identify a list of differences between the original, problem environment and the one that now "works"; and
3) Modify one thing on the list at a time - moving towards the environment that works - checking each time until it becomes clear what the key variable (not in a programming sense) is.
With regard to (1), take a look at Strawberry Perl. Using Strawberry, it's relatively easy to set up what some call "Perl on a stick" (see the Portable ZIP edition): a complete Perl environment on a USB stick. Put your document on the same USB stick and then try the two environments, this time with absolute certainty of having the same environment. If different results persist, try booting from a "live environment" DVD (Linux or Windows as appropriate) and then using the USB stick.
Ultimately, I'd suggest there's something (such as a spreadsheet template) at play that is different between the environments. You just need to go through a process of elimination to find out what it is.
With the benefit of hindsight, I think it's worth revisiting this to produce a succinct answer for those who come across this problem in the future.
The original question was how a Perl script could produce two different results when the Excel data file fed into it is identical (which was confirmed with MD5 checksums). As programmers, our focus tends to be on the scripts we write and the data that goes into them. What slips to the back of the mind is the myriad ways in which Perl itself can be installed and configured.
The three things that should assist in determining where the difference between two installs lies are:
(1) Use Strawberry Perl on a stick as described above to take the environment out of the equation and thereby (if the problem "disappears") confirm that the problem is something to do with the environment.
(2) Use Data::Dumper liberally throughout to find where the flow of execution "forks."
(3) Compare the output of perl -V (note capital V) to find out if there are differences in how the respective perls were built and configured.
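For (3), assuming a bash shell, a quick way to spot differences in how the two perls were built is something like the following (the remote hostname is purely illustrative):

diff <(perl -V) <(ssh otherhost perl -V)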
The root cause of the problem was an outdated Spreadsheet::XLSX CPAN module installed as an RPM from the distribution repository. The RPM included version 0.13 of the module, while CPAN already had version 0.15, and as of version 0.14 the module's behaviour had changed in this particular respect. So once I replaced the pre-compiled module with the version downloaded directly from CPAN and built locally, the problem was solved.
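If you run into something similar, a quick way to compare the installed module version with what CPAN carries (assuming the cpan client is available) is:

perl -MSpreadsheet::XLSX -e 'print "$Spreadsheet::XLSX::VERSION\n"'
cpan -D Spreadsheet::XLSX    # should list both the installed and the latest CPAN version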

Find duplicate Files

I used to use a program called finddupe on Windows (XP) which checked for duplicate files and offered to replace them with hardlinks.
It calculated a hash of the first 32 KB and only checked the rest on a match. I have the source (for VC++ 6), but I was wondering whether there is a Linux/OS X equivalent before I try to port it, although I suspect it may be better to write a new program in a higher-level language.
I've found fdupes to be helpful for me.
If you are looking to write your own quick script, I would suggest looping over files and using cmp as it allows you to easily stop comparison after the first mismatched byte.
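A minimal sketch of that idea, combining the first-32-KB hash from the question with a final cmp check (bash and GNU coreutils assumed; it keeps one representative file per size/hash key and only reports duplicates rather than hardlinking them):

#!/bin/bash
# Group candidates by (size, hash of first 32 KB), then confirm with a full
# byte-for-byte cmp before reporting a duplicate.
dir=${1:-.}
declare -A seen
while IFS= read -r -d '' f; do
    key=$(stat -c %s "$f")-$(head -c 32768 "$f" | md5sum | cut -d' ' -f1)
    if [ -n "${seen[$key]}" ] && cmp -s "${seen[$key]}" "$f"; then
        echo "duplicate: ${seen[$key]} <-> $f"
    else
        seen[$key]=$f
    fi
done < <(find "$dir" -type f -print0)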
There are many similar tools; see here. They may not be part of a standard distribution.
I have used fslint before and found it to be sufficient for my needs.

Time virtualisation on linux

I'm attempting to test an application which has a heavy dependency on the time of day. I would like to have the ability to execute the program as if it was running in normal time (not accelerated) but on arbitrary date/time periods.
My first thought was to abstract the time retrieval function calls with my own library calls which would allow me to alter the behaviour for testing but I wondered whether it would be possible without adding conditional logic to my code base or building a test variant of the binary.
What I'm really looking for is some kind of localised time domain. Is this possible with a container (like Docker) or by using LD_PRELOAD to intercept the calls?
I also saw a patch that enabled time to be disconnected from the system time using unshare(CLONE_NEWTIME), but it doesn't look like this got in.
It seems like a problem that must have been solved numerous times before; is anyone willing to share their solution(s)?
Thanks
AJ
Whilst alternative solutions and tricks are great, I think you're severely overcomplicating a simple problem. It's completely common and acceptable to include certain command-line switches in a program for testing/evaluation purposes. I would simply include a command line switch like this that accepts an ISO timestamp:
./myprogram --debug-override-time=2014-01-01Z12:34:56
Then at startup, if the switch is set, subtract its value from the current system time to get an offset; make a local apptime() function which corrects the regular system time by that offset, and call that everywhere in your code instead.
The big advantage of this is that anyone can reproduce your testing results without a big read-up on custom Linux tricks, including an external testing team or a future co-developer who's good at coding but not at runtime tricks. When (unit) testing, it's a major advantage to be able to just call your code with a simple switch and test the results for equality against a sample set.
You don't even have to document it; lots of production tools in enterprise-grade products have hidden command-line switches for this kind of behaviour that the 'general public' need not know about.
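A minimal sketch of that idea in shell, assuming GNU date (the switch name matches the example above; the apptime() helper and the timestamp format are illustrative):

#!/bin/sh
offset=0
for arg in "$@"; do
    case "$arg" in
        --debug-override-time=*)
            fake=${arg#*=}                            # e.g. 2014-01-01T12:34:56
            offset=$(( $(date -d "$fake" +%s) - $(date +%s) ))
            ;;
    esac
done

apptime() {        # use this instead of querying the system time directly
    date -d "@$(( $(date +%s) + offset ))" "$@"
}

apptime +'%Y-%m-%d %H:%M:%S'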
There are several ways to query the time on Linux. Read time(7); I know at least time(2), gettimeofday(2), clock_gettime(2).
So you could use LD_PRELOAD tricks to redefine each of these to e.g. subtract a fixed number of seconds from the seconds part (not the microsecond or nanosecond part), given e.g. by some environment variable. See this example as a starting point.
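Invocation of such an interposer would then look something like this (the library name and environment variable are hypothetical; you would have to build the shared object yourself):

TIME_OFFSET_SECONDS=-86400 LD_PRELOAD=./libtimeshift.so ./myprogram    # run a day in the past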

Verifying two different build architectures (one a re-write of the other) are functionally equivalent

I'm re-writing a build that produces a number of things (shared/static libraries, jars, executables, etc). The question came up whether there's a way to verify that the results are functionally equivalent without doing a full top-to-bottom test of the resulting software.
However, that is proving to be more difficult to do than I anticipated.
As an example, I expected that two objects produced from the same source (Sun Studio C++ compiler) with the same command-line parameters would have the same MD5 hash, but that isn't the case. I can build the file, rename it, build again, and they have different hashes.
With that said: is there a way to do a quick check to verify that two files produced from separate build architectures of the same source tree (e.g., two shared objects) are functionally equivalent?
Edit: I am sorry, I neglected to mention that this is for a debug build. When debugging flags aren't used the binaries are identical, but they've been using debugging flags by default for so many years that their stuff breaks when you remove them (part of the reason I'm re-writing the build is to take that particular 'feature' out of the build so we can get some proper testing going).
Windows DLLs have a link timestamp (TimeDateStamp) as part of PE image.
Looking at linker options, I don't see an option to suppress that. So re-linking a DLL (or an EXE) will always produce a different binary.
You could write a tool to zero out these timestamps (always at a fixed offset from file start), and compare MD5s afterwards. But you'll likely discover lots of other differences as well. In particular, any program that uses __DATE__ or __TIME__ builtins will give you trouble.
We've had to work quite hard to achieve bit-identical rebuilds (using GNU toolchain). It's possible (at least for open-source tools, on Linux), but not easy (as you've discovered).
I forgot about this question; I'm revisiting so I can give the answer I came up with.
objcopy can be used to produce a new binary file in different formats. It's been a few years since I worked on this, so the specifics escape me, but here's what I recall:
objcopy can strip various things out (debug info, symbol information, etc), but even after stripping stuff out I was still seeing different hashes between objects.
In the end I found I could convert it from ELF to other formats. I ended up dumping it to another format (I think I chose SREC) that consistently provided the same MD5 for objects built at different times with identical source/flags.
I'm betting I could have done this a better way with objcopy (or perhaps another binutils tool), but it was good enough to satisfy our concerns.
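For reference, a hedged sketch of that approach with GNU binutils (the original work used Sun tools, so the exact options may differ; the file names are illustrative):

for f in build-a/libfoo.so build-b/libfoo.so; do
    objcopy --strip-debug -O srec "$f" "$f.srec"    # drop debug info and dump to SREC
done
md5sum build-a/libfoo.so.srec build-b/libfoo.so.srec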

Are there good reasons not to exploit '#!/bin/make -f' at the top of a makefile to give an executable makefile?

Mostly for my amusement, I created a makefile in my $HOME/bin directory called rebuild.mk, and made it executable, and the first lines of the file read:
#!/bin/make -f
#
# Comments on what the makefile is for
...
all: ${SCRIPTS} ${LINKS} ...
...
I can now type:
rebuild.mk
and this causes make to execute.
What are the reasons for not exploiting this on a permanent basis, other than this:
The makefile is tied to a single directory, so it really isn't appropriate in my main bin directory.
Has anyone ever seen the trick exploited before?
Collecting some comments, and providing a bit more background information.
Norman Ramsey reports that this technique is used in Debian; that is interesting to know. Thank you.
I agree that typing 'make' is more idiomatic.
However, the scenario (previously unstated) is that my $HOME/bin directory already has a cross-platform main makefile in it that is the primary maintenance tool for the 500+ commands in the directory.
However, on one particular machine (only), I wanted to add a makefile for building a special set of tools. So, those tools get a special makefile, which I called rebuild.mk for this question (it has another name on my machine).
I do get to save typing 'make -f rebuild.mk' by using 'rebuild.mk' instead.
Fixing the position of the make utility is problematic across platforms.
The #!/usr/bin/env make -f technique is likely to work, though I believe the official rules of engagement are that the line must be less than 32 characters and may only have one argument to the command.
dF comments that the technique might prevent you passing arguments to make. That is not a problem on my Solaris machine, at any rate. The three different versions of 'make' I tested (Sun, GNU, mine) all got the extra command-line arguments that I typed, including options ('-u' on my home-brew version), the target 'someprogram', and the macros CC='cc' WFLAGS=-v (to use a different compiler and cancel the GCC warning flags which the Sun compiler does not understand).
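For example, on that machine an invocation along these lines still behaves as expected (the target and macros are the ones from the test just described):

rebuild.mk someprogram CC='cc' WFLAGS=-v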
I would not advocate this as a general technique.
As stated, it was mostly for my amusement. I may keep it for this particular job; it is most unlikely that I'd use it in distributed work. And if I did, I'd supply and apply a 'fixin' script to fix the pathname of the interpreter; indeed, I did that already on my machine. That script is a relic from the first edition of the Camel book ('Programming Perl' by Larry Wall).
One problem with this for generally distributable Makefiles is that the location of make is not always consistent across platforms. Also, some systems might require an alternate name like gmake.
Of course one can always run the appropriate command manually, but this sort of defeats the whole purpose of making the Makefile executable.
I've seen this trick used before in the debian/rules file that is part of every Debian package.
To address the problem of make not always being in the same place (on my system for example it's in /usr/bin), you could use
#!/usr/bin/env make -f
if you're on a UNIX-like system.
Another problem is that by using the Makefile this way you cannot override variables by doing, for example, make CFLAGS=....
"make" is shorter than "./Makefile", so I don't think you're buying anything.
The reason I would not do this is that typing "make" is more idiomatic to building Makefile based projects. Imagine if every project you built you had to search for the differently named makefile someone created instead of just typing "make && make install".
You could use a shell alias for this too.
We can look at this another way: is it a good idea to design a language whose interpreter looks for a fixed filename if you don't give it one? What if python looked for Pythonfile in the absence of a script name? ;)
You don't need such a mechanism in order to have a convention based around a known name. Example: Autoconf's ./configure script.
