configure and multiple threads

configure and multiple threads - multithreading

When I need to compile an application from sources (I'm talking in a linux environment) basically the procedure is the following:
download and extract sources
./configure [optional params]
make
make install
Usually I pass -j4 to make in order to use all the CPU resources and speed up (a lot!) the compilation process.
I'm wondering if there is something similar for configure which often takes a lot of time for execution. Of course I've already tried to pass the same option but it fails, and I find nothing related in configure --help.

No, configure scripts generally do not conventionally allow for distributed or parallelized execution.
Results are usually cached in configure.cache so you might be able to refactor for parallel execution without too much effort.
If you want to save on running multiple configuration jobs for different libraries where they may run the same tests multiple times, have them share the same cache file. See https://www.gnu.org/software/autoconf/manual/autoconf-2.65/html_node/Cache-Files.html

Related

Tool to manage multiple scripts run simultaneously

We have multiple scripts to run simultaneously on server, in order to perform some tasks. These scripts are set to run on a frequency via crons.
Using crons for this purpose as so many drawbacks. Like using crons we are unable to track if previous command completed then run it again otherwise wait for its completion. Also we are unable to get the errors occurred during script running and its output. This approach increases CPU load also.
So I need a tool where we can set these scripts, which executes only if previous one is completed and track output of those scripts as well. Firstly I tried to use ActiveMQ for this but I think this tool not suitable for this purpose.
Can someone please suggest me a tool for this requirement?

Using puppet to build from source

How can I use puppet to build from source without using multiple Exec commands?. Do we have modules for it on forge that I could use?

It's possible to use Puppet to build applications from source without using execs, possibly with a custom written type and provider. Otherwise, yes, it'd have to be a few different exec resources with onlyif, creates etc. statements to stop them running every time the agent ran.
Puppet's model of configuration management is known as a desired state model: you define the end state of the system and let the system. This is why exec's are generally avoided in Puppet: they don't fit a desired state model. It also makes things like updating the application, or dealing with unknowns like a partial failure of the compilation that creates a required file.
In my opinion, I would not recommend using configuration management to build applications from source at all. There are a few issue inherent with doing so (this is not just for Puppet, but most config management languages):
Slower runs, as running the compilation can be longer and detecting that it's complete is normally a slightly trickier tasks
Issues with half complete state or failure: if the compilation breaks halfway through it's both harder to detect and resolve
Making the compilation idempotent: You have to wrap the command in logic that detects if the installation has already been done. However, this is difficult, as things like the detection of a flag file or particular binary could occur even when the compilation ends in failure
Upgrading or changing: There's no easy way to upgrade or change the application. A package would be easier to do this with.
This sounds like something that would be better served by packaging, using tools such as FPM or just native package building tools such as rpmbuild.

How to be able to "move" all necessary libraries that a script requires when moving to a new machine

We work on scientific computing and regularly submit calculations to different computing clusters. For that we connect using linux shell and submitting jobs through SGE, Slurm, etc (it depends on the cluster). Our codes are composed of python and bash scripts and several binaries. Some of them depend on external libraries such as matplotlib. When we start to use a new cluster, it is a nightmare since we need to tell the admins all the libraries we need, and sometimes they can not install all of them, or they only have old versions that can not be upgraded. So we wonder what could we do here. I was wondering if we could somehow "pack" all libraries we need along with our codes. Do you think it is possible? Otherwise, how could we move to new clusters without the need for admins to install anything?

The key is to compile all the code you need by yourself, using the compiler/library/MPI toolchains installed by the admins of the clusters, so that
your software is compiled properly for the cluster hardware, and
you do not depend on the admin to install the software.
The following are very useful in this case:
Ansible, to upload/manage configuration files, rc files, set permissions, compile your binaries, etc. and deploy a new environment easily on new clusters
Easybuild to install your version of Python with all the needed dependencies, and install other scientific software thanks to the community supported build procedures
CDE to build a package with all dependencies for your binaries on your laptop and use it as-is on the clusters.
More specifically for Python, you can use
virtual envs to setup a consistent set of Python modules across all clusters, independently from the modules already installed; or
Anaconda or Canopy to use a Python scientific distribution
to have a consistent Python install across all clusters.

Don't get me wrong, but I think what you have to do so: stop behaving like amateurs.
Meaning: the integrity of your "system configuration" is one of the core assets of your "business". And you just told us that you are basically unable of easily re-producing your system configuration.
So, the real answer here can't be a recommendation to use this or that technology. The real answer is: you, and the other teams involved in running your operations need to come together and define a serious strategy how to fix this.
Maybe you then decide that the way to go is that your development team provides Docker buildfiles, so that your operations team can easily create images on new machines. Or you decide that you need to use something like ansible to enable centralized control over your complete environment.

That's what venv is for, it allows you to create a portable customized environment easily, with exactly what you need and nothing more.

I completely agree with https://stackoverflow.com/users/1531124/ghostcat
but here is the really bad answer that will cause you a lot of problems in near future!!!:
if you need some dynamic library and you are not planning to upgrade them in future, you can try copying all needed libs to a folder in your app and use an script to launch the app:
#!/bin/sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/lib/folder
./myAPP
but keep in mind that this is bad practice.

Create a chroot image, like here - click. Install everything you need and then you can just chroot into it on any machine.

I work on scientific clusters as well, and you are going to find that wherever you go.
I would only rely on the admins on installing the most basic stuff. That is:
- Software necessary to build your software or run the most basic stuff: compilers and most basic utilities (python, perl, binutils, autotools, cmake, etc.).
Software libraries that make use of I/O devices: MPI, file I/O libraries...
A queue system (they already have it most of the time).
Environment modules. This is not a must, but it really helps you get the job done, specially if you mess with different library versions or implementations (that's my case, for example).
From that point on, you can build and install on your own directories all the software you use most of the time.
This does not mean that you cannot ask an admin to install some libraries. If you feel that many people is going to benefit from that, then you should request its installation. In addition, you may need some specific version or some special features which are not used most of the time, but you really need them. A very good example is with BLAS libraries (basic lineal algebra subroutines):
You have lots of BLAS implementations available: the original BLAS, Intel MKL, OpenBLAS, ATLAS, cuBLAS
If that is not enough, the open source versions usually offer multiple configuration options: serial version, parallel version with PThreads, parallel version with OpenMP, parallel version with MPI...
In my particular case, most of the software that I felt was necessary for many users in the cluster ended up being installed by the admins without any problem (either me or other users requested it), but you also have to keep in mind that in a cluster there can be many users and a single person/team is not able to attend the specific requirements you need, specially if you are able to do so.

I think you want to containerize your application in some way. Two main options (because docker/rkt and similar things are way too heavyweight for your task if I understand it correctly) in my opinion are runc and snappy.
Runc relies on OCI runtime specification, you need to create an environment (that is very similar to chroot environment in that you need to copy everything you software uses in one directory) and then you'll be able to run your application with runc tool. Runc itself is just one binary, at the moment it requires root privileges to run (hello, cluster admins), but there are patches at least partly solving that, so if you build your own runc and there are no blocking things wrt root privilege requirements you may be able to run your application with no administration overhead at all.
Snappy is similar in that you need to prepare a snap package for your application, this time using snapcraft as an assistant tool. Snappy is probably a bit easier in creating an application image and IMO is certainly better for long-term support because it clearly separates your application from the data (kinda W^X, application image is a read-only squashfs file and application can only write to a limited set of directories). But at the moment it will require your cluster admins to install snapd and to perform some operations like snap installation that require root privileges. Still, it should be better than your current situation, because that's just one non-intrusive package to install.
If these tools don't fit for some reason, there is always an option to make something of your own. That won't be easy and there are many subtle details that can bite you when doing that, but it can be done, compile all of your dependencies and applications into some path, create wrapper scripts to set up PATH and LD_LIBRARY_PATH environment for your components and then bring that directory into the new cluster, run wrapper scripts instead of target binaries and that's it. It's similar to what XAMPP does, they have quite a number of integrated things packaged into one directory that works across many distributions.
update
Let's also add AppImage into the mix, theoretically it can be a savior for your case, as it specifically does not require root privileges. It's kinda inbetween Snappy and rolling your own, as you need to prepare your application directory yourself (snappy can manage some of dependencies with snapcraft when you just specify "I need this Ubuntu package"), add appropriate metadata and then it can be packaged into single executable.

Extract SAS Enterprise Guide into Unix Server runnable batch?

We have built a project in Enterprise Guide for the purpose of creating a easy understandable and maintainable code. The project contain a set of process flows which run should be done in specific order. This project we need to run on a Linux Server machine, where the SAS Metadata Server is running.
Basic idea is to extract this project into SAS code, which we would be able to run from command line in Linux as a batch job.
Question 1:
Is there any other way to schedule a batch job in Linux-hosted SAS Server? I have read about VBS scripting for scheduling/running batch jobs, but in order this to be done on Linux Server, a installation of WINE is required, which on a production machine which already runs a number of other important applications, is almost completely out of question.
Is there a way to specify a complete project export into SAS code, provided that I give the specific order of running process flows? I have tried out ordered list, which is able to make you a list of tasks to run in order (although there is no way to choose a whole process flow as a single task), but unfortunately, this ordered list itself is later not possible to be exported as a SAS code.
Current solution we do is the following:
We export each single process flow of the SAS EG project into SAS code, and then create another SAS code with %include lines to run all the extracted codes in order that we want. This is of course a possible solution, but definitely not the most elegant one.
Question 2:
Since I don't know how exactly the code is being exported afterwards, are there any dangers I should bear in mind with the solution I chose.
Is there any other, more elegant way?

You have a couple of options from what I'm familiar with, plus I suspect if Dom happens by he'll know more. These answers are based on EG 6.1, which is the current version (ships with 9.4); it's possible some of these things may not be true in earlier versions.
First, if you're running Enterprise Guide from Windows, you can schedule the job locally (on any Windows machine with Enterprise Guide). You're not scheduling the server directly, you schedule Windows to launch an EG process that connects to the server and does its magic. That's how I largely interact with scheduling (because I have a fairly 'light' scheduling need).
Second, from the blog post "Four Ways to Schedule SAS Tasks", options 3 and 4 may be helpful for you. The SAS Platform Suite is designed in part for scheduling, and the options using SAS Management Console to schedule via operating system tools, are both very helpful.
Third, you may want to look into SAS Stored Processes, which should be schedulable. A process flow can be converted into a stored process.
For your specific questions:
Question 1: When you export a process flow or a project, at least in 6.1 you have the option to change the order in which the programs are exported. It's manual, so it's probably not perfect, but it does give you that option. (The code seems to be by default in creation order, which is sub-optimal.) The project export does group process flows together, but you don't have the option of manipulating the order of process flows - you have to move each program around, which would be tedious. It also of course gives you less flexibility if you need to multiply run programs.
Question 2: As Stig Eide points out in comments, make sure your System Option LRECL is > 256 (the default) or you run some risk of code being cut off. In 9.2+ this is modifiable; just place LRECL=32767in your config.sas file.

How can I sandbox filesystem activity, particularly writes?

Gentoo has a feature in portage, that prevents and logs writes outside of the build and packaging directories.
Checkinstall is able to monitor writes, and package up all the generated files after completion.
Autotools have the DESTDIR macro that enables you to usually direct most of the filesystem activity to an alternate location.
How can I do this myself with the
safety of the Gentoo sandboxing
method?
Can I use SELinux, rlimit, or
some other resource limiting API?
What APIs are available do this from
C, Python?
Update0
The mechanism used will not require root privileges or any involved/persistent system modification. This rules out creating users and using chroot().
Please link to the documentation for APIs that you mention, for some reason they're exceptionally difficult to find.
Update1
This is to prevent accidents. I'm not worried about malicious code, only the poorly written variety.

The way Debian handles this sort of problem is to not run the installation code as root in the first place. Package build scripts are run as a normal user, and install scripts are run using fakeroot - this LD_PRELOAD library redirects permission-checking calls to make it look like the installer is actually running as root, so the resulting file ownership and permissions are right (ie, if you run /usr/bin/install from within the fakeroot environment, further stats from within the environment show proper root ownership), but in fact the installer is run as an ordinary user.
Builds are also, in some cases (primarily for development), done in chroots using eg pbuilder - this is likely easier on a binary distribution however, as each build using pbuilder reinstalls all dependencies beyond the base system, acting as a test that all necessary dependencies are specified (this is the primary reason for using a chroot; not for protection against accidental installs)

One approach is to virtualize a process, similar to how wine does it, and reinterpret file paths. That's rather heavy duty to implement though.
A more elegant approach is to use the chroot() system call which sets a subtree of the filesystem as a process's root directory. Create a virtual subtree, including /bin, /tmp, /usr, /etc as you want the process to see them, call chroot with the virtual tree, then exec the target executable. I can't recall if it is possible to have symbolic links within the tree reference files outside, but I don't think so. But certainly everything needed could be copied into the sandbox, and then when it is done, check for changes against the originals.

Maybe get the sandbox safety with regular user permissions? So the process running the show has specific access to specific directories.
chroot would be an option but I can't figure out how to track these tries to write outside the root.
Another idea would be along the lines of intercepting system calls. I don't know much about this but strace is a start, try running a program through it and check if you see something you like.
edit:
is using kernel modules an option? because you could replace the write system call with your own so you could prevent whatever you needed and also log it.

It sounds a bit like what you are describing is containers. Once you've got the container infrastructure set up, it's pretty cheap to create containers, and they're quite secure.

There are two methods to do this. One is to use LD_PRELOAD to hook library calls that result in syscalls, such as those in libc, and call dlsym/dlopen. This will not allow you to directly hook syscalls.
The second method, which allows hooking syscalls, is to run your executable under ptrace, which provides options to stop and examine syscalls when they occur. This can be set up programmatically to sandbox calls to restricted areas of the filesystem, among other things.
LD_PRELOAD can not intercept syscalls, but only libcalls?
Dynamic linker tricks: Using LD_PRELOAD to cheat, inject features and investigate programs
Write Yourself an Strace in 70 Lines of Code

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string