Bash: Unexpected parallel behavior when reading arguments from file using xargs - linux

Previous
This is a follow-up to this question.
Specs
My system is a dedicated server running Ubuntu Desktop, Release 12.04 (precise) 64-bit, 3.14.32-xxxx-std-ipv6-64. Neither the release nor the kernel can be upgraded, but I can install any package.
Problem
The problem described in the question above seems to be solved; however, this doesn't work for me. I've installed the latest lftp and parallel packages and they seem to work fine on their own.
Running lftp works fine.
Running ./job.sh ftp.microsoft.com works fine, but I needed to chmod +x the script first.
Running sed 's/|.*$//' end_unique.txt | xargs parallel -j20 ./job.sh ::: does not work and produces bash errors in the form of /bin/bash: <server>: command not found.
To simplify things, I cleaned the input file end_unique.txt, now it has the following format for each line:
<server>
Each line ends in a CRLF, because the file is imported from a Windows server.
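(For reference, two equivalent ways to strip those carriage returns from end_unique.txt, shown only as a sketch:)
tr -d '\r' < end_unique.txt > end_unique_unix.txt
sed -i 's/\r$//' end_unique.txt     # GNU sed, edits the file in place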
Edit 1:
This is the job.sh script:
#!/bin/sh
server="$1"
lftp -e "find .; exit" "$server" >"$server-files.txt"
Edit 2:
I took the file and ran it against fromdos. Now it should be standard unix format, one server per line. Keep in mind that the server in the file can vary in format:
ftp.server.com
www.server.com
server.com
123.456.789.190
etc. All of those servers are FTP servers, accessible via ftp://<serverfromfile>/.

With :::, parallel expects the list of arguments it needs to complete the commands it's going to run to appear on the command line, as in
parallel -j20 ./job.sh ::: server1 server2 server3
Without ::: it reads the arguments from stdin, which serves us better in this case. You can simply say
parallel -j20 ./job.sh < end_unique.txt
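Equivalently, GNU parallel can read the argument file directly via -a (short for --arg-file):
parallel -j20 -a end_unique.txt ./job.sh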
Addendum: Things that can go wrong
Make certain of two things:
That you are using GNU parallel and not another version (such as the one from moreutils), because only the GNU version (as far as I'm aware) supports reading an argument list from stdin, and
That GNU parallel is not configured to disable the GNU extensions. It turned out, after a lengthy discussion in the comments, that these are disabled by default on Ubuntu 12.04, so it is not inconceivable that this sort of thing might be found elsewhere (particularly downstream from Ubuntu); a quick check is sketched after the list below. Such a configuration can hide in
The environment variable $PARALLEL,
/etc/parallel/config, or
~/.parallel/config
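A quick way to check both points at once (both options below are GNU-specific as far as I'm aware; moreutils parallel has neither):
parallel --version | head -1                      # GNU parallel identifies itself in the first line
parallel --gnu -j20 ./job.sh < end_unique.txt     # --gnu overrides a --tollef/disabled-extensions default for this run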
If the GNU version of parallel is not available to you, and if your argument list is not too long for the shell and none of the arguments in it contains whitespace, the same thing with the moreutils parallel is
parallel -j20 job.sh -- $(cat end_unique.txt)
This did not work for the OP because the file contained more servers than the shell was willing to put into a command line, but it might work for others with similar problems.
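Another fallback that sidesteps the command-line length limit entirely is plain xargs, sketched here under the assumption that the cleaned file really contains one whitespace-free server per line:
xargs -n1 -P20 ./job.sh < end_unique.txt     # one server per invocation, up to 20 jobs at a time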

Related

Writing to multiple remote terminals over SSH

Let's say I am SSH'ing to multiple remote machines. I'd like to send commands to all the machines through one interface. I was thinking of opening a named pipe whose output would be piped to each machine I SSH into. That way if I echo "ls -l" > namedpipe, then the command would run on each machine. I tried this, but it didn't work. Any suggestions on how I can have one terminal from which I could interact with all the remote machines?
GNU Parallel is the way to go. There are lots of examples on SO and elsewhere, also using named pipes when needed. Other tools are mentioned and compared in the parallel manpage.
As to your example, what you want can be done as simply as
parallel ssh {} "ls -l" ::: user1@machine1 user2@machine2 ...
Some Linux distributions come with a configuration file (usually /etc/parallel/config) where the option --tollef is set by default. If this is your case and you don't want to change that default, you must use -- instead of ::: in the first example above or, alternatively, use the --gnu option to override --tollef.
Equivalently, if you create a file called remotelist containing
user1@machine1
user2@machine2
you can issue:
parallel -a remotelist ssh {} "ls -l"
or, as noted by a comment below,
parallel --nonall --slf remotelist ls -l
The --slf option (short for --sshloginfile) allows stuffing more information into the remotelist file: comments, the number of processors to use on each remote host, and the like.
You might also consider the --tag option, which prepends each output line with the name of the host it originates from.
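For instance, combining the two with the remotelist file from above:
parallel --tag --nonall --slf remotelist ls -l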
There are plenty of tools available that could help you do that. Some of them worth checking out are:
pssh
Ansible Ad-Hoc module
They both work over SSH.
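For example, minimal sketches of each (hosts.txt and hosts.ini are hypothetical file names; hosts.txt would hold one user@host per line, hosts.ini an Ansible inventory):
pssh -i -h hosts.txt "ls -l"                       # pssh: -h names the host file, -i prints output inline
ansible all -i hosts.ini -m command -a "ls -l"     # Ansible ad-hoc: run ls -l on every inventory host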
If you still want to use a custom script with an SSH client, you could do something like:
while read -r i
do
    # -n stops ssh from swallowing the rest of server.list from stdin
    ssh -n "user@$i" <command to execute> <arg>
done < server.list

Execution error in a makefile

This is a reduced example of a makefile which illustrates my problem:
exec:
	time (ls > ls.txt; echo $$? > code) 2> time.txt
make exec runs fine under one Linux installation:
Linux-2.6.32-642.4.2.el6.x86_64-x86_64-with-centos-6.8-Final
but it fails under my Ubuntu installation:
Linux-4.4.0-64-generic-x86_64-with-Ubuntu-16.04-xenial
and produces the message:
/bin/sh: 1: Syntax error: word unexpected (expecting ")")
No problems if I run the command time directly from the terminal.
Are there different versions of the command in different Linux installations? I need the version which allows a sequence of commands.
Make always invokes /bin/sh to run the recipe. On some systems, /bin/sh is an alias for bash which has a lot of extra extensions to the standard POSIX shell (sh). On other systems (like Ubuntu), /bin/sh is an alias for dash which is a smaller, simpler, closer to plain POSIX shell.
Bash has a built-in time operation which accepts an entire pipeline and shows the time taken for it (run help time at a bash shell command prompt to see documentation). Other shells like dash don't have a built-in time, so when you run it you get the program /usr/bin/time; run man time to see documentation. As a separate program it of course cannot time an entire pipeline (because a pipeline is a feature of the shell); it can only time one individual command.
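A quick way to see the difference for yourself, assuming both bash and dash are installed:
bash -c 'time (ls > /dev/null; echo $?)'     # works: bash's time keyword times the whole command list
dash -c 'time (ls > /dev/null; echo $?)'     # fails with a syntax error: dash has no time keyword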
You have various options:
You can force your makefile to always use bash as its shell by adding:
SHELL := /bin/bash
to it. I recommend adding a comment there as well describing why bash specifically is needed.
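For example (the wording of the comment is only a suggestion):
# bash is required: the recipe relies on bash's built-in time, which can time a whole command list
SHELL := /bin/bash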
Or you can modify your rule to work in a portable way by making the shell invocation explicit so that time only has one command to invoke:
exec:
	time /bin/sh -c 'ls > ls.txt; echo $$? > code' 2> time.txt
Put a semicolon in front of "time". As is, make is trying to parse your command as a list of dependencies.
The only suggestion that worked was to force bash in my makefile:
SHELL := /bin/bash
I checked: on my Ubuntu machine, /bin/sh is really /bin/dash whereas on the CentOS machine it is /bin/bash!
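(For anyone who wants to check their own machine, one way, assuming readlink is available, is:)
readlink -f /bin/sh     # prints what /bin/sh actually resolves to, e.g. /bin/dash or /bin/bash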
Thanks!

Dry-run a potentially dangerous script?

A predecessor of mine installed a crappy piece of software on an old machine (running Linux) which I've inherited. Said crappy piece of software installed flotsam all over the place, and also is sufficiently bloated that I want it off ASAP -- it no longer has any functional purpose since we've moved on to better software.
Vendor provided an uninstall script. Not trusting the crappy piece of software, I opened the uninstall script in an editor (a 200+ line Bash monster), and it starts off something like this:
SWROOT=`cat /etc/vendor/path.conf`
...
rm -rf $SWROOT/bin
...
It turns out that /etc/vendor/path.conf is missing. Don't know why, don't know how, but it is. If I had run this lovely little script, it would have deleted the /bin folder, which would have had rather amusing implications. Of course this script required root to run!
I've dealt with this issue by just manually running all the uninstall commands (guh) where sensible. This kind of sucked because I had to interpolate all the commands manually. In general, is there some sort of way I can "dry run" a script to have it dump out all the commands it would execute, without it actually executing them?
bash does not offer dry-run functionality (and neither do ksh, zsh, or any other shell I know).
It seems to me that offering such a feature in a shell would be next to impossible: state changes would have to be simulated and any command invoked - whether built in or external - would have to be aware of these simulations.
The closest thing that bash, ksh, and zsh offer is the ability to syntax-check a script without executing it, via option -n:
bash -n someScript # syntax-check a script, without executing it.
If there are no syntax errors, there will be no output, and the exit code will be 0.
If there are syntax errors, analysis will stop at the first error, an error message including the line number is written to stderr, and the exit code will be:
2 in bash
3 in ksh
1 in zsh
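As a quick sketch of using this in practice (uninstall.sh is just a stand-in name for the script being checked):
bash -n uninstall.sh; echo "bash -n exited with $?"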
Separately, bash, ksh, and zsh offer debugging options:
-v to print each raw source code line[1] to stderr before it is executed.
-x to print each expanded simple command to stderr before it is executed (env. var. PS4 allows tweaking the output format).
Combining -n with -v and/or -x offers little benefit:
With -n specified, -x has no effect at all, because nothing is being executed.
With -n specified, -v will effectively simply print the source code.
If there is a syntax error, there may be benefit in the source code getting printed up to the point where the error occurs; keep in mind, though, that the error message produced by -n always includes the offending line number.
[1] Typically, it is individual lines that are printed, but the true unit is however many lines a given command - which may be a compound command such as while or a command list (such as a pipeline) - spans.
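As a tiny illustration of the -x trace and the PS4 tweak mentioned above, run on a harmless command rather than on the untrusted script, since -x executes what it traces:
PS4='+ line $LINENO: ' bash -xc 'v=42; echo "$v"'     # prints each expanded command to stderr before running it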
You could try running the script under Kornshell. When you execute a script with ksh -D, it reads the commands and checks them for syntax, but doesn't execute them. Combine that with set -xv, and you'll print out the commands that will be executed.
You can also use set -n for the same effect. Kornshell and BASH are fairly compatible with each other. If it's a pure Bourne shell script, both Kornshell and BASH will execute it pretty much the same.
You can also run ksh -u, which will cause unset shell variables to make the script fail. However, that wouldn't have caught the cat of a nonexistent file here: the shell variable was still set. It was set to null.
Of course, you could run the script under a restricted shell too, but that's probably not going to uninstall the package.
That's the best you can probably do.

Multiple nested echo statements piped to command kernel limitation

I've got a straightforward bash script generated with fwbuilder that nests several echo statements and pipes them through to iptables-restore.
We compile this way instead of just having multiple "iptables -A xxx" lines since it compiles and deploys much quicker and it also doesn't drop existing connections.
The problem is we seem to have hit the limit of allowed multiple redirects (~23'850 lines don't work, ~23'600 lines do).
Run it on kernel 2.6.18 (CentOS 5.x) and it breaks, run it on 2.6.32 (6.x) and it works like a charm.
Script essentially looks like this, comes out as just one long line piped to the command:
(echo "1"; echo "2"; echo "3"; ... ; echo "25000") | /do/anything
So I guess the question is, is there an easy way to increase this limit without recompiling the kernel? I'd imagine it's some sort of stdin character limitation of piping. Or do I have to do an OS upgrade?
Edit: Oh, and I would also like to add that when running on the older kernel, no errors are shown, but a segfault shows up in dmesg.
The reason that you're not observing the problem on 2.6.32 and observing it on 2.6.18 is that starting with kernel 2.6.23 the ARG_MAX limitation has been removed. This is the commit for the change.
In order to find some ways to circumvent the limit, see ARG_MAX.
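To see the limit on a given machine (getconf is part of POSIX):
getconf ARG_MAX     # maximum combined size of the arguments and environment passed to exec, in bytes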
Can you use a here-doc instead?
cat <<EOF | /do/anything
1
2
3
...
25000
EOF
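If the real input is as regular as in the reduced example, the lines can also be generated on the fly, or written to a file first, instead of being spelled out on one huge command line (rules.txt is a hypothetical file of pre-generated lines):
seq 1 25000 | /do/anything     # generate the sequence instead of 25000 echo statements
/do/anything < rules.txt       # or redirect from a file; no command-line length limit is involved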

Finding if 'which' command is available on a System through BASH

While writing BASH scripts, I generally use the which command of a Linux machine (where "Linux machine" refers to a desktop Linux OS like Ubuntu, Fedora, or OpenSUSE) for finding the path or availability of other binaries. I understand that which searches for binaries (commands) in the directories listed in the PATH variable.
Now, I am unable to understand how to proceed in case the which command itself is not present on that machine.
My intention is to create a shell script (BASH) which can be run on a machine and in case the environment is not adequate (like some command being used in script is missing), it should be able to exit gracefully.
Does anyone have any suggestions in this regard? I understand there are ways like using locate or find, etc., but again, what if even those are not available? Another option I already know of is to look for the existence of a which binary in standard paths like /usr/bin/, /bin/, or /usr/local/bin/. Is there any other possibility as well?
Thanks in advance.
type which
type is a bash built-in command, so it's always available in bash. See man bash for details on it.
Note that this will also recognize aliases:
$ alias la='ls -l -a'
$ type la
la is aliased to 'ls -l -a'
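Building on that, a minimal guard for the graceful-exit scenario described in the question might look like this (curl stands in for whatever command your script depends on):
if ! type curl >/dev/null 2>&1; then
    echo "error: curl is required but was not found" >&2
    exit 1
fi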
(More of a comment because Boldewyn answered perfectly, but it is another take on the question that may be of interest to some.)
If you are worried that someone may have messed with your bash installation and somehow removed which, then I suppose in theory, when you actually invoked the command you would get an exit code of 127.
Consider
$ sdgsdg
-bash: sdgsdg: command not found
$ echo $?
127
Exit codes in bash: http://tldp.org/LDP/abs/html/exitcodes.html
Of course, if someone removed which, then I wouldn't trust the exit codes, either.
