Run ssh in shell script in parallel and set remote variables - linux

I'm writing a script that reads from an input file containing ~1000 lines of host info. The script sshes to each host, cds to the remote host's log directory, and cats the latest daily log file. I then redirect the cat output locally to do some pattern matching and statistics.
The simplified structure of my program is a while loop that looks like this:
while read host
do
ssh -n name@$host "cd TO LOG DIR AND cat THE LATEST LOGFILE" | matchPattern
done << EOA
$(awk -F, '{print $7}' $FILEIN)
EOA
where matchPattern is a function to match pattern and do statistics.
Right now I have two questions:
1) How do I find the latest daily log file remotely? The latest log file name matches xxxx2012-05-02.log and is the newest created. Is it possible to do an ls remotely and find the file matching that name? (I can do this locally but get jammed when appending it to the ssh command.) Another way I could come up with is to do
cat `ls -t | head -1` or
cat $(ls -t | head -1)
However, if I append this to ssh, it expands to my local newest created file name. Can this be evaluated remotely, so that cat finds the correct file?
2) As there are nearly 1000 hosts, can I do this in parallel (say, 20 ssh sessions at a time, starting the next 20 after the first 20 finish)? Appending & to each ssh does not seem sufficient to accomplish this.
Any ideas would be greatly appreciated!
Follow up:
Hi everyone, I finally found a crappy way to solve the first problem by doing this:
ssh -n name@$host "cd $logDir; cat *$logName" | matchPattern
where $logName is today's date plus .log (2012-05-02.log). The problem is that only local variables are expanded within the double quotes. Since my log file ends with 2012-05-02.log, and no other file ends with this suffix, I just blindly cat *2012-05-02.log on the remote machine and it cats the desired file for me.
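For reference, the quoting difference at work here, in a minimal sketch (the /path/to/logdir is a placeholder):
# Double quotes: $logDir and $logName expand locally, before ssh runs.
ssh -n "name@$host" "cd $logDir; cat *$logName" | matchPattern
# Single quotes: the $(...) is instead expanded by the remote shell.
ssh -n "name@$host" 'cd /path/to/logdir && cat "$(ls -t | head -1)"' | matchPattern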

For your first question,
ssh -n name@$host 'cat $(ls -t /path/to/log/dir/*.log | head -n 1)'
should work. Note single quotes around the remote command.
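If you also need to splice a local variable (such as a log directory) into the single-quoted remote command, you can close and reopen the quotes around it; a sketch, assuming $logDir is set locally and contains no spaces or glob characters:
ssh -n name@$host 'cat "$(ls -t '"$logDir"'/*.log | head -n 1)"'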
For your second question, wrap all the ssh | matchPattern | analyse stuff into its own function, then iterate over it by
outstanding=0
while read host
do
sshMatchPatternStuff &
outstanding=$((outstanding + 1))
if [ "$outstanding" -ge 20 ] ; then
wait    # a bare wait waits for all background jobs, i.e. the whole batch of 20
outstanding=0
fi
done << EOA
$(awk -F, '{print $7}' $FILEIN)
EOA
wait    # wait for the remaining jobs to finish
(I assume you're using bash.)
It may be better to separate the ssh | matchPattern | analyse stuff into its own script, and then use a parallel variant of xargs to call it.
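For example, a sketch with GNU xargs, where fetch_and_match stands for a hypothetical script that takes one host argument and does the ssh | matchPattern work:
# Run one host per invocation, at most 20 in parallel.
awk -F, '{print $7}' "$FILEIN" | xargs -n 1 -P 20 ./fetch_and_match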

For your second question, take a look at pdsh, the parallel distributed shell:
http://sourceforge.net/projects/pdsh/
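For example, a sketch assuming a hosts file with one name per line; pdsh's -w ^file syntax reads targets from a file, and the bundled dshbak utility groups output per host:
pdsh -R ssh -w ^hosts.txt 'cat $(ls -t /path/to/logdir/*.log | head -1)' | dshbak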

If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
parallel -j0 --nonall --slf <(awk -F, '{print $7}' servers.txt) 'cd logdir; cat `ls -t | head -1` | grep pattern'
This way you get the matching done on the remote server. If you prefer to transfer the full log file and do the matching locally, simply move the grep outside:
parallel -j0 --nonall --slf <(awk -F, '{print $7}' servers.txt) 'cd logdir; cat `ls -t | head -1`' | grep pattern
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Related

echo text to multiple files in bash script

I am working on a bash script that uses pssh to run external commands, then joins the output of the commands with the IP of each server. pssh has an option -o that writes a file for each server into a specified directory, but if the commands do not run you just get an empty file. What I am having issues with is updating these empty files with something like "Server Unreachable", so that I know there was a connection issue reaching the server and it doesn't cause problems with the rest of the script.
Here is what I have so far:
#!/bin/bash
file="/home/user/tools/test-host"
now=$(date +"%F")
folder="./cnxhwinfo-$now/"
empty="$(find ./cnxhwinfo-$now/ -maxdepth 1 -type f -name '*' -size 0 -printf '%f%2d')"
command="echo \$(uptime | awk -F'( |,|:)+' '{d=h=m=0; if (\$7==\"min\") m=\$6; else {if (\$7~/^day/) {d=\$6;h=\$8;m=\$9} else {h=\$6;m=\$7}}} {print d+0,\"days\",h+0,\"hours\",m+0,\"minutes\"}'), \$(hostname | awk '{print \$1}'), \$(sudo awk -F '=' 'FNR == 2 {print \$2}' /etc/connex-release/version.txt), \$(lscpu | awk -F: 'BEGIN{ORS=\", \";} NR==4 || NR==6 || NR==15 {print \$2}' | sed 's/ *//g') \$(free -k | awk '/Mem:/{print \$2}'), \$(df -Ph | awk '/var_lib/||/root/ {print \$2,\",\"\$5,\",\"}')"
pssh -h $file -l user -t 10 -i -o /home/user/tools/cnxhwinfo-$now -x -tt $command
echo "Server Unreachable" | tee "./cnxhwinfo-$now/$empty"
ls ./cnxhwinfo-$now >> ./cnx-data-$now
cat ./cnxhwinfo-$now/* >> ./cnx-list-$now
paste -d, ./cnx-data-$now ./cnx-list-$now >>./cnx-data-"$(date +"%F").csv"
I was trying to use find to locate the empty files and write "Server Unreachable" to them using tee with this:
echo "Server Unreachable" | tee "./cnxhwinfo-$now/$empty"
If the folder specified doesn't already exist, I get this error:
tee: ./cnxhwinfo-2019-09-03/: Is a directory
And if it does exist (i.e., I run the script again), it instead creates a file named after the IP addresses returned by the find command, like this:
192.168.1.2 192.168.1.3 192.168.1.4 1
I've also tried:
echo "Server Unreachable" | tee <(./cnxhwinfo-$now/$empty)
The find command outputs the IP addresses on a single line with a space between each one, so I thought that would be fine for tee to use, but I feel like I am either running into syntax issues or going about this the wrong way. I have another version of this same script that uses regular ssh and works great, just much more slowly than pssh.
empty should be an array, assuming none of the file names contain any whitespace.
readarray -t empty < <(find ...)
echo "Server unreachable" | (cd ./cnxhwinfo-$now/; tee "${empty[#]}" > /dev/null)
Otherwise, you are building a single file name by concatenating the empty file names.
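Putting it together, a minimal sketch in the context of the original script (note the -printf '%f\n' format, one name per line, instead of '%f%2d'):
now=$(date +"%F")
readarray -t empty < <(find "./cnxhwinfo-$now/" -maxdepth 1 -type f -size 0 -printf '%f\n')
# Only write when at least one empty file was found.
if [ "${#empty[@]}" -gt 0 ]; then
    echo "Server Unreachable" | (cd "./cnxhwinfo-$now/" && tee "${empty[@]}" > /dev/null)
fi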

Testing active ssh keys on the local network

I am currently trying to write a bash script that will validate whether the SSH keys on a server are still linked to known hosts that are active on the local area network. You can find below the beginning of my bash script:
#!/bin/bash
# LAN SSH KEYS DISCOVERY SCRIPT
# TRYING TO FIND THOSE SSH KEYS NOW
cat /etc/passwd | grep /bin/bash > bash_users
cat bash_users | cut -d ":" -f 6 > cutted.bash_users_home_dir
for bash_users in $(cat cutted.bash_users_home_dir)
do
ls -al $bash_users/.ssh/*id_* >> ssh-keys.txt
done
# DISCOVERING THE KNOWN_HOSTS NOW
for known_hosts in $(cat cutted.bash_users_home_dir)
do
cat $known_hosts/.ssh/known_hosts | awk '{print $1}' | sort -u >> hosts_known.txt
sleep 2
done
hosts_known=$(wc -l < hosts_known.txt)
echo "We have $hosts_known known hosts that could be still active via SSH
keys"
# TIME TO TEST WHICH SSH servers are still active with the SSH keys
# AND THIS IS WHERE I AM FROZEN...
# Would love to have bash script that could
# ssh -l $users_that_have_/bin/bash -i $ssh_keys $ssh_servers
# Would also be very nice if it could save active
# SSH servers with the valid keys in output.txt in the format
# username:local-IP:/path/to/SSH_key
Please feel free to edit/modify the bash script above if that better serves the goals described.
Any help would be very much appreciated,
Thanks
The following works nicely:
</etc/passwd \
grep /bin/bash |
cut -d: -f6 |
sudo xargs -i -- sh -c '
[ -e "$1" ] && cat "$1"
' -- {}/.ssh/known_hosts |
cut -d' ' -f1 |
tr ',' '\n' |
sed '
/^\[/{
s/\[\(.*\)\]:\(.*\)/\1 \2/;
t;
};
s/$/ 22/;
' |
sort -u |
xargs -l1 -- sh -c '
if echo "~" | nc -q1 -w3 "$1" "$2" | grep -q "^SSH"; then
echo "#### SUCCESS $1 $2";
else
echo "#### ERROR $1 $2";
fi
' --
So:
Start with /etc/passwd
Filter all "bash_users" as you call them
Filter user home directories only cut -d: -f6
For each user home directory, sudo xargs -i -- runs a small shell snippet to:
Check if the file .ssh/known_hosts inside the user home directory exists
If it does, print it
Filter only host names
Multiple host signatures may share the same key, separated by commas, so replace each comma with a newline
Now a sed script:
If a line starts with a [ that means it has a format of [host]:port and I want to replace it with host port
If the line does not start with a [ I add 22 to the end of the line so it's host 22
Then I sort -u
Now for each line:
I get the ssh version banner: echo "~" | nc hostname port returns something like "SSH-2.0-OpenSSH_6.0" + newline + "Protocol mismatch".
So if the line returned by nc hostname port starts with SSH, that means ssh is running on the other side.
I added a timeout for unresponsive hosts; nc's -w timeout option may also be used, and nc -q 1 should probably be specified too.
Now the real fun: when you add the --max-procs option to the last xargs invocation, you can check all hosts simultaneously. On my host I have 47 unique addresses, and xargs -P30 checks them ALL in about 2 seconds.
But really there are some problems. The script needs root to read every user's known_hosts. Worse, known_hosts may be hashed. It would be better to first know the list of hosts on your network and then generate known_hosts from it, with something like ssh-keyscan -f list_of_hosts > ~/.ssh/known_hosts. Generally ssh-keygen -F hostname should be used to check whether a host exists in known_hosts; sadly there is no listing command. The known_hosts file format can be found in the ssh documentation.
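For example (the host list and address are placeholders):
# Rebuild known_hosts from a plain list of hosts, one per line:
ssh-keyscan -f list_of_hosts > ~/.ssh/known_hosts
# Look up a single host's entry; this also works when the file is hashed:
ssh-keygen -F 192.168.1.10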

Allow sh to be run from anywhere

I have been monitoring the performance of my Linux server with ioping (I had some performance degradation last year). For this purpose I created a simple script:
echo $(date) | tee -a ../sb-output.log | tee -a ../iotest.txt
./ioping -c 10 . 2>&1 | tee -a ../sb-output.log | grep "requests completed in\|ioping" | grep -v "ioping statistics" | sed "s/^/IOPing I\/O\: /" | tee -a ../iotest.txt
./ioping -RD . 2>&1 | tee -a ../sb-output.log | grep "requests completed in\|ioping" | grep -v "ioping statistics" | sed "s/^/IOPing seek rate\: /" | tee -a ../iotest.txt
etc
The script calls ioping in the folder /home/bench/ioping-0.6, then saves the output in readable form in /home/bench/iotest.txt. It also adds the date so I can compare points in time.
Unfortunately I am not an experienced programmer, and this version of the script only works if you first enter the right directory (/home/bench/ioping-0.6).
I would like to call this script from anywhere. For example by calling
sh /home/bench/ioping.sh
Googling this and reading about path variables was a bit over my head. I kept ending up with different versions of
line 3: ./ioping: No such file or directory
Any thoughts on how to upgrade my script so that it works from anywhere?
The trick is the shell's $0 variable, which is set to the path of the script.
#!/bin/sh
set -x
cd "$(dirname "$0")"    # technique 1: dirname
pwd
cd "${0%/*}"            # technique 2: parameter expansion, same effect
pwd
If dirname isn't available for some reason, like some limited busybox distributions, you can try using shell parameter expansion tricks like the second one in my example.
Isn't it obvious? ioping is not in . so you can't use ./ioping.
The easiest solution is to set PATH to include the directory where ioping is. Perhaps more robust: figure out the path to $0 and use that path as the location for ioping (assuming your script sits next to ioping).
If ioping itself depends on being run in a certain directory, you might have to make your script cd to the ioping directory before running it.
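A minimal sketch combining both suggestions, assuming the script sits next to the ioping binary in /home/bench/ioping-0.6:
#!/bin/sh
# cd to the directory this script lives in, so ./ioping resolves
# no matter where the script is called from.
cd "$(dirname "$0")" || exit 1
./ioping -c 10 . 2>&1 | tee -a ../sb-output.log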

ssh tail with nested ls and head cannot access

I am trying to execute the following command:
$ ssh root@10.10.10.50 "tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )"
ls: cannot access /var/log/alert_ARCDB.log: No such file or directory
tail: cannot follow `-' by name
Notice the error returned. When I log in with ssh separately and then execute
tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )
I see the below:
# ls -t /var/log/alert_ARCDB.log | head -n1
/var/log/alert_ARCDB.log
Why is that happening, and how do I fix it? I am trying to do this in one line, as I don't want to create a script file.
Thanks a lot
Shell parameter expansion happens before command execution.
Here's a simple example. If I type...
ls "$HOME"
...the shell replaces $HOME with the path to my home directory first, then runs something like ls /home/larsks. The ls command has no idea that the command line originally had $HOME.
If we look at your command...
$ ssh root@10.10.10.50 "tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )"
...we see that you're in exactly the same situation. The $(ls -t ...) expression is expanded before ssh is executed. In other words, that command is running on your local system.
You can inhibit the shell expansion on your local system by using single quotes. For example, running:
echo '$HOME'
will produce:
$HOME
So you can run:
ssh root@10.10.10.50 'tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )'
But there's another problem here. If /var/log/alert_ARCDB.log is a file, your command makes no sense: calling ls -t on a single file just prints that same file name.
If alert_ARCDB.log is a directory, you have a different problem. The result of ls /some/directory is a list of filenames without any directory prefix. If I run something like:
ls -t /tmp
I will get output like
file1
file2
If I do this:
tail $(ls -t /tmp | head -1)
I end up with a command that looks like:
tail file1
And that will fail, because there is no file1 in my current directory.
One approach would be to pipe the commands you want to perform to ssh. One simple way to achieve that is to first create a function that echoes the commands you want executed:
remote_commands()
{
echo 'cd /var/log/alert_ARCDB.log'
echo 'tail -F -n 1 "$(ls -t | head -n1 )"'
}
The cd will allow you to use the relative path listed by ls. The single quotes make sure that everything will be sent as-is to the remote shell, with no local expansion occurring.
Then you can do
ssh root@10.10.10.50 bash < <(remote_commands)
This assumes alert_ARCDB.log is a directory (or else I am not sure why you would want to add head -n1 after that).
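And if alert_ARCDB.log turns out to be a single file after all, no ls is needed at all; a simpler form would be:
ssh root@10.10.10.50 'tail -F -n 1 /var/log/alert_ARCDB.log'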

Bash grep command finding the same file 5 times

I'm building a little bash script to run another bash script that's found in multiple directories. Here's the code:
cd /home/mainuser/CaseStudies/
grep -R -o --include="Auto.sh" [\w] | wc -l
When I execute just that part, it finds the same file 5 times in each folder. So instead of getting 49 results, I get 245. I've written a recursive bash script before and I used it as a template for this problem:
grep -R -o --include=*.class [\w] | wc -l
This code has always worked perfectly, without any duplication. I've tried running the first command with and without the quotes, and I've tried -r as well. I've read through the bash documentation and can't seem to find a way to prevent this duplication, or even why I'm getting it. Any thoughts on how to get around this?
As a separate but related question: could I launch Auto.sh inside each directory so that the output of Auto.sh is dumped into that directory, without having to place Auto.sh in each folder? That would probably be much more efficient than what I'm currently doing, and it would also probably fix my current duplication problem.
This is the code for Auto.sh:
#!/bin/bash
index=1
cd /home/mainuser/CaseStudies/
grep -R -o --include=*.class [\w] | wc -l
grep -R -o --include=*.class [\w] |awk '{print $3}' > out.txt
while read LINE; do
echo 'Path '$LINE > 'Outputs/ClassOut'$index'.txt'
javap -c $LINE >> 'Outputs/ClassOut'$index'.txt'
index=$((index+1))
done <out.txt
Preferably I would like to make it dump only the javap outputs for the application it's currently looking at. Since those .class files could be in any number of sub-directories, I'm not sure how to make them all dump into the top folder without executing a modified Auto.sh in the top directory of each application.
OK, so to fix the duplicate matches:
grep -R -o --include="Auto.sh" [\w] | wc -l
Should be:
grep -R -l --include=Auto.sh '\w' | wc -l
The reason this was happening is that grep was looking for instances of the letter w in Auto.sh, which occurred 5 times in the file; with -o, each match is printed separately, while -l lists each matching file only once.
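To see the difference, compare the two from the CaseStudies directory:
grep -R -o --include=Auto.sh '\w' . | wc -l   # counts every match: 5 per copy of the file
grep -R -l --include=Auto.sh '\w' . | wc -l   # counts matching files: one per copy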
However, the overall fix that doesn't require having to place Auto.sh in every directory, is something like this:
MAIN_DIR=/home/mainuser/CaseStudies/
cd "$MAIN_DIR"
ls -d */ > DirectoryList.txt
while read LINE; do
cd "$LINE"
mkdir ProjectOutputs
bash /home/mainuser/Auto.sh
cd "$MAIN_DIR"
done <DirectoryList.txt
That calls this Auto.sh code:
index=1
grep -R -o --include=*.class '\w' | wc -l
grep -R -o --include=*.class '\w' | awk '{print $3}' > ProjectOutputs.txt
while read LINE; do
echo 'Path '$LINE > 'ProjectOutputs/ClassOut'$index'.txt'
javap -c $LINE >> 'ProjectOutputs/ClassOut'$index'.txt'
index=$((index+1))
done <ProjectOutputs.txt
Thanks again for everyone's help!
