Linux script for probing ssh connection in a loop and start log command after connect

I have a host machine that gets rebooted or reconnected quite a few times.
I want to have a script running on my dev machine that continuously tries to log into that machine and if successful runs a specific command (tailing the log data).
Edit: To clarify, the connection needs to stay open. The log command keeps tailing until I stop it manually.
What I have so far
#!/bin/bash
IP=192.168.178.1
if (("$#" >= 1))
then
IP=$1
fi
LOOP=1
trap 'echo "stopping"; LOOP=0' INT
while (( $LOOP==1 ))
do
if ping -c1 $IP
then
echo "Host $IP reached"
sshpass -p 'password' ssh -o ConnectTimeout=10 -q user@$IP '<command would go here>'
else
echo "Host $IP unreachable"
fi
sleep 1
done
The LOOP flag is not really used. The script is ended via CTRL-C.
Now this works if I do NOT add a command to be executed after the ssh and instead start the log output manually. On a disconnect the script keeps probing the connection and logs back in once the host is available again.
Also when I disconnect from the host (CTRL-D) the script will log right back into the host if CTRL-C is not pressed fast enough.
When I add a command to be executed after ssh the loop is broken. So pressing (CTRL-C) does not only stop the log but also disconnects and ends the script on the dev machine.
I guess I have to spawn another shell somewhere or something like that?
1) I want the script to keep probing, log in and run a command completely automatically and fall back to probing when the connection breaks.
2) I want to be able to stop the log on the host (CTRL-C) and thereby fall back to a logged in ssh connection to use it manually.
How do I fix this?

Maybe the best approach to "fixing" this would be to fix the requirements.
The problematic part is number "2)".
The problem comes from how SIGINT works.
When triggered, it is sent to the foreground process group of your terminal. Typically this covers the foreground command and every process started from it. Shells with job control (you seem to use bash) manage process groups so that programs started in the background are detached from this (they are assigned a different process group).
In your case ssh is started in the foreground (from a script executed in the foreground), so it receives the interrupt as well and terminates, which also ends the remote command's session. Since the script's shell also runs its signal handler (specified by trap), it then leaves the loop and terminates itself.
So, as you can see, you have overloaded CTRL-C to mean two things:
terminate the monitoring script
terminate the remote command and continue with whatever is specified for the remote side.
You might get closer to what you want if you drop the first effect (or at least make it more explicit). Then a step in the right direction is to call a script on the remote side that does not terminate itself on CTRL-C but only its tail command. In that case you will likely need the -t switch of ssh to get a terminal allocated, so that normal shell operation is possible afterwards.
This will not allow terminating the remote side with just CTRL-C. You will always need to exit the remote shell that is left running.
The essence of such a remote script might look like:
tail command
shell
Of course you would need to add whatever else is necessary for your shell or coding style.
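The remote script outlined above could be sketched roughly as follows. This is only a sketch under assumptions: remote_command and watch_log are hypothetical helper names, the host and log path are placeholders, and it assumes the remote login shell accepts -l. The idea is that trap : INT gives the remote wrapper shell a do-nothing handler, so CTRL-C ends only tail, after which the wrapper replaces itself with an interactive shell.

```shell
#!/bin/sh
# Hypothetical helper: prints the command line to run on the remote host.
# $1: log file on the remote host (assumed to contain no single quotes).
remote_command() {
    # 'trap : INT' keeps the wrapper shell alive on CTRL-C; tail still gets
    # the default SIGINT behaviour and is terminated by it. Afterwards the
    # wrapper execs the user's login shell for manual use.
    printf '%s\n' "trap : INT; tail -f '$1'; exec \"\${SHELL:-/bin/sh}\" -l"
}

# Hypothetical helper: $1 is user@host, $2 the remote log file.
# -t allocates a terminal so CTRL-C is delivered on the remote side
# and the follow-up shell is interactive.
watch_log() {
    ssh -t "$1" "$(remote_command "$2")"
}
```

Inside the probing loop this would be called as, for example, watch_log "user@$IP" /path/to/logfile. Leaving the remote shell with exit or CTRL-D returns control to the loop.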
An alternative approach would be to keep terminating the current remote command and, when it was interrupted, add another ssh call that spawns a shell for interactive use. But in that case, too, CTRL-C will not be available for terminating the monitoring altogether.
To achieve that, you might try changing the active interrupt handler within your monitoring script so that it triggers termination once the remote side has returned. However, this causes a race condition between the user noticing that the remote command has terminated (and control has returned to the local script) and the proper interrupt handler being in place. You might be able to lower that risk sufficiently by first activating the new trap handler, then echoing that fact, and maybe adding a sleep to give the user time to react.

Not really sure what you are saying.
Also, you should disable PasswordAuthentication in /etc/ssh/sshd_config on the host and log in with a key instead, by adding the public key of your home computer to ~/.ssh/authorized_keys on the host.
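#!/bin/sh
# Host to poll (address taken from the question)
IP=192.168.178.1
while true
do
# Fetch the last lines of the log using key-based authentication
RESPONSE=$(ssh -i /home/user/.ssh/id_host "user@$IP" 'tail /home/user/log.txt')
echo "$RESPONSE"
sleep 10
done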

Related

Alias <cmd> to "do X then <cmd>" transparently

The title sucks but I'm not sure of the correct term for what I'm trying to do, if I knew that I'd probably have found the answer by now!
The problem:
Due to an over-zealous port scanner (customer's network monitor) and an overly simplistic telnet daemon (busybox linux) every time port 23 gets scanned, telnetd launches another instance of /bin/login waiting for user input via telnet.
As the port scanner doesn't actually try to login, there is no session, so there can be no session timeout, so we quickly end up with a squillion zombie copies of /bin/login running.
What I'm trying to do about it:
telnetd gives us the option (-l) of launching some other thing rather than /bin/login so I thought we could replace /bin/login with a bash script that kills old login processes then runs /bin/login as normal:
#!/bin/sh
# First kill off any existing dangling logins
# /bin/login disappears on successful login so
# there should only ever be one
killall -q login
# now run login
/bin/login
But this seems to return immediately (no error, but no login prompt). I also tried just chaining the commands in telnetd's arguments:
telnetd -- -l "killall -q login;/bin/login"
But this doesn't seem to work either (again - no error, but no login prompt). I'm sure there's some obvious wrinkle I'm missing here.
System is embedded Linux 2.6.x running Busybox so keeping it simple is the greatly preferred option.
EDIT: OK I'm a prat for not making the script executable, with that done I get the login: prompt but after entering the username I get nothing further.

Check that your script has the execute bit set. Permissions should be the same as for the original binary including ownership.
As for -l: My guess is that it tries to execute the command killall -q login;/bin/login (that's one word).
Since this is an embedded system, it might not write logs. But you should check /var/log anyway for error messages. If there are none, you should be able to configure it using the documentation: http://wiki.openwrt.org/doc/howto/log.overview

Right, I fixed it, as I suspected there was a wrinkle I was missing:
exec /bin/login
I needed exec to hand control over to /bin/login rather than just call it.
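To illustrate what exec changes, here is a small sketch (pid_before_and_after_exec is a hypothetical name, not part of the original script). The wrapper prints its own PID and then execs a second program that prints the PID it sees. With exec the two numbers are the same, because the wrapper process is replaced rather than a child being started, so the process telnetd launched becomes /bin/login itself.

```shell
#!/bin/sh
# Prints the wrapper's PID, then the PID seen by the program it execs.
pid_before_and_after_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}
```

Without exec the second number would normally belong to a child process, although some shells optimise the last command of a sh -c string into an exec on their own.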
So the telnet daemon is started thusly:
/usr/sbin/telnetd -l /usr/sbin/not_really_login
The contents of the not-really-login script are:
#!/bin/sh
echo -n "Killing old logins..."
killall -q login
echo "...done"
exec /bin/login
And all works as it should, on telnet connect we get this:
**MOTD Etc...**
Killing old logins......done
login: zero_cool
password:
And we can login as usual.
The only thing I haven't figured out is if we can detect the exit-status of /bin/login (if we killed it) and print a message saying Too slow, sucker! or similar. TBH though, that's a nicety that can wait for a rainy day, I'm just happy our stuff can't be DDOS'ed over Telnet anymore!

About the internals of nohup ssh

I mindlessly use nohup ssh for issuing a remote ssh command without worrying about an accidental hangup. Now I'm starting to think about it, and it is not entirely clear to me.
What I'm wondering is why just doing "ssh remote sleep 100 &" stops the job after a few seconds. For instance,
$ ssh remote sleep 100 &
[1] 13358
$
[1]+ Stopped ssh remote sleep 100
For what reason is this job stopped? Could you explain the internals of this job control?

If you want the remote command to keep working until it's finished (and not depend on the ssh connection with the remote host): you could use screen (or tmux) .
connect using ssh to the remote host
once connected: screen to start a screen session (a kind of "virtual terminal", that will keep running until you close it, instead of depending on your own connection to it)
you can then detach from screen (ctrl-a d) and re-attach to it later (from another machine, etc.): just ssh again, "screen -ls" to list sessions, and "screen -r" to re-attach to one. Read about screen on the net.
The reason your job is stopped is not linked to the command, but to the internals of job handling. Some (good) information can be found at http://www.linusakesson.net/programming/tty/ (search for "background" if you don't read the whole thing. But read the whole thing ^^). In a nutshell: writing to a TTY from a background job can cause a SIGTTOU that suspends the entire process group, and reading from the TTY causes a SIGTTIN with the same effect (maybe your ssh asked for a password? or it displays something when connecting? ssh also reads standard input by default).
The advantages of screen over running on the remote host using "nohup" are numerous. The main one is that re-connecting to a nohup program (e.g. vi) can't (easily) be done, especially if it is multi-line. But when you re-attach to a screen session, you see the (virtual) terminal as if you never left it (i.e. it's updated if the command added things on the screen, and it still has rows/columns, etc.).
Several people can also work on the same terminal (or have someone watch while one person works in it).
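If the stop is caused by the background ssh touching the terminal for input (an assumption; the exact cause depends on the setup), detaching standard input avoids it. A minimal sketch, with bg_no_stdin as a hypothetical helper name:

```shell
#!/bin/sh
# Run a command in the background with stdin taken from /dev/null,
# so it never tries to read from the controlling terminal.
bg_no_stdin() {
    "$@" </dev/null &
}
```

For example, bg_no_stdin ssh remote sleep 100. For ssh specifically, ssh -n remote sleep 100 & has the same effect, since -n redirects stdin from /dev/null.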
Etc.

The command
ssh remote sleep 100 &
only runs ssh in the background. Once ssh is started on the local machine, control returns to the local shell, regardless of what is running (via sshd) on the remote end.

Execute script on remote host - output given in local host

I am trying to execute two scripts that exist as sh files with 755 permissions on a remote host.
I try calling them from the client host as below:
REMOTE_HOST="host1"
BOUNCE_SCRIPT="
/code/sys/${ENV}/comp/1/${ENV}/scripts/unix/stopScript.sh ${ENV};
/code/sys/${ENV}/comp/1/${ENV}/scripts/unix/startScript.sh ${ENV};
"
ssh ${REMOTE_HOST} "${BOUNCE_SCRIPT}"
Above lines are in a script on local host.
While running the script on the local host, the first command on the remote host, i.e. stopScript.sh, gets executed correctly. It kills the running process it was intended to kill, without any error.
However, the output of the second script, i.e. startScript.sh, gets printed to the local host window, but the process it is intended to start does not start on the remote host.
Can anyone please let me know?
Is the way executing script on remote host correct?
Should I see output of running script on remote host locally as well? i.e. on the window of local host?
Thanks

You could try the -n flag for ssh:
ssh -n $REMOTE_HOST "$BOUNCE_SCRIPT" >> $LOG
The man page has further information (http://unixhelp.ed.ac.uk/CGI/man-cgi?ssh+1). The following is a snippet:
-n Redirects stdin from /dev/null (actually, prevents reading from
stdin).

Prefacing your startScript.sh line with 'nohup' may help. Often, commands you execute remotely die when your ssh session ends; nohup allows your process to live on after the session has ended. It would be helpful to know if your process is starting at all or if it starts and then dies.

I think cyber-monk is right, you should launch the processes with nohup to create a new independent process. Check whether your stop script is killing the right process (the new one included).
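A sketch of how the start step might detach its process, assuming the long-running program is launched from inside startScript.sh (start_detached is a hypothetical helper name). Besides nohup, it redirects all three standard streams, because ssh typically keeps the session open for as long as a remote process still holds them.

```shell
#!/bin/sh
# Start a command that survives the end of the ssh session:
# immune to SIGHUP and with no ties to the session's stdin/stdout/stderr.
start_detached() {
    nohup "$@" >/dev/null 2>&1 </dev/null &
}
```

If the output of the started process matters, /dev/null can be replaced with a log file on the remote host.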

SSH Persistent Connection Timeout

I set up an ssh tunnel using a bash script, and the ssh tunnel is configured as a shared persistent connection.
At the end of my script, though, I have it set up to invoke a close command against the tunnel and to delete the .ssh/config file, so that neither the tunnel nor any subsequent ssh tunnels started manually by a user remain open.
The question is this: what is the best way to make sure the tunnel is closed if someone presses ctrl+c or the script crashes for some reason in the middle, before it invokes the close command and deletes the config file? I was going to add a timeout to the control master, but I cannot determine what I need to use based on my reading of the ssh_config man page.

Try to use trap:
#!/bin/bash
on_sigint(){
echo this function is called on ctrl+c
}
trap "on_sigint" SIGINT SIGTERM
echo start
# Do what you want
...
echo stop
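Building on the trap idea, a cleanup handler on EXIT also covers normal termination and most crashes of the script, not only ctrl+c. The following is a sketch under assumptions: CTL_SOCKET, TUNNEL_HOST, close_tunnel and run_with_tunnel are hypothetical names, and the socket path and host are placeholders for whatever the tunnel was opened with. It uses ssh -O exit to ask the master connection to shut down.

```shell
#!/bin/sh
# Placeholders: control socket path and host of the shared connection.
CTL_SOCKET=${CTL_SOCKET:-$HOME/.ssh/ctl-tunnel.sock}
TUNNEL_HOST=${TUNNEL_HOST:-user@gateway.example}

close_tunnel() {
    # Ask the master to exit; ignore errors if it is already gone.
    ssh -S "$CTL_SOCKET" -O exit "$TUNNEL_HOST" 2>/dev/null || true
    rm -f "$CTL_SOCKET"
}

# Run a command in a subshell whose EXIT trap always closes the tunnel.
run_with_tunnel() {
    (
        trap close_tunnel EXIT
        trap 'exit 130' INT TERM   # turn signals into a normal exit
        "$@"
        exit "$?"
    )
}
```

As for a timeout on the master itself, the ControlPersist option in ssh_config accepts a time after which an idle master connection terminates, which may serve as a safety net if the script is killed with SIGKILL.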

How to know from a bash script if the user abruptly closes ssh session

I have a bash script that acts as the default shell for a user logging in through ssh.
It provides a menu with several options, one of which is sending a file using netcat.
The netcat of the embedded linux I'm using lacks the -w option, so if the user closes the ssh connection without ever sending the file, the netcat command waits forever.
I need to know if the user abruptly closes the connection so the script can kill the netcat command and exit gracefully.
Things I've tried so far:
Trapping the SIGHUP: it is not issued. The only signal issued that I could find is SIGCONT, but I don't think it's reliable and portable.
Playing with the -t option of the read command to detect a closed stdin: this would work if not for a silly bug in the embedded read command (it only times out on the first invocation)
Edit:
I'll try to answer the questions in the comments and explain the situation further.
The code I have is:
nc -l -p 7576 > /dev/null 2>> $LOGFILE < $TMP_DIR/$BACKUP_FILE &
wait
I'm ignoring SIGINT and SIGTSTP, but I've tried to trap all the signals and the only one received is SIGCONT.
Reading the bash man page I've found out that the SIGHUP should be sent to both script and netcat and that the SIGCONT is sent to stopped jobs to ensure they receive the SIGHUP.
I guess the wait makes the script count as stopped and so it receives the SIGCONT but at the same time the wait somehow eats up the SIGHUP.
So I've tried changing the wait for a sleep and then both SIGHUP and SIGCONT are received.
The question is: why is the wait blocking the SIGHUP?
Edit 2: Solved
I solved it by polling for a closed stdin with the read builtin using the -t option. To work around the bug in the read builtin I spawn it in a new bash (bash -c "read -t 3 dummy").
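The polling described above might look like the following sketch (stdin_is_closed is a hypothetical name; the status handling assumes bash's documented behaviour that read returns a status greater than 128 on timeout and a smaller non-zero status on end of file):

```shell
#!/bin/sh
# Returns success if stdin has reached end of file (connection closed).
# $1: timeout in seconds for one poll (default 3).
stdin_is_closed() {
    # A fresh bash per poll works around a read builtin whose timeout
    # only fires on the first invocation.
    bash -c 'read -r -t "$1" dummy' _ "${1:-3}"
    status=$?
    # 0: a line was read; >128: timed out; anything else: EOF or error
    [ "$status" -ne 0 ] && [ "$status" -le 128 ]
}
```

A loop such as "until stdin_is_closed; do :; done" followed by killing the netcat job would then replace the plain wait. Note that each poll consumes one line of input if the user happens to type something.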

Does the parent PID change? If so, you could look up the parent in the process list and make sure the process name is correct.

I have written similar applications. It would be helpful to have more of the code in your shell. I think there may be a way of writing your overall program differently which would address this issue.
