How to execute spark-shell from file with nohup? - apache-spark

I have a Scala script file that runs successfully via the interactive spark-shell in the classic way: type spark-shell, paste the script, wait for completion.
I want to be able to leave this thing working, exit the SSH session, and get back to the results when I need them.
I tried this, and it behaves strangely:
spark-shell -i file.scala >> out.log 2>&1 &
It prints only a few lines of the usual Spark output to out.log and then reports that the process has ended. When I do ps aux | grep spark, I see Spark running among the processes.
When I run this it behaves as expected, but I have to leave the session open to have my results:
spark-shell -i file.scala
Is there a way to get spark-shell working properly with nohup?
I know spark-submit works with jars, but it feels less intuitive; for a simple test I have to assemble a jar and do Maven magic.

I encountered the same behavior of spark-shell with nohup. The reasons behind it are unclear, but one can use tmux instead of nohup as a workaround. A pretty good guide on how to use tmux can be found here.
A possible set of actions is as follows:
$ tmux new -s session-name
$ ./bin/spark-shell
# do usual stuff manually
Then, if you close the terminal window and exit the SSH session, you can re-enter the tmux session like this:
$ tmux attach -t session-name
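If you have forgotten the session name, you can list the existing sessions first:
$ tmux ls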

I use a shell script to execute spark-shell; inside my-script.sh:
$SPARK_HOME/bin/spark-shell < $HOME/test.scala > $HOME/test.log 2>&1 &
I read about it somewhere while googling and tried it. It is working on my end.
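Note that with a plain & the backgrounded spark-shell can still receive SIGHUP when the SSH session closes. A variant of the same line wrapped in nohup (an untested sketch, in the spirit of the nohup answers below) would be:
nohup $SPARK_HOME/bin/spark-shell < $HOME/test.scala > $HOME/test.log 2>&1 &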

If you are trying to execute it via the AWS CLI, you can use the command below:
nohup bash -c "YOUR_COMMAND 2>&1 &"
So, to execute the spark-shell:
nohup bash -c "spark-shell -i file.scala >> out.log 2>&1 &"

Old question, but did you actually try to use the nohup command?
Simply using & to background a process does not prevent it from exiting if it receives a SIGHUP signal, which is what the process will receive when you log out.
Try this:
nohup spark-shell -i file.scala >> out.log &
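If the shell session is still open after you started the job without nohup, a sketch of an alternative (assuming bash) is to detach the job with the disown builtin, so the shell does not deliver SIGHUP to it on logout:
spark-shell -i file.scala >> out.log 2>&1 &
disown -h %1    # do not send SIGHUP to this job when the shell exits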

I'm a bit late to the party, but I recently discovered another solution:
echo ":load myscript.scala" | nohup $SPARK_HOME/bin/spark-shell [other args]
where the other args represent additional arguments passed to spark-shell (not to your script; I haven't tested that part). I have a df.write() call at the end of the script, so the results are saved to HDFS; there is no need to have them printed on screen. Note that I don't need an & at the end of the command.
I have tried closing the SSH connection and the spark-shell job keeps running tasks, according to the Spark UI :-)

Related

Start lots of background jobs but keep their logs separated

I have little experience with shell commands in Unix.
So far, I have checked Stack Overflow and know how to run simple shell scripts in order:
using echo:
echo $(sh dosomthing1.sh)
echo $(sh dosomthing2.sh)
directly using sh xxx and wait:
sh dosomthing1.sh
wait
sh dosomthing2.sh
using &&:
sh dosomthing1.sh && sh dosomthing2.sh
But none of these approaches seem to solve my problem...
Here is my problem:
I have a basic shell script that does a Maven compile and then uses "nohup xxx &" to start a Java application in the background. The script is shown below:
#get the input env parameter
env=$1
#go to the application root directory
cd /applicationDir
#to compile
mvn install -Dmaven.test.skip=true
#to start with parameter env
nohup java -jar -Dspring.profiles.active=$env myApplication.jar &
#to tail the log
tail -20f myApplication.log
I have too many different applications with the same kind of startup script, and it is hard to start them one by one. I need to start them with one command.
All the shell scripts are expected to be processed one by one, in order. If any of them fails, skip it and run the next one.
And when I tried to write a script like this:
sh start1.sh
wait
echo "application 1 was start up"
sh start2.sh
wait
echo "application 2 was start up"
...
sh startxxx.sh
wait
echo "application xxx was start up"
Though all the child shell scripts were processed in order as I expected, and the output made it look as if the shell were functioning well, in fact only the last application was started; all the previous "nohup xxxx &" commands were shut down.
I also tried to write it like this:
sh start1.sh &
sh start2.sh &
...
sh startxxx.sh &
Although the result was what I wanted and all the applications started well, the console output is unreadable because the scripts run in parallel. It gets a good result, but not in a graceful way.
I have no idea how to solve this problem...
Please help me with this, thank you very much!
When you have a script with commands, you can do chmod +x start.sh. Now the script can be started with ./start.sh. You avoid an additional sh process, and with ls -l you can see which scripts are executable.
In your scripts you have tail -f. This will be very confusing for a background process. Start the scripts in the background and view the logging from the console. I do hope that each script uses a different myApplication.jar and myApplication.log.
When the logging in the logfile is duplicated on stdout (your command-line window), you can throw that logging away:
./start1.sh > /dev/null 2>&1 &
./start2.sh > /dev/null 2>&1 &
./startxxx.sh > /dev/null 2>&1 &
The processes will be killed when you log out before the scripts have terminated. This can be avoided with nohup:
nohup ./start1.sh > /dev/null 2>&1 &
nohup ./start2.sh > /dev/null 2>&1 &
nohup ./startxxx.sh > /dev/null 2>&1 &
Edit:
The OP wants to start the programs in a fixed order.
Starting the scripts exactly one after another, in order, should be possible by calling them in the right order (perhaps with an additional sleep 1).
When you need to wait until program 1 has finished some init stuff, you need to check for that. Use one script that calls all the scripts, and add some control statements, like:
nohup java something &
while ! grep -q "Started" myApplication.log; do
  sleep 1
done
When the Java program has an error, the while loop will wait forever, so replace it with some maximum retry count:
for ((retry=0; retry<100; retry++)); do
  grep -q "Started" myApplication.log && break
  sleep 1
done
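Putting this answer together, a minimal launcher sketch (the name launch-all.sh is hypothetical; the startN.sh names come from the question) that starts each application in order and skips any script that cannot be launched:
#!/bin/bash
# launch-all.sh (hypothetical) - start each application in order; skip missing scripts and continue
for script in start1.sh start2.sh startxxx.sh; do
  if [ ! -x "./$script" ]; then
    echo "skipping $script: missing or not executable" >&2
    continue
  fi
  nohup "./$script" > "${script%.sh}.out" 2>&1 &
  echo "started $script (pid $!)"
  sleep 1   # give each application a moment to initialize
done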
https://man7.org/linux/man-pages/man8/cron.8.html
This might help you. Cron is a task scheduler, which you can use to run programs in sequence. If the man page is difficult to understand, look for tutorials on it; I'm sure some exist.
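For example, a minimal crontab sketch (assuming your cron supports the @reboot shortcut, and reusing the hypothetical launch-all.sh from the previous answer) that starts everything once at boot:
# edit the crontab with: crontab -e
@reboot /applicationDir/launch-all.sh >> /applicationDir/launch-all.log 2>&1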

Send command to a nohup process?

I have a Minecraft server that, when I run it, takes over the console and can receive commands/parameters.
I'm running it with nohup java -Xms.... -jar spigot.jar &. It will stay in the background with PID XXXX and port YYYY.
I want to know if it is possible to send commands to it, such as /help.
Regards, and thanks for the help.
Instead of running your server in the background with nohup java -Xms.... -jar spigot.jar & you could use a terminal multiplexer like screen.
See https://ss64.com/bash/screen.html
or https://www.gnu.org/software/screen/manual/screen.html
For interactive start, run screen first, then inside the screen session run java -Xms.... -jar spigot.jar (in the foreground, without nohup or &).
Then you can use screen's escape sequence CTRL+a d to detach from the session. Your server will continue to run.
If you later want to interact with the server, use screen -r. This will reattach your terminal to the session.
Type /help or whatever you need to do.
When you are done, detach from the session again.
You could also use screen -d -m java -Xms.... -jar spigot.jar to create a detached session with your command, e.g. in a startup script.
screen has a lot more capabilities. Read the documentation.
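As a compact sketch of the whole workflow (the java arguments are elided exactly as in the question):
screen -S minecraft              # start a new named session
java -Xms.... -jar spigot.jar    # inside the session, run the server in the foreground
# press CTRL+a then d to detach; the server keeps running
screen -r minecraft              # later: reattach, type /help, detach again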
I solved this thanks to tripleee.
This works:
nohup tail -f /usr/server/console.in | nohup java -Xms.... -jar spigot.jar >> /usr/server/console.out &
With echo command >> /usr/server/console.in it is as if I had run the command in the server console.
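A sketch of the full setup following that answer (the paths are the answer's; the console.in file must exist before tail -f starts):
touch /usr/server/console.in              # file that tail follows as the server's stdin
nohup tail -f /usr/server/console.in | nohup java -Xms.... -jar spigot.jar >> /usr/server/console.out 2>&1 &
echo "/help" >> /usr/server/console.in    # runs /help as if typed in the server console
tail -f /usr/server/console.out           # watch the server's response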

Running process nohup

There is a benchmarking process which should be run on a system. It takes maybe a day, so I would like to run it with nohup. I use this command:
nohup bash ./run_terasort.sh > terasort.out 2>&1 &
After that I can see its PID in the jobs -l output, but after closing PuTTY it stops (as far as I can tell when I log in again).
This is a KVM virtualized machine.
From what I know, you are using nohup right, but you have an issue detecting the process.
jobs -l only gives the processes of the current session. Rather, try the command below to display the process started in your initial session:
ps -eafww|grep run_terasort.sh|grep -v grep
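A shorter equivalent, if pgrep is available on the machine:
pgrep -f run_terasort.sh   # -f matches the pattern against the full command line and prints the matching PIDs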

How can I look into nohup file while the program is still running?

I was using
nohup ./program_name &
to run my program. program_name prints out some values and the status of the running process, including what percentage of the work has finished, but since I'm running it using nohup I can't see how close my program is to finishing. Is there any way I can still get that information?
Just open nohup.out to see the output. Probably you want
tail -f nohup.out
to stream the output.
Perhaps adjust your nohup command line to capture all output to a file:
nohup ./program_name > /tmp/programName.log 2>&1 &
Then, you can monitor programName.log using tail:
tail -f /tmp/programName.log
Run the command below in the terminal where the program is running. The jobs command lists the jobs that you are running in the background and in the foreground (note that it only knows about jobs started from the current shell session):
jobs -l
[6]+ 6069 Running nohup perl test1.pl &
[6]+ 6069 Done nohup perl test1.pl

Running shell script command after executing an application

I have written a shell script to execute a series of commands. One of the commands in the shell script is to launch an application. However, I do not know how to continue running the shell script after I have launched the application.
For example:
...
cp somedir/somefile .
./application
rm -rf somefile
Once I launch the application with "./application", I am no longer able to continue to the "rm -rf somefile" command, but I really need to remove the file from the directory.
Does anyone have any ideas how to complete the "rm -rf" command after launching the application?
Thanks
As pointed out by others, you can background the application (see 'job control' in man bash, for example).
Also, you can use the wait builtin to explicitly await the background jobs later:
./application &
echo doing some more work
wait # wait for background jobs to complete
echo application has finished
You should really read the man pages and bash help for more details, as always:
http://unixhelp.ed.ac.uk/CGI/man-cgi?sh
http://www.gnu.org/s/bash/manual/bash.html#Job-Control-Builtins
Start the application in the background; this way the shell is not going to wait for it to terminate and will execute the subsequent commands right after starting the application:
./application &
In the meantime, you can check the background jobs by using the jobs command and wait on them via wait and their ID. For example:
$ sleep 100 &
[1] 2098
$ jobs
[1]+ Running sleep 100 &
$ wait %1
Put the started process in the background:
./application &
You need to start the command in the background using & and maybe even nohup:
nohup ./application > log.out 2>&1 &
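Applied to the script from the question, a sketch that launches the application without blocking, yet still removes the file only after the application has finished:
cp somedir/somefile .
./application &    # launch in the background so the script can continue
# ... other commands could run here ...
wait               # block until the background application exits
rm -rf somefile    # now it is safe to clean up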
