How to execute shell script on top of spark - linux

How do I execute a shell script on top of Spark?
The script is below:
#!/bin/bash
load data into local path $source $dest

If you are using Oozie, set up a two-step workflow:
a shell (bash) action
the Spark job.
In the shell action, execute the expected command.
Otherwise, you can do the same thing with the HDFS client API in your Java code before submitting the Spark job (spark-submit).
For a detailed walkthrough, see http://bytepadding.com/big-data/spark/how-to-submit-spark-job-through-oozie/
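Outside Oozie, a minimal sketch of the same idea as a plain wrapper script might look like the following, assuming the "load data into local path" step means copying from HDFS to a local directory; the spark-submit class, master and jar names are hypothetical, and $source/$dest are whatever paths you pass in:
#!/bin/bash
# copy the input from HDFS to a local path first ($source and $dest are supplied by the caller)
hdfs dfs -copyToLocal "$source" "$dest"
# then submit the Spark job (class, master and jar are placeholders)
spark-submit --class com.example.MyJob --master yarn my-job.jar "$dest"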

Related

echo map reduce output in shell script using oozie

The .sh script has the following content inside it
echo hbase org.apache.hadoop.hbase.mapreduce.RowCounter TABLE_NAME
The above script is called by Oozie, which should capture whatever output the command emits. The problem is that the command runs but nothing gets echoed, since all of its output happens in the background. How can I capture that background output and echo it from the shell script?
First things first: you need to have <capture-output/> in the Oozie shell action.
Also, to capture the output in an Oozie shell action, the script needs to be something like the following:
var=$(hbase org.apache.hadoop.hbase.mapreduce.RowCounter TABLE_NAME)
echo "capture_var=$var"
After you do this, the variable will be available to be passed/used within Oozie using:
${wf:actionData('shellscriptoozieactionname')['capture_var']}
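Put together, the shell action's script might look like the sketch below. TABLE_NAME is a placeholder, and the 2>&1 redirection plus the grep filter are assumptions about where RowCounter prints its ROWS counter, so adjust them to whatever the command actually emits in your environment.
#!/bin/bash
# run RowCounter and keep only the line with the ROWS counter
# (the redirection and filter are assumptions, not from the original answer)
var=$(hbase org.apache.hadoop.hbase.mapreduce.RowCounter TABLE_NAME 2>&1 | grep ROWS)
# <capture-output/> makes Oozie read key=value pairs written to stdout
echo "capture_var=$var"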

Import bash variables into slurm script

I have seen similar questions, but not exactly the same as mine ("Use Bash variable within SLURM sbatch script"), because I am not talking about Slurm parameters.
I want to launch a Slurm job for each of my sample files, so imagine I have 3 VCFs and want to run a job for each of them.
I created a script that loops through a file of sample IDs and runs another script for each sample, which works perfectly if I run it directly with bash:
while read line
do
    sampleID="${line[0]}"
    myscript.sh $sampleID
The problem is that I need to run the script with Slurm, so is there any way to tell Slurm which bash variable it should include?
I was trying this, but it is not working:
sbatch myscript.sh --export=$sampleID
Okay, I've solved it:
sbatch --export=sampleID=$sampleID myscript.sh
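Putting it together with the loop from the question, the submission script might look like this sketch; samples.txt is a placeholder file name, and myscript.sh is assumed to be a valid batch script (with its own #SBATCH directives) that reads $sampleID from its environment:
#!/bin/bash
while read -r line
do
    sampleID="$line"
    # --export makes sampleID available as an environment variable inside the job;
    # depending on your site defaults you may want --export=ALL,sampleID=... so the
    # rest of your environment is propagated as well
    sbatch --export=sampleID="$sampleID" myscript.sh
done < samples.txt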

Bash Script for Submitting Job on Cluster

I am trying to write a script so I can use the 'qsub' command to submit a job to the cluster.
Pretty much, once I get into the cluster, I go to the directory with my files and I do these steps:
export PATH=$PATH:$HOME/program/bin
Then,
program > run.log&
Is there any way to make this into a script so I am able to submit the job to the queue?
Thanks!
Putting the lines into a bash script and then running qsub myscript.sh should do it.
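A minimal sketch of such a job script is below; the #PBS directives and resource values are assumptions that depend on your scheduler (PBS/Torque in this sketch), so adapt them to your cluster:
#!/bin/bash
#PBS -N myjob
#PBS -l nodes=1:ppn=1
# qsub starts the job in your home directory, so move to the directory you submitted from
cd "$PBS_O_WORKDIR"
export PATH=$PATH:$HOME/program/bin
# no trailing & here: the scheduler already runs the whole script in the background,
# and backgrounding the command would let the job exit before it finishes
program > run.log
Submitting it with qsub myscript.sh from the directory that holds your files keeps PBS_O_WORKDIR pointing at the right place.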

Stop slurm sbatch from copying script to compute node

Is there a way to stop sbatch from copying the script to the compute node? For example, when I run:
sbatch --mem=300 /shared_between_all_nodes/test.sh
test.sh is copied to /var/lib/slurm-llnl/slurmd/etc/ on the executing compute node. The trouble with this is that there are other scripts in /shared_between_all_nodes/ that test.sh needs to use, and I would like to avoid hard-coding the path.
In SGE I could use qsub -b y to stop it from copying the script to the compute node. Is there a similar option or configuration in Slurm?
Using sbatch --wrap is a nice solution for this:
sbatch --wrap /shared_between_all_nodes/test.sh
Quotes are required if the script takes parameters:
sbatch --wrap "/shared_between_all_nodes/test.sh param1 param2"
from sbatch docs http://slurm.schedmd.com/sbatch.html
--wrap=
Sbatch will wrap the specified command string in a simple "sh" shell script, and submit that script to the slurm controller. When --wrap is used, a script name and arguments may not be specified on the command line; instead the sbatch-generated wrapper script is used.
The script might be copied there, but the working directory will be the directory in which the sbatch command is launched. So if the command is launched from /shared_between_all_nodes/ it should work.
To be able to launch sbatch from anywhere, use this option:
-D, --workdir=<directory>
Set the working directory of the batch script to directory before
it is executed.
like this:
sbatch --mem=300 -D /shared_between_all_nodes /shared_between_all_nodes/test.sh
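Either way, once the job's working directory is /shared_between_all_nodes, test.sh can refer to its sibling scripts with relative paths instead of hard-coded ones; a minimal sketch, where helper.sh is a hypothetical sibling script:
#!/bin/bash
# the working directory is /shared_between_all_nodes (via -D or by launching sbatch there),
# so sibling scripts can be called without absolute paths
./helper.sh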

Launch shell scripts from Jenkins

I'm a complete newbie to Jenkins.
I'm trying to get Jenkins to monitor the execution of my shell scripts so that I don't have to launch them manually each time, but I can't figure out how to do it.
I found out about the "monitor external job" option, but I can't configure it correctly.
I know that Jenkins can understand shell script exit codes, so this is what I did:
test1(){
    ls /home/user1 | grep $2
    case $? in
        0) msg_error 0 "Okay."
        ;;
        *) msg_error 2 "Error."
        ;;
    esac
}
It's a simplified version of my functions.
I execute them manually, but I want to launch them from Jenkins with arguments and get the results, of course.
Can this be done?
Thanks.
You might want to consider setting up an Ant build that executes your shell scripts by using Ant's Exec command:
http://ant.apache.org/manual/Tasks/exec.html
By setting the Exec task's failonerror parameter to true, you can have the build fail if your shell script returns an error code.
To use parameters in your shell step, you can pass them directly. For example:
Define a string parameter in your job: Param1=test_param
In your shell step you can then use $Param1, and it will expand to the value "test_param".
Regarding the output: everything you do in the shell section is only relevant to that shell session. You can write your output as key=value pairs to a text file in the workspace and inject the results using the EnvInject Plugin. Then you can access the values as if you had defined them as parameters of the job. In the example above, after injecting the file, running echo $Param1 in a shell step will print "test_param".
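As a rough sketch of such an Execute shell build step, assuming Param1 is the string parameter defined above and env.properties is a hypothetical file read back by the EnvInject Plugin:
#!/bin/bash
# reuse the check from the question, driven by the job parameter
result=$(ls /home/user1 | grep "$Param1")
# write key=value pairs into the workspace so EnvInject can expose them to later steps
echo "capture_result=$result" > env.properties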
Hope it's helpful!
