I am trying to deploy to a list of servers in parallel to save some time. The server names are listed in a collection, serverNames.
The original code was:
serverNames.each({
def server = new Server([steps: steps, hostname: it, domain: "test"])
server.stopTomcat()
server.ssh("rm -rf ${WEB_APPS_DIR}/pc*")
PLMScriptUtils.secureCopy(steps, warFileLocation, it, WEB_APPS_DIR)
})
Basically I want to stop Tomcat, remove the old files, and then copy a WAR file to a location using the following lines:
server.stopTomcat()
server.ssh("rm -rf ${WEB_APPS_DIR}/pc*")
PLMScriptUtils.secureCopy(steps, warFileLocation, it, WEB_APPS_DIR)
The original code worked properly: it took one server at a time from the collection serverNames and performed the 3 lines to do the deploy.
But now I have a requirement to run the deployment to the servers listed in serverNames in parallel.
Below is my modified code:
def threads = []
def th
serverNames.each({
def server = new Server([steps: steps, hostname: it, domain: "test"])
th = new Thread({
steps.echo "doing deployment"
server.stopTomcat()
server.ssh("rm -rf ${WEB_APPS_DIR}/pc*")
PLMScriptUtils.secureCopy(steps, warFileLocation, it, WEB_APPS_DIR)
})
threads << th
})
threads.each {
steps.echo "joining thread"
it.join()
}
threads.each {
steps.echo "starting thread"
it.start()
}
The echo statements were added to visualize the flow.
With this the output is coming as:
joining thread
joining thread
joining thread
joining thread
starting thread
starting thread
starting thread
starting thread
The number of servers in the collection was 4, hence a thread is added and started 4 times. But it is not executing the 3 lines I want to run in parallel: "doing deployment" is never printed, and later the build fails with an exception.
Note that I am running this Groovy code as a pipeline through Jenkins. This whole piece of code is actually a function called deploy of the class deployment; my pipeline in Jenkins creates an object of the class deployment and then calls the deploy function.
Can anyone help me with this ? I am stuck like hell with this one. :-(
Have a look at the parallel step. In scripted pipelines (which you seem to be using), you can pass it a map of thread name to action (as a Groovy closure) which is then run in parallel.
deployActions = [
Server1: {
// stop tomcat etc.
},
Server2: {
...
}
]
parallel deployActions
It is much simpler and the recommended way of doing it.
Is there a way to set a timeout for a step in Amazon AWS EMR?
I'm running a batch Apache Spark job on EMR and I would like the job to stop with a timeout if it doesn't end within 3 hours.
I cannot find a way to set a timeout in Spark, in YARN, or in the EMR configuration.
Thanks for your help!
I would like to offer an alternative approach, without any timeout/shutdown logic making application itself more complex than needed - although I am obviously quite late to the party. Maybe it proves useful for someone in the future.
You can:
write a Python script and use it as a wrapper around regular Yarn commands
execute those Yarn commands via subprocess lib
parse their output according to your will
decide which Yarn applications should be killed
More details about what I am talking about follow...
Python wrapper script and running the Yarn commands via subprocess lib
import subprocess
running_apps = subprocess.check_output(['yarn', 'application', '--list', '--appStates', 'RUNNING'], universal_newlines=True)
This snippet would give you an output similar to something like this:
Total number of applications (application-types: [] and states: [RUNNING]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1554703852869_0066 HIVE-645b9a64-cb51-471b-9a98-85649ee4b86f TEZ hadoop default RUNNING UNDEFINED 0% http://ip-xx-xxx-xxx-xx.eu-west-1.compute.internal:45941/ui/
You can then parse this output (beware, there might be more than one app running) and extract the application-id values.
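The parsing step could look roughly like the sketch below. It assumes that every application line starts with an id of the form application_<digits>_<digits>, as in the sample output above; the function name extract_app_ids is just something I made up for illustration.

```python
import re
from typing import List

# Assumption: an application line begins with an id such as
# application_1554703852869_0066, optionally preceded by spaces or tabs.
APP_ID_PATTERN = re.compile(r"^[ \t]*(application_\d+_\d+)\s", re.MULTILINE)


def extract_app_ids(list_output: str) -> List[str]:
    """Return every application id found at the start of a line."""
    return APP_ID_PATTERN.findall(list_output)


sample_output = (
    "Total number of applications (application-types: [] and states: [RUNNING]):1\n"
    "Application-Id Application-Name Application-Type User Queue State "
    "Final-State Progress Tracking-URL\n"
    "application_1554703852869_0066 HIVE-645b9a64-cb51-471b-9a98-85649ee4b86f "
    "TEZ hadoop default RUNNING UNDEFINED 0% "
    "http://ip-xx-xxx-xxx-xx.eu-west-1.compute.internal:45941/ui/\n"
)

print(extract_app_ids(sample_output))
```

The header and summary lines are skipped simply because they do not start with an application id, so the same function also copes with several running applications.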
Then, for each of those application ids, you can invoke another yarn command to get more details about the specific application:
app_status_string = subprocess.check_output(['yarn', 'application', '--status', app_id], universal_newlines=True)
Output of this command should be something like this:
Application Report :
Application-Id : application_1554703852869_0070
Application-Name : com.organization.YourApp
Application-Type : HIVE
User : hadoop
Queue : default
Application Priority : 0
Start-Time : 1554718311926
Finish-Time : 0
Progress : 10%
State : RUNNING
Final-State : UNDEFINED
Tracking-URL : http://ip-xx-xxx-xxx-xx.eu-west-1.compute.internal:40817
RPC Port : 36203
AM Host : ip-xx-xxx-xxx-xx.eu-west-1.compute.internal
Aggregate Resource Allocation : 51134436 MB-seconds, 9284 vcore-seconds
Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
Log Aggregation Status : NOT_START
Diagnostics :
Unmanaged Application : false
Application Node Label Expression : <Not set>
AM container Node Label Expression : CORE
With this you can also extract the application's start time (Start-Time, in milliseconds since the epoch), compare it with the current time and see how long it has been running.
If it has been running for longer than some threshold, for example a certain number of minutes, you kill it.
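A minimal sketch of that check, assuming the report contains a "Start-Time : <epoch milliseconds>" line as in the sample above. The helper names running_minutes and should_kill, and the optional now_ms parameter (there only to make the check testable), are my own and not part of any YARN tooling.

```python
import re
import time
from typing import Optional


def running_minutes(status_report: str, now_ms: Optional[int] = None) -> float:
    """Minutes elapsed since the Start-Time field of a status report."""
    match = re.search(r"Start-Time\s*:\s*(\d+)", status_report)
    if match is None:
        raise ValueError("no Start-Time field found in the status report")
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    return (now_ms - int(match.group(1))) / 60000.0


def should_kill(status_report: str, threshold_minutes: float,
                now_ms: Optional[int] = None) -> bool:
    """True when the application has run longer than the threshold."""
    return running_minutes(status_report, now_ms) > threshold_minutes


report = "Application Report :\nStart-Time : 1554718311926\nFinish-Time : 0\n"
# Pretend "now" is 200 minutes after the start, with a 180-minute (3 hour) limit.
print(should_kill(report, 180, now_ms=1554718311926 + 200 * 60000))
```

In the real script you would call should_kill(app_status_string, 180) without now_ms, so the current clock time is used.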
How do you kill it?
Easy.
kill_output = subprocess.check_output(['yarn', 'application', '--kill', app_id], universal_newlines=True)
This should be it, from the killing of the step/application perspective.
Automating the approach
AWS EMR has a wonderful feature called "bootstrap actions".
It runs a set of actions on EMR cluster creation and can be utilized for automating this approach.
Add a bash script to bootstrap actions which is going to:
download the python script you just wrote to the cluster (master node)
add the python script to a crontab
That should be it.
P.S.
I assumed Python3 is at our disposal for this purpose.
Well, as many have already answered, an EMR step cannot be killed/stopped/terminated via an API call at this moment.
But to achieve your goals, you can introduce a timeout as part of your application code itself. When you submit EMR steps, a child process is created to run your application - be it MapReduce Application, Spark Application, etc. and the step completion is determined by the exit code this child process (which is your application) returns.
For example, if you are submitting a MapReduce Application, you can use something like below :
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
final ExecutorService executor = Executors.newSingleThreadExecutor();
// Run the blocking wait in a worker thread so the main thread can enforce the timeout.
final Future<Boolean> future = executor.submit(() -> job.waitForCompletion(true));
executor.shutdown(); // This does not cancel the already-scheduled task.
int exitCode = 1; // Any failure or timeout ends the step with a non-zero exit code.
try {
    exitCode = future.get(180, TimeUnit.MINUTES) ? 0 : 1;
}
catch (InterruptedException ie) {
    /* Handle the interruption. */
    Thread.currentThread().interrupt();
}
catch (ExecutionException ee) {
    /* Handle the error. */
}
catch (TimeoutException te) {
    /* Timed out: stop waiting and kill the job so it does not keep running. */
    future.cancel(true);
    job.killJob(); // throws IOException; assumes the enclosing method declares it
}
System.exit(exitCode);
Reference - Java: set timeout on a certain block of code?.
Hope this helps.
I'm attempting to use the Jenkins Job DSL plugin for the first time to create some basic job "templates" before getting into more complex stuff.
Jenkins is running on a Windows 2012 server. The Jenkins version is 1.650 and we are using the Job DSL plugin version 1.51.
Ideally what I would like is for the seed job to be parameterised so that when it is being run the user can enter four things: the Job DSL script location, the name of the generated job, a Slack channel for failure notifications, and an email address for failure notifications.
The first two are fine: I can call the parameters in the groovy script, for example the script understands job("${JOB_NAME}") and takes the name I enter for the job when I run the seed job.
However when I try to do the same thing with a Slack channel the groovy script doesn't seem to want to play. Note that if I specify a Slack channel rather than trying to call a parameter it works fine.
My Job DSL script is here:
job("${JOB_NAME}") {
triggers {
cron("@daily")
}
steps {
shell("echo 'Hello World'")
}
publishers {
slackNotifier {
room("${SLACK_CHANNEL}")
notifyAborted(true)
notifyFailure(true)
notifyNotBuilt(false)
notifyUnstable(true)
notifyBackToNormal(true)
notifySuccess(false)
notifyRepeatedFailure(false)
startNotification(false)
includeTestSummary(false)
includeCustomMessage(false)
customMessage(null)
buildServerUrl(null)
sendAs(null)
commitInfoChoice('NONE')
teamDomain(null)
authToken(null)
}
}
logRotator {
numToKeep(3)
artifactNumToKeep(3)
publishers {
extendedEmail {
recipientList('me@mydomain.com')
defaultSubject('Seed job failed')
defaultContent('Something broken')
contentType('text/html')
triggers {
failure ()
fixed ()
unstable ()
stillUnstable {
subject('Subject')
content('Body')
sendTo {
developers()
requester()
culprits()
}
}
}
}
}
}
}
But starting the seed job fails and gives me this output:
Started by user
Building on master in workspace D:\data\jenkins\workspace\tutorial-job-dsl-2
Disk space threshold is set to :5Gb
Checking disk space Now
Total Disk Space Available is: 28Gb
Node Name: master
Running Prebuild steps
Processing DSL script jobBuilder.groovy
ERROR: (jobBuilder.groovy, line 10) No signature of method: javaposse.jobdsl.plugin.structs.DescribableContext.room() is applicable for argument types: (org.codehaus.groovy.runtime.GStringImpl) values: [#dev]
Possible solutions: wait(), find(), dump(), grep(), any(), wait(long)
[BFA] Scanning build for known causes...
[BFA] No failure causes found
[BFA] Done. 0s
Started calculate disk usage of build
Finished Calculation of disk usage of build in 0 seconds
Started calculate disk usage of workspace
Finished Calculation of disk usage of workspace in 0 seconds
Finished: FAILURE
This is the first time I have tried to do anything with Groovy and I'm sure it's a basic error but would appreciate any help.
Hm, that's a bug in Job DSL, see JENKINS-39153.
You actually do not need to use the template string syntax "${FOO}" if you just want to use the value of FOO. All parameters are string variables which can be used directly:
job(JOB_NAME) {
// ...
publishers {
slackNotifier {
room(SLACK_CHANNEL)
notifyAborted(true)
notifyFailure(true)
notifyNotBuilt(false)
notifyUnstable(true)
notifyBackToNormal(true)
notifySuccess(false)
notifyRepeatedFailure(false)
startNotification(false)
includeTestSummary(false)
includeCustomMessage(false)
customMessage(null)
buildServerUrl(null)
sendAs(null)
commitInfoChoice('NONE')
teamDomain(null)
authToken(null)
}
}
// ...
}
This syntax is more concise and does not trigger the bug.
I'm trying to write down a script that will get the Build Number of a Build that has been triggered by another job. For example:
I have a build job that calls two other jobs (Call/trigger builds on other project). When the main job finishes successfully I would like to get the build number of the first job that was triggered from within it. The script I'm trying to run finds the main job, but I can't find any way to get the build number of the triggered job.
def job = jenkins.model.Jenkins.instance.getItem("Hourly")
job.builds.each {
    def build = it
    if (it.getResult().toString().equals("SUCCESS")) {
        // The rest of the code should go here!
    }
}
I was trying to find it in the Jenkins java-doc API and online, however without any luck. Can somebody please help me with that?
P.S. The script runs after the job has finished(triggered when needed only).
You can parse the build number (of the child job) from the build log (of the parent job).
For example:
j = Jenkins.getInstance();
jobName = "parentJobName";
job = j.getItem(jobName);
bld = job.getBuildByNumber(parentBuildNumber);
def buildLog = bld.getLog(10); //make sure you read enough lines
def group = (buildLog =~ /#(\d+) of Job : childJobName with/ );
println("The triggered build number: ${group[0][1]}");
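If you want to check the pattern before running it inside Jenkins, the same regular expression can be tried offline. Below is a small Python sketch; the sample log line is an assumption about what the parent's console output looks like (adjust it to whatever your parent build actually prints), and triggered_build_number is a made-up helper name.

```python
import re
from typing import Optional


def triggered_build_number(log_text: str, child_job_name: str) -> Optional[int]:
    """Return the first child build number mentioned in the log, or None."""
    pattern = re.compile(
        r"#(\d+) of Job : " + re.escape(child_job_name) + r" with"
    )
    match = pattern.search(log_text)
    return int(match.group(1)) if match else None


# Hypothetical console line; the real wording depends on the triggering plugin.
sample_log = "Finished Build : #42 of Job : childJobName with status : SUCCESS"
print(triggered_build_number(sample_log, "childJobName"))
```

Returning None when nothing matches avoids the index error you would otherwise get from group[0][1] when too few log lines were read.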
I have a MultiJob Project (made with the Jenkins Multijob plugin), with a series of MultiJob Phases. Let's say one of these jobs is called SubJob01. The jobs that are built are each configured with the "Restrict where this project can be run" option to be tied to one node. SubJob01 is tied to Slave01.
I would like it if these jobs would fail fast when the node is offline, instead of saying "(pending—slave01 is offline)". Specifically, I want there to be a record of the build attempt in SubJob01, with the build being marked as failed. This way, I can configure my MultiJob project to handle the situation as I'd like, instead of using the Jenkins build timeout plugin to abort the whole thing.
Does anyone know of a way to fail-fast a build if all nodes are offline? I could intersperse the MultiJob project with system Groovy scripts to check whether the desired nodes are offline, but that seems like it'd be reinventing, in the wrong place, what should already be a feature.
I ended up creating this solution which has worked well. The first build step of SubJob01 is an Execute system Groovy script, and this is the script:
import java.util.regex.Matcher
import java.util.regex.Pattern
int exitcode = 0
println("Looking for Offline Slaves:");
for (slave in hudson.model.Hudson.instance.slaves) {
if (slave.getComputer().isOffline().toString() == "true"){
println(' * Slave ' + slave.name + " is offline!");
if (slave.name == "Slave01") {
println(' !!!! This is Slave01 !!!!');
exitcode++;
} // if slave.name
} // if slave offline
} // for slave in slaves
println("\n\n");
println "Slave01 is offline: " + hudson.model.Hudson.instance.getNode("Slave01").getComputer().isOffline().toString();
println("\n\n");
if (exitcode > 0){
println("The Slave01 slave is offline - we can not possibly continue....");
println("Please contact IT to resolve the slave down issue before retrying the build.");
return 1;
} // if
println("\n\n");
In a declarative Jenkins pipeline, the 'beforeAgent true' option makes the when condition get evaluated before the stage's agent is entered, so a stage whose condition is false never waits for that agent.
stage('Windows') {
    when {
        beforeAgent true
        expression { return ("${TARGET_NODES}".contains("windows")) }
    }
    agent { label 'win10' }
    steps {
        cleanWs()
        ...
    }
}
Ref:
https://www.jenkins.io/doc/book/pipeline/syntax/
https://www.jenkins.io/blog/2018/04/09/whats-in-declarative/
I have an application that when the main method is executed, it starts a web server to host some RESTful services (using Dropwizard). I'm trying to write tests that access the HTTP methods (rather than the Java methods), so the tests have a prerequisite that the server is running.
Here is my task that executes the application and starts the web server:
task run (dependsOn: 'classes', type: JavaExec) {
main = 'com.some.package.to.SomeService'
classpath = sourceSets.main.runtimeClasspath
args 'server', 'some.yml'
}
The server takes a few seconds to start up, too. Roughly, what I want to do is something like this:
test.doFirst {
println "Starting application..."
Thread.startDaemon {
// What goes here???
}
sleep 20000
println "Application should be started."
}
In other words, before running tests, start the application in a separate thread and wait some time before running tests, giving it time to finish starting up.
That said, I can't figure out what goes in Thread.startDaemon (tasks.run.execute() doesn't work), nor if this is even the best approach. What would be the best way of going about this?
Thanks!
What I would probably do is something like this:
task startServer (type: Exec) {
workingDir 'tomcat/bin'
// using START hopefully forks the process
commandLine 'START', 'start.bat'
standardOutput = new ByteArrayOutputStream()
ext.output = {
return standardOutput.toString()
}
// loop through output stream for finished flag
// or just put a timeout here
}
task testIt (type: Test) {
description "To test it."
include 'org/foo/Test*.*'
}
Then, when calling Gradle targets, call "gradle.bat startServer testIt". That is the basic idea.
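For the "loop until the server is up, or give up after a timeout" part, here is a rough sketch of the idea in Python rather than Gradle. It polls a TCP port until something accepts connections; the helper name wait_for_port is made up, and port 8080 in the usage part is an assumption about where the service listens.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_seconds: float = 20.0,
                  interval_seconds: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server socket is listening.
            with socket.create_connection((host, port), timeout=interval_seconds):
                return True
        except OSError:
            # Not reachable yet: wait a little and try again.
            time.sleep(interval_seconds)
    return False


if __name__ == "__main__":
    # Assumed host and port; replace with wherever the application listens.
    if wait_for_port("localhost", 8080):
        print("Application should be started.")
    else:
        print("Application did not start in time.")
```

Compared with a fixed sleep, this returns as soon as the port is open and fails explicitly when the server never comes up.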