Jenkins + JenkinsFile + Bourneshell - Step fail - azure

I updated my plugins and Jenkinsfile recently
Using a common stack of
Latest version of Jenkins on Ubuntu
Blue Ocean plugin
Azure VM Agent plugin
Pipeline (Jenkinsfile)
I randomly have steps crashing when they are processed on another agent (created by Azure VM plugin).
wrapper script does not seem to be touching the log file in /tmp/jenkins/workspace/[name of workspace]#tmp/durable-2ae31cde
(JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
script returned exit code -1
How can I fix it?

Related

Windows Job Runner from Linux Cluster (Enterprise)

The documentation from what I have found is a bit sparse on the setup of job runner nodes. I am wondering if anyone has set up a config - Rundeck Linux Cluster with a Windows Job Runner. I was able to install the .jar and all that on the Windows node and it appears and is able to communicate via Runner Management.
Where I am stuck and it gets ambiguous his how to properly specify to use the job runner. This is my current setup:
job runner installed and is green in Runner Management and assigned to my project
IN Project config I have Runner selected as the Default Node Executioner
Default File Copier is also set to runner-file-copier
Under Project nodes
I setup a Node Wizard - here is the edited yaml:
mydomain:
nodename: nodename
hostname: jobrunnerhost.domain
osFamily: windows
node-executor: runner-node-exec
file-copier: runner-file-copier
Under the jobs I have it set to the appropriate node.
I am getting this error when I try and run anything either a simple DIR command or executing a basic powershell:
Execution failed: 28 in project Server.Validation.mynode: [Workflow result: , step failures: {1=Dispatch failed on 1 nodes: [mynode: COPY_ERROR: Reason: FILE_COPIER_NOT_FOUNDUnable to find file copier: runner-file-copier]}, Node failures: {mynode=[COPY_ERROR: Reason: FILE_COPIER_NOT_FOUNDUnable to find file copier: runner-file-copier]}, status: failed]
I have tried setting up multiple ways. I feel as though I am missing a config or another step somewhere. Any help is appreciated.

Packer failed when executed on Gitlab-runner

I have a packer file to deploy Centos 7 using vSphere-Iso builder that works Ok when executed directly on a linux server but when I try to run the same packer file using a gitlab-runner it fails as it does not wait until the OS is installed. It fails after waiting for 1 minute but if I run the packer command with -on-error=run-cleanup-provisioner the OS install finish succesuflly so clear the issue is that packer is just not waiting.
2021/07/20 12:02:40 packer.io plugin: [INFO] Waiting for IP, up to total timeout: 30m0s, settle timeout: 5m0s
==> vsphere-iso.autogenerated_1: Waiting for IP...
==> vsphere-iso.autogenerated_1: Clear boot order...
==> vsphere-iso.autogenerated_1: Power off VM...
==> vsphere-iso.autogenerated_1: Destroying VM...
2021/07/20 12:03:12 [INFO] (telemetry) ending
==> Wait completed after 1 minute 2 seconds
2021/07/20 12:03:12 machine readable: error-count []string{"1"}
==> Some builds didn't complete successfully and had errors:
My boot command is the following as I do not use DHCP.
boot_command = ["<up><tab> text inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/vmware-ks.cfg ip=10.118.12.117::10.118.12.1:255.255.255.0:{{ .Name }}.localhost:ens192:none<enter><wait>"]
I have tested using options like ssh_host, ip_wait_address, ip_settle_timeout, ssh_wait_timeout, pause_before_connecting but nothing seems to work.
As I said, the same packer pkr.hcl file works OK if run it manually on a regular linux but not on my gitlab-runner that is a runner installed directly on my Gitlab server (Yes, I know is not the best practice but I only use the runner for this task)
Packer versions 1.7.2 and 1.7.3 tested, gitlab-runner 14.0.0 and 14.0.1 tested.
Managed to make it work by changing the las wait on my boot command for wait5m. This will give the OS enough time to get installed and the VM rebooted.
New boot command boot_command = ["<up><tab> text inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/vmware-ks.cfg ip=10.118.12.117::10.118.12.1:255.255.255.0:{{ .Name }}.localhost:ens192:none<enter><wait5m>"]
All the other wait options from packer are no longer needed with this boot command.
Doing some test I managed to make it work as well by creating a Firewall drop rule for the VM just after the kickstar file was loaded and removing the FW rules once the OS was installed. Definitelly, packer is just ignoring all the wait machanism native to packer when running on the gitlab-runner
EDIT: After having the same issue with my Windows Templates y tested using a different gitlab-runner installed on a different server instead of the one in the same gitlab server and it worked perfectly with my initial contifiguration for both, windows and centos.

Deploy Nodejs apllication on azure linux vm using azure release pipeline

I am creating CI & CD pipeline for nodejs application using azure devops.
I deployed build code to azure linux vm using azure release pipeline, here I configured deployment group job.
In deployment groups I used extract files task to unzip the build files.
Unzip will works fine and my code also deployed in this path: $(System.DefaultWorkingDirectory)/LearnCab-Manage(V1.5)-CI (1)/coreservices/ *.zip
After that i would like to run the pm2 command using azure release pipeline, for this task i take bash in deployment group jobs and write the command
cd $(System.DefaultWorkingDirectory)/LearnCab-Manage(V1.5)-CI (1)/coreservices/*.zip
cd coreservices
pm2 start server.js
But bash not executed it will give exit code 2.
it will give exit code 2
This error caused by your argument are using parentheses ( in the command at your first line. As usual, the parentheses is used as group. This could not be compiled as a normal character in command line.
To solve it, you need transfer the parentheses as a normal character with \:
cd $(System.DefaultWorkingDirectory)/LearnCab-Manage\(V1.5\)-CI \(1\)/coreservices/*.zip
And now, \(V1.5\) and \(1\) could be translated into (V1.5) and (1) normally.
And also, you can use single or double quote to around the path:
cd "$(System.DefaultWorkingDirectory)/LearnCab-Manage(V1.5)-CI (1)/coreservices/*.zip"
Or
cd '$(System.DefaultWorkingDirectory)/LearnCab-Manage(V1.5)-CI (1)/coreservices/*.zip'

VSTS - Build a Docker Image

I have a .NET Core repo in VSTS. I'm trying to create a Build pipeline that builds a Docker image and adds it to my Azure Container Registry. My Build pipeline has a Docker task. This task has the "Build an image" action selected. This action relies on my Dockerfile, which looks like this:
FROM microsoft/dotnet:2.1.2-runtime-nanoserver-1803
# Install .NET Core
ENV DOTNET_VERSION 2.1.2
When my Build pipeline runs, I get an error that says:
failed to register layer: re-exec error: exit status 1: output: ProcessUtilityVMImage \\?\C:\ProgramData\docker\windowsfilter\82aba535faccd8bf0e5ce3c122247672fa671214000a12c5481972212c5e2ca0\UtilityVM: The system cannot find the path specified.
##[error]C:\Program Files\Docker\docker.exe failed with return code: 1
Why am I getting this error? How do I fix it?
It should be the same issue with this one : https://github.com/Microsoft/vsts-tasks/issues/6510
Seems it still have some issues with nanoserver-1803
Just try to setup and host a custom agent on Azure VM, then check it again.
https://github.com/Microsoft/vsts-tasks/issues/6510#issuecomment-370152300
I found maybe an explication about this error: VSTS agents seem not
support nanoserver-1709 actually. Maybe this will change with the next
version 1803.
See details here: Microsoft/vsts-agent#1393
When I setup and host a custom agent on a machine on Azure, it's
working. So it's not a bug with this task. I close this issue. Thanks!

TeamCity ".Net Process Runner" hangs

We have started migrating our one of several projects to team city as part of CI. Below is how we have setup teamcity build. We are trying to deploy WebSite.
1) Build Step 1 (Package installation)
Using "command line " runner type install required package.
2) Build Step 2 (Build)
Using Runner type "Visual Studio (sln)" (Visual Studio 2010) build website.
3) Build Step 3 (Deploy Web Site)
Using ".Net Process Runner", deployer.exe (x86 built with .Net Framework 4) deploy site.
Deployer.exe reads config file. Config file contains "BuildId", "Environment" and "Servers" where we want build to be pushed.
<buildType id="bt52">
<env name="Debug">
<server path="SERVER1" />
</env>
<env name="QA">
<server path="SERVER2" />
<server path="SERVER3" />
</env>
<env name="UAT">
<server path="SERVER4" />
<server path="SERVER5" />
</env>
</buildType>
Deployer.exe is called with required parameters as below. Which reads config and deploys site to Server2 and Server3.
Deployer.exe "bt52" "QA" "siteQA" "E:\BuildAgent\work\2483052e33e5e1e8\src\diy\" msdeploy.exe
Problem area is step #3.
When we run deployer.exe using .Net process runner as part of team city we see its hanging and not responsind sometime even for 45 minutes. When we try to execute same deployer.exe from build server using command line script executes within couple of seconds.
E:\TeamCity_custom_applications\deployer>Deployer.exe farm1-1 QA siteQA E:\BuildAgent\work\2483052e33e5e1e8\src\diy\ msdeploy.exe
Info
: Processing batch run ... Info : Processing command ...msdeploy.exe
-verb:sync -source:contentPath="E:\BuildAgent\work\2483052e33e5e1e8\src\diy\" -dest:contentPath="siteQA",wmsvc="SERVER2",userName="*****",password="******",authType="Basic"-skip:objectName=filePath,absolutePath=web.config -skip:objectName=dirPath,absolutePath="bin" -enableRule:DoNotDeleteRule -allowUntrusted Info : output >>Total changes: 0 (0 added, 0 deleted, 0 updated, 0 parameters changed, 0
bytes copied) Info : error >>(none) Info : ExitCode >> 0 Info :
Processing command ...msdeploy.exe -verb:sync
-source:contentPath="E:\BuildAgent\work\2483052e33e5e1e8\src\diy\" -dest:contentPath="siteQA",wmsvc="SERVER3",userName="******",password="******",authType="Basic"
-skip:objectName=filePath,absolutePath=web.config -skip:objectName=dirPath,absolutePath="bin" -enableRule:DoNotDeleteRule -allowUntrusted Info : output >>Total changes: 0 (0 added, 0 deleted, 0 updated, 0 parameters ch anged, 0
bytes copied) Info : error >>(none) Info : ExitCode >> 0
Info: Deploy Script Complete.
One more thing we observed is running deployer.exe through teamcity I see that site content gets copied but only for 1 server and teamcity build status stays in "Running" mode. I am wondering if someone can please put little bit of insight on how can I look into this issue.
Update 1:
Thanks for your time looking into it !! What we ended up doing is, Instead of running command "msdeploy.exe" from "cmd.exe" we added "msdeploy.exe" location as Environment variable and executed "msdeploy.exe" in loop for # of servers. This has resolved issue of hanging. Now I am just curious to know why would it behave in such manner where if you execute "msdeploy.exe" from "cmd.exe" it would hang while running directly "msdeploy.exe" it would execute successfully. Any insight into same would be greatly appreciated.
Update 2:
I have added image which explains behavior using process explorer. If we kill msdeploy.exe from process explorer than for next all deployments to that server will not have the issue of build hanging. Please see below image
To be honest, it sounds like you're running into issues with redirecting input/output streams. TeamCity is running your application in a totally headless environment and then you, in turn, are attempting to redirect and parse the output of msdeploy.exe
If that's the case, I'd recommend looking into using the MSDeploy API instead of msdeploy.exe. The latter is just a command line wrapper for the former, so all the functionality is available to you. There's a sample deployment application available on the IIS blog if you need help getting started.
It seems you have NUnit build step configured in TeamCity and invoke cmd.exe from your test. This looks like an issue with the test code then. Most probably it will reproduce without TeamCity if you run the test in question with NUnit directly.
As Richard noted, most probably the issue root cause is related to stdin/stdout processing.
If you want to fix it in your code, you can try to experiment by explicitly closing stdin or the other way around, try writing something into it, etc.
Work around we did is, we observed msdeploy doesn't take more than 3-5 seconds to execute and deploy (Even for our biggest project which is almost 300mb website). So we set timeout of 20 seconds. So far since last 1 weeks we have not seen any issue with it and hopefully it will not cause more trouble but still we are not sure why such behavior.

Resources