Azure DevOps Server on-premises build agents are not responding

I've upgraded Azure DevOps Server to the latest version (2020 Update 1). Since the upgrade I frequently see the error below, even though everything looks fine on the build agent VM and the agents themselves are up and running.
*We stopped hearing from agent Agent1-NightlyBuild-tc3tbld1. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error.*

Related

Running Azure DevOps self-hosted agent on Linux VM

I have installed the ADO agent on an SELinux-enabled RHEL 8 Azure VM.
The agent fails when run as a service, but runs fine in interactive mode.
I reached out to Microsoft support and they asked me to disable SELinux, which can't be done due to security requirements.
Can someone please suggest the most secure configuration for running the ADO agent on a Linux VM?

Azure Data Factory Integration Runtime Going Into Limited State

My team has created an IR on an on-premises VM, and we are trying to create a Linked Service to an on-prem database using that IR.
Whenever we click Test Connection in the Linked Service, the connection fails and the IR goes into a limited state.
We have whitelisted the IPs Microsoft provides for the ADF IR and checked the network traces, and all seems fine there.
We have also stopped and restarted the IR, and uninstalled and reinstalled it, but the problem persists.
Has anyone faced a similar issue?
We have been facing this for a long time and it has now become a blocker for us.
Thanks!
This is observed when the nodes can't communicate with each other.
Log in to the node's host virtual machine (VM), open Event Viewer, go to Applications and Services Logs > Integration Runtime, and filter the error logs. If you find the error System.ServiceModel.EndpointNotFoundException or "Cannot connect to worker manager":
Follow the official documentation, which has detailed steps for troubleshooting the error message "Self-hosted integration runtime node/logical self-hosted IR is in Inactive/'Running (Limited)' state".
As it states:
try one or both of the following methods:
- Put all the nodes in the same domain.
- Add the IP to host mapping in all the hosted VM's host files.
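The second method can be scripted so it is applied consistently on every node. Below is a minimal sketch of the idea, assuming hypothetical node IPs and hostnames (on the actual Windows VMs the target would be C:\Windows\System32\drivers\etc\hosts), not the official tooling:

```python
# Sketch: append missing IP-to-hostname mappings to a hosts file.
# NODE_MAPPINGS is a placeholder -- substitute the real IR node
# IPs and hostnames for your environment.
NODE_MAPPINGS = {
    "10.0.0.4": "ir-node-1",  # hypothetical node A
    "10.0.0.5": "ir-node-2",  # hypothetical node B
}

def add_host_entries(hosts_path, mappings):
    """Append any mapping not already present; return the lines added."""
    with open(hosts_path, "r", encoding="utf-8") as f:
        existing = f.read()
    added = []
    with open(hosts_path, "a", encoding="utf-8") as f:
        for ip, host in mappings.items():
            line = f"{ip} {host}"
            if line not in existing:
                f.write(line + "\n")
                added.append(line)
    return added
```

Running it once per node (with administrative rights) lets every VM resolve its peers by name even when they are not in the same domain.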
I ran into the same issue. Our organization has firewall rules blocking specific ports and URLs from outside the network. We added the Data Factory service tag with an internet-facing route in the route table, and the IR then connected successfully.

Keep Azure self-hosted agents running and connected

I have a few questions regarding Azure self-hosting.
Assume a Windows self-hosted agent is set up on a physical machine M by user Alex. The agent goes offline when Alex logs off and the machine goes to sleep. Now, when Bob logs into the same machine, he has to set up a different agent while the agent set up by Alex is still offline and not accessible by Bob. (Please let me know if I got something wrong here)
Is it possible to set up self-hosted agents in a way such that all users can access the same agent,
and how can we avoid the issue of the agent going offline when the machine goes to sleep? I tried running the agent both interactively and as a service.
We do have a Linux cluster running so we can avoid the issue of the machine going to sleep, but accessing the agent is still a concern. Also, we only have physical machines in our lab to run Windows and macOS, and users have to log out after using them.
Any help would be greatly appreciated!
On a Windows server configured not to go to sleep, create the agent and run it as a service. I'd recommend running the agent under a domain service account created just for it. Logging off the remote server shouldn't affect the state of the agent.
If you run it as a service, the agent cannot execute UI automation. If you need UI automation on the agent, you will need to run it as an interactive agent; I would still run it interactively under a domain service account. If someone remotes into the box with a different account while the agent is running interactively, the agent will show as offline. You would need to either restart the server, or log in with the agent's account and then disconnect correctly.
We use the batch script from the Microsoft documentation to disconnect without impacting the interactive agent:
rem Reattach the current RDP session to the physical console instead of
rem logging off, so the interactive agent keeps running.
rem query user output: USERNAME SESSIONNAME ID STATE ...; token 3 is the session ID.
for /f "skip=1 tokens=3" %%s in ('query user %USERNAME%') do (
    %windir%\System32\tscon.exe %%s /dest:console
)

Can we script an auto-restart of services when the VM Agent status is Not Ready, using PowerShell?

This is my current scenario:
The VM status is "Running", the VM Agent is "Not Ready", and the Windows Azure Guest Agent service is "Stopped".
I manually start the Windows Azure Guest Agent service whenever the VM Agent is not ready.
Is there a PowerShell script I can write to start the service automatically whenever the VM Agent is not ready?
For your requirements, I'm afraid you cannot achieve this with a script that starts the service from outside the VM. Please take a look at the description of the VM Agent:
The VM Agent has a primary role in enabling and executing Azure virtual machine extensions. VM extensions enable post-deployment configuration of a VM, such as installing and configuring software. VM extensions also enable recovery features such as resetting the administrative password of a VM. Without the Azure VM Agent, VM extensions cannot be run.
In this situation, what you can do is connect to the VM remotely and start the services manually inside it. For more details, see the VM Agent documentation.
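While the restart cannot be triggered from outside the VM, a scheduled watchdog running inside the VM can automate the manual step. Here is a minimal, platform-neutral sketch of that loop in Python; the status check and start action are injected callables (on the real VM they would wrap Get-Service/Start-Service for the guest agent service), so everything named here is an assumption, not the actual Azure tooling:

```python
import time

def watchdog(get_status, start_service, checks=3, interval=0.0):
    """Poll a service's status; start it whenever it is not running.

    get_status and start_service are injected callables so the loop
    stays testable -- substitute real Get-Service/Start-Service
    wrappers (or sc.exe calls) on the actual VM.
    Returns how many times the service was (re)started.
    """
    restarts = 0
    for _ in range(checks):
        if get_status() != "Running":
            start_service()  # bring the stopped service back up
            restarts += 1
        time.sleep(interval)  # pause between polls
    return restarts
```

On Windows you would register this loop (or its PowerShell equivalent) as a scheduled task that runs at startup under an administrative account.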

HTTP Request Timeout Windows Azure Deploy

I have an MVC 4 website using a WCF service. When I deploy to Windows Azure using the VS 2012 publish wizard, I get this error:
10:13:19 AM - The HTTP request to 'https://management.core.windows.net/42d4257b-5f38-400d-aac5-2e7acee9597d/services/hostedservices/myapp?embed-detail=true' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout.
After cleaning the project and publishing a few times, the error goes away. What am I doing wrong?
Whenever you start the publish process from the VS machine, an SSL tunnel is established first; once the tunnel is created, the package is transferred from your machine to the Windows Azure Portal. After the upload completes, the result notifications are posted back to the Publish results window. That is how the process works.
In your case, the time to build the SSL tunnel for the secure package transfer is longer than normal, which could be due to network latency between your machine and the Windows Azure Management Portal. For security reasons the window allowed for creating the tunnel is small; if the connection is not established in time, a retry cycle starts the process again, and if that also fails you are greeted with the failure message. This can be caused by excessive traffic on either or both sides. So this is mainly a networking issue rather than something specific to Windows Azure, since after a few successive tries you could upload your package.
In such failure situations, you can run a network capture utility (e.g. Netmon or Wireshark) and compare the time taken during failed and successful attempts to see the difference between the transfers. This will help you understand the underlying delays.
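The retry cycle described above can also be applied deliberately on the client side: wrap the slow management call in a retry loop with increasing delays rather than re-running the whole publish. A generic sketch follows; the operation, attempt count, and delays are placeholders for illustration, not the publish wizard's actual internals:

```python
import time

def retry_with_backoff(operation, attempts=3, base_delay=1.0):
    """Retry a flaky network operation with exponential backoff.

    operation is any zero-argument callable that raises on failure,
    e.g. a wrapper around the management-endpoint request.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:  # in real code, catch only the timeout type
            last_error = exc
            if attempt < attempts - 1:
                # wait 1s, 2s, 4s, ... before the next try
                time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

With latency-induced timeouts, a few spaced-out retries often succeed where back-to-back attempts fail, which matches the observation that the error goes away after publishing a few times.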
Try updating your role's diagnostics configuration, and then update your storage credentials, because they may have expired.
