Keep Azure self-hosted agents running and connected

I have a few questions regarding Azure self-hosting.
Assume a Windows self-hosted agent is set up on a physical machine M by user Alex. The agent goes offline when Alex logs off and the machine goes to sleep. Now, when Bob logs into the same machine, he has to set up a different agent, while the agent set up by Alex is still offline and not accessible to Bob. (Please let me know if I got something wrong here.)
Is it possible to set up self-hosted agents so that all users can access the same agent,
and how can we avoid the agent going offline when the machine goes to sleep? I tried running the agent both interactively and as a service.
We do have a Linux cluster running, so we can avoid the machine going to sleep there, but access to the agent is still a concern. Also, we only have physical machines in our lab to run Windows and macOS, and users have to log out after using them.
Any help would be greatly appreciated!

If the agent is on a Windows server that is configured not to go to sleep, create the agent and run it as a service. I'd recommend running it under a domain service account created just for the agent. Logging off the remote server then shouldn't affect the state of the agent.
If you run the agent as a service, it cannot execute UI automation. If you need UI automation to run on the agent, you will need to run it as an interactive agent; I would still run it interactively under a domain service account. If someone remotes into the box with a different account while the agent is running interactively, the agent will show up as offline. You would then need to either restart the server or log in with the agent account and disconnect correctly.
We use the batch script from the Microsoft documentation to disconnect without taking down the interactive agent:
for /f "skip=1 tokens=3" %%s in ('query user %USERNAME%') do (
%windir%\System32\tscon.exe %%s /dest:console
)
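If UI automation isn't needed, here is a minimal sketch of registering the agent as a Windows service under such an account using the agent's unattended configuration options; the organization URL, pool, PAT, and the CONTOSO\svc-agent account are placeholders, and it assumes the agent package is already extracted to C:\agent:

REM Register the agent unattended and install it as a Windows service running
REM under a dedicated domain service account (all values below are placeholders).
cd /d C:\agent
config.cmd --unattended ^
  --url https://dev.azure.com/your-organization ^
  --auth pat --token YOUR_PAT ^
  --pool Default --agent %COMPUTERNAME% ^
  --runAsService ^
  --windowsLogonAccount CONTOSO\svc-agent ^
  --windowsLogonPassword YOUR_PASSWORD

Because the service runs under the dedicated account rather than Alex's or Bob's login, it stays online regardless of who logs on or off the machine.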

Related

How do I connect Release Management 2013 client on a non-domain Windows 10 box?

I've got 2 machines:
A corporate desktop machine which is running Windows 7 SP1 which resides on the corporate domain and which I log into using a corporate domain account.
A personal laptop that I use when working from home via the Cisco VPN client but presently sits on my desk connected to the corporate WiFi (though I had it connected to the wire and on the same subnet as my desktop machine today also). This machine is not on the corporate domain; I log into this machine with a Microsoft Account.
I need to run the Visual Studio 2013 Release Management Client from both machines. The desktop machine works fine: entering either the IP address or the URL into the Release Management Server URL entry field, everything hooks up and all is glorious.
On my Windows 10 laptop however, it's a different story. Every attempt to connect is met with the error:
The server specified could not be reached. Please ensure the
information that is entered is valid (please contact your Release
Management administrator for assistance). <-- I'm the admin
I can ping the machine both with IP address and with hostname, ruling out DNS issues. Both client machines are on the same subnet. Both machines are using the same outbound port.
Checking the event log, I see a bunch of entries with Message: The remote server returned an error: (401) Unauthorized.
Checking with Fiddler on my desktop machine, I can walk through the handshake of each of the stages of startup and all is good. But in Fiddler on my laptop I see three 401 Unauthorized responses before the Release Management Client bombs and returns the rather uninformative message I posted above.
I've attempted to create a shadow account on my laptop and do the Shift-Right Click-Run As Different User dance, but I must be missing something because I can't get this to run.
I've talked to the network administrator who suggests that I should be able to access all of the same resources from both machines and that it must be a Release Management issue.
Is this an incompatibility between VS2013 Release Management and Windows 10, or something else? Has anyone else had this issue and overcome it? I have access to administer the Release Management environment if there are changes that need to be made there, and I'm a local administrator on both machines. I'm not, however, a domain administrator, if changes need to be made at that level.
I would bet you simply have a security issue as the workstation is not domain-joined and the WPF client is using Integrated Authentication.
Often, creating a local "shadow" user with the same username and password, and running the client app under that account (run as), works.
Another option is to join the workstation to the domain or use a domain-joined VM.
After fully investigating the situation, it appears to have been a combination of factors. I am posting a response because this appears to be a relatively common problem:
The workstation was sending an unexpected credential to the server. To get around this, you have to configure the user account on the server without a domain in the username and create a shadow account on your local machine. When running the client application, you must either log into this shadow account on the local machine, or SHIFT+right-click and choose "Run as different user", entering your local shadow credentials. This passes the shadow account to the server, which then authenticates without referencing the domain. OR
Create a user account on the server that matches the credentials on your local machine, including the MACHINENAME\LocalUsername form.
There also appeared to be a network issue when attempting to connect to the Release Management Server from the non-domain machine while connected inside the network. Connecting via the VPN from home resolved this, but only after we'd ensured the server account and local machine accounts were correctly configured. The domain-joined machine always connected properly.
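For completeness, a hedged sketch of launching the client under the shadow credentials from a command prompt rather than via SHIFT+right-click; the account name and the executable path are placeholders, not taken from the original setup:

REM Present the local shadow account's credentials for network authentication only,
REM so the Release Management server sees the expected username and password.
runas /netonly /user:MYLAPTOP\shadowuser "C:\Path To\ReleaseManagementClient.exe"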

Azure VM - why do FTP transfers lead to a complete disconnect?

I have a virtual machine with the FTP server configured.
I'm transferring files in active mode, and at some random file I get disconnected.
I cannot reconnect to the FTP server nor connect remotely to the machine.
I have to restart the machine and wait a while to regain access.
What can I do in this situation to prevent the complete disconnect?
I ended up using passive mode, even though it does not suit me, because active mode kept failing.
You need more than just those two ports open - the design of FTP (either passive or active) is that the data channel uses a port from a randomised range (see: http://slacksite.com/other/ftp.html), which presents a problem when using a stateless service like Azure's load balancing, where endpoints must be explicitly opened. This setup guide is the best place to see how to achieve what you want on an Azure VM: http://itq.nl/walkthrough-hosting-ftp-on-iis-7-5-a-windows-azure-vm-2/ (it is also linked from the SO post referenced by Grady).
You most likely need to open the FTP endpoint on the VM. This answer gives you some background on how to add endpoints: How to Setup FTP on Azure VM
You can also use PowerShell to add an endpoint: Add Azure Endpoint
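As a hedged sketch of those two steps under the classic (service management) model - the cloud service and VM names are placeholders, and the PowerShell line assumes the classic Azure module:

REM Allow inbound FTP control traffic through the VM's own firewall.
netsh advfirewall firewall add rule name="FTP control" dir=in action=allow protocol=TCP localport=21
REM Add a matching classic Azure endpoint for port 21 (placeholder names).
powershell -Command "Get-AzureVM -ServiceName 'my-cloud-service' -Name 'my-ftp-vm' | Add-AzureEndpoint -Name 'FTP' -Protocol tcp -LocalPort 21 -PublicPort 21 | Update-AzureVM"

For passive mode, the walkthrough linked above additionally configures a fixed data-channel port range in IIS and repeats the endpoint step for each port in that range.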

Active FTP on Azure virtual machine

I have set up FTP in IIS 8.0 on an Azure Windows Server 2012 virtual machine.
After following the instructions in this post (http://itq.nl/walkthrough-hosting-ftp-on-iis-7-5-a-windows-azure-vm-2/), I've been able to make FTP work fine in passive mode, but it fails when trying to connect in active mode from FileZilla. The FTP client can connect to the server in active mode, but it fails with a timeout error when trying to execute the LIST command.
I carefully verified that the 20 and 21 endpoints are set on the Azure VM without pointing to a probe port, and that Windows Firewall allows external connections to ports 20 and 21 on the VM.
I can't figure out why active mode doesn't work while passive mode works fine.
I know there are other users with the same issue.
Is there anyone who has succeeded in setting up active FTP on an Azure VM?
This previous response is incorrect: https://stackoverflow.com/a/20132312/5347085
I know this because I worked with Azure support extensively. The issue has nothing to do with the server not being able to connect to the client, and my testing method eliminated client-side issues as a possibility.
After working with Azure support for 2 weeks, their assessment of the problem was essentially that “Active Mode FTP uses a series of random ports from a large range for the data channel from the client to the server. You can only add 150 endpoints to an Azure VM, so you couldn’t possibly add all those ports and get Active FTP working 100%. In order to do this you would need to use ‘Instance level public IP’ and essentially bypass the endpoint mechanism altogether, put your VM directly on the internet, and rely entirely on the native OS firewall for protection.”
If you HAVE to use Active Mode FTP on Azure and are OK with putting your VM on a public IP, he provided this link:
https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-instance-level-public-ip/
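For reference, a hedged sketch of assigning such an instance-level public IP under the classic deployment model; the service, VM, and IP names are placeholders, and this bypasses endpoints entirely, so the OS firewall is the only protection left:

REM Attach an instance-level public IP to the VM (classic Azure PowerShell module).
powershell -Command "Get-AzureVM -ServiceName 'FTPService' -Name 'FTPInstance' | Set-AzurePublicIP -PublicIPName 'ftpip' | Update-AzureVM"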
UPDATE: Official response from Azure Support:
Josh,
First of all, thanks for your patience on this. As I mentioned in my last email, I was working with our Technical Advisors, who are Support Escalation Engineers, on reproducing this environment in Azure. Our tests were configured using WS_FTP 7.7 (your version 7.1) and the WS_FTP 12 client, as well as the Windows FTP client. The results of our testing were the same as you are experiencing. We were able to log in to the server, but we get the same Command Port/List failures.
As we discussed previously, Active FTP uses a random port for the data plane on the client side. The server connects via 21 and 20, but the incoming port is a random ephemeral port. In Passive FTP, this can be defined, and therefore endpoints can be created for each port you use as part of the data plane.
Based on our extensive testing yesterday, I would not expect any other Active FTP solution to work. The Escalation Engineer who assisted yesterday also discussed this with other members of his team, and they have not seen any successful Active FTP deployments in Azure.
In conclusion, my initial thoughts have been confirmed with our testing, and Active FTP will not work in the Azure environment at this time. We are always striving to improve Azure's offering, so this may be something that will work in the future as we continue to grow.
You will need to move to a passive FTP setup if you are going to host this FTP server in Azure.
When using active FTP, the client initiates the connection to port 21 on the FTP server. This is the command (or control) channel, and this connection usually succeeds. However, the FTP server then attempts to open a data connection back to the client, from its port 20 to an ephemeral port the client specifies. This data channel is used for all data transfers, including directory listings.
So, in your case, active FTP isn't working because the server can't initiate a connection to the client. This is either a problem on the server (an outbound firewall rule) or on the client itself. This is usually a good thing, because you don't want internet-based servers to be able to open connections to client machines.
In passive mode there is a clear client/server distinction, where the client initiates all connections to the server. Passive mode is the recommended setup, so if you got that working I'd stick with it.
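On the firewall side specifically, Windows Firewall can be told to inspect the FTP control channel and allow the secondary data connection automatically. A small sketch; this only addresses the local firewall, not the Azure endpoint limitation discussed above:

REM Enable stateful FTP inspection so the firewall dynamically allows the
REM secondary (data) connection that active-mode transfers require.
netsh advfirewall set global StatefulFtp enable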

How to restrict user access to only deployments via Capistrano? (deployment workflow issue)

I'm looking for good practices for deploying with Capistrano.
I would like to start with a short description of how I used to do deployments.
Capistrano is installed locally on a developer's computer. I deploy through a gateway using Capistrano's :gateway option. At first, I thought that with the :gateway option I would only need an SSH connection to the gateway host, but it turns out that I need an SSH connection (public key) to all the hosts I want to deploy to.
I would like to find a convenient and secure way to deploy the application.
For example, when a new developer starts working, it is much more convenient to put his public key only on the gateway server and not on all application servers. On the other hand, I don't want him to have unrestricted access to the servers - in particular, an SSH shell on the gateway - just because he is a developer; he only needs to do deployments.
If you are aware of good practices for deploying with Capistrano, please let me know.
Create dedicated user accounts for every developer on the gateway machine as well as on the rest of the server machines. You will have to do this using the facilities your OS and SSH give you. Make sure the developer accounts don't have the ability to log in to the gateway with an interactive shell, and so on.
I can't provide you with all the details, but I think I have pointed you in the right direction. You can ask on Server Fault for the details of how to allow a user to log in and perform only certain tasks on a server.
Digression/opinion: it's better to have developers you trust do the deployments. If you do not trust a dev, it is better not to let them do crucial things like deploying to a production server.

Managing inter-instance access on EC2

We are in the process of setting up our IT infrastructure on Amazon EC2.
Assume a setup along the lines of:
X production servers
Y staging servers
Log collation and Monitoring Server
Build Server
Obviously we have a need for various servers to talk to each other. A new build needs to be scp'd over to a staging server. The log collator needs to pull logs from production servers. We are quickly realizing we are running into trouble managing access keys. Each server has its own key pair and possibly its own security group. We are ending up copying *.pem files over from server to server, kind of making a mockery of security. The build server has the access keys of the staging servers in order to connect via ssh and push a new build. The staging servers similarly have the access keys of the production instances (gulp!)
I did some extensive searching on the net but couldn't really find anyone talking about a sensible way to manage this issue. How are people with a setup similar to ours handling it? We know our current way of working is wrong. The question is - what is the right way?
Appreciate your help!
Thanks
[Update]
Our situation is complicated by the fact that at least the build server needs to be accessible from an external service (specifically, GitHub). We are using Jenkins, and the post-commit hook needs a publicly accessible URL. The bastion approach suggested by #rook fails in this situation.
A very good method of handling access to a collection of EC2 instances is using a Bastion Host.
All machines you use on EC2 should disallow SSH access from the open internet, except for the bastion host. Create a new security group called "Bastion Host", and only allow port 22 incoming from the bastion to all other EC2 instances. All keys used by your EC2 collection are housed on the bastion host. Each user has their own account on the bastion host, and these users should authenticate to the bastion using a password-protected key file. Once they log in, they should have access to whatever keys they need to do their job. When someone is fired, you remove their user account from the bastion. If a user has copied keys off the bastion, it won't matter much, because the other instances only accept SSH connections that come from the bastion host.
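A hedged sketch of the corresponding security-group rule via the AWS CLI; the group names are placeholders, and in a VPC you would reference group IDs instead:

REM Allow SSH to the application instances only from members of the bastion's
REM security group (placeholder group names; assumes a configured AWS CLI).
aws ec2 authorize-security-group-ingress --group-name app-servers --protocol tcp --port 22 --source-group bastion-host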
Create two sets of keypairs, one for your staging servers and one for your production servers. You can give your developers the staging keys and keep the production keys private.
I would put the new builds onto S3 and have a Perl script running on the boxes to pull the latest code from your S3 buckets and install it on the respective servers. This way, you don't have to manually scp every build over each time. You can also automate this process with some sort of continuous-build tool that builds and drops the output into your S3 buckets. Hope this helps.
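A hedged sketch of that pull step using the AWS CLI rather than a Perl script; the bucket, key, and file names are placeholders:

REM Fetch the latest build artifact from S3 on each target box (placeholder names).
aws s3 cp s3://my-build-bucket/builds/latest/app.zip app.zip

With a pull model like this, the build server never needs SSH access to the staging or production boxes, and an IAM instance role can grant the S3 read access without storing any credentials on the machines.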
