Azure: Deprovision a Linux instance using the Custom Script Extension

I am attempting to deprovision an Azure Linux instance using the Custom Script Extension.
My script, stored in an anonymous-access blob:
sudo waagent -deprovision+user -verbose -force
exit
My command to apply the extension:
az vm extension set --resource-group rg01 --vm-name vm01 --name CustomScript --publisher Microsoft.Azure.Extensions --version 2.0 --settings "{'fileUris': [ 'https://mystorageaccount.blob.core.windows.net/scripts/Deprovision.sh' ], 'commandToExecute': 'sh Deprovision.sh'}"
When I run the az command, all the /var/log/azure subdirectories and logs disappear! I can tell from the Bastion window that something tried to delete my user account, so I am confident the extension is getting provisioned and run.
Unfortunately, the extension shows all status information as unavailable, and my az command just sits there. The "Create or Update Virtual Machine Extension" item in the VM's activity log also shows no activity once the deprovision starts. The Azure activity log suggests a restart occurred and my account is no longer valid.
Hopefully Linux/Azure folks have a recipe for this...
I saw similar behavior on Windows, and ended up using this script to run Sysprep (the -Wait was critical, as it forces the PowerShell process to wait for completion, preventing the agent from returning success/fail until the process has completed), which prevents a restart. I can then script deallocation to occur when the extension completes. I suspect something similar is going on here.
Start-Process -FilePath C:\Windows\System32\Sysprep\Sysprep.exe -ArgumentList '/generalize /oobe /quiet /quit' -Wait

While the command:
sudo waagent -deprovision+user -verbose -force
works via SSH, when run via the CustomScript extension it basically kills everything on the machine, so the CustomScript extension is never able to acknowledge completion.
Using this script:
sudo shutdown +3
sudo waagent -deprovision+user -verbose -force -start
Line 1 shuts the VM down in 3 minutes (the deprovision command seems very fast).
Line 2, with '-start' added, runs waagent as a background process. This allows the CustomScript extension to acknowledge completion.
Now this command completes (instead of hanging):
var cmd = "sh Deprovision.sh";
var result = vm.Update()
    .DefineNewExtension("deprovision")
        .WithPublisher("Microsoft.Azure.Extensions")
        .WithType("CustomScript")
        .WithVersion("2.1")
        .WithMinorVersionAutoUpgrade()
        .WithPublicSetting("fileUris", new string[] { blobUri })
        .WithPublicSetting("commandToExecute", cmd)
        .Attach()
    .Apply();
After completion, we must poll Azure for when the VM is Stopped.
WaitForVMToStop(vm);
private void WaitForVMToStop(IVirtualMachine newVM)
{
    Context.Logger.LogInformation($"WaitForVMToStop...");
    bool stopped = false;
    int cnt = 0;
    do
    {
        var stoppedVM = Context.AzureInstance.VirtualMachines.GetById(newVM.Id);
        stopped = (stoppedVM.PowerState == PowerState.Stopped);
        if (!stopped)
        {
            cnt++;
            Context.Logger.LogInformation($"\tPolling 60 seconds for 'PowerState = Stopped on [{newVM.Name}]...");
            System.Threading.Thread.Sleep(60000);
            if (cnt > 20)
            {
                Context.Logger.LogInformation($"\tSysPrep Extension exceeded 20 minutes. Aborting...");
                throw new Exception($"SysPrep Extension exceeded 20 minutes on [{newVM.Name}]");
            }
        }
    } while (!stopped);
    Context.Logger.LogInformation($"\tWaited {cnt} minutes for 'PowerState = Stopped...");
}
This seems way too complicated to me, but it works. I especially do not like assuming deprovision will occur in 3 minutes or less. If anybody has a better way, please share.

Related

Get exit code from `az vm run-command` in Azure pipeline

I'm running a rather hefty build in my Azure pipeline, which involves processing a large amount of data and hence requires too much memory for my build agent to handle. My approach is therefore to start up a Linux VM, run the build there, and push the resulting docker image up to my container registry.
To achieve this, I'm using the Azure CLI task to issue commands to the VM (e.g. az vm start, az vm run-command ... etc).
The problem I am facing is that az vm run-command "succeeds" even if the script that you run on the VM returns a nonzero status code. For example, this "bad" vm script:
az vm run-command invoke -g <group> -n <vmName> --command-id RunShellScript --scripts "cd /nonexistent/path"
returns the following response:
{
  "value": [
    {
      "code": "ProvisioningState/succeeded",
      "displayStatus": "Provisioning succeeded",
      "level": "Info",
      "message": "Enable succeeded: \n[stdout]\n\n[stderr]\n/var/lib/waagent/run-command/download/87/script.sh: 1: cd: can't cd to /nonexistent/path\n",
      "time": null
    }
  ]
}
So the command succeeds, presumably because it succeeded in executing the script on the VM. The fact that the script actually failed on the VM is buried in the response "message".
I would like my Azure pipeline task to fail if the script on the VM returns a nonzero status code. How would I achieve that?
One idea would be to parse the response (somehow) and search the text under stderr - but that sounds like a real hassle, and I'm not even sure how to "access" the response within the task.
Have you enabled the option "Fail on Standard Error" on the Azure CLI task? If not, you can try enabling it and running the pipeline again to see whether the error "cd: can't cd to /nonexistent/path" makes the task fail.
If the task still passes, that error is probably not reaching the task's standard error. In this situation, you may need to add more commands to your script to monitor the output of the az command; once the output shows an error, execute "exit 1" to exit the script and return a failure so that the task is marked as failed.
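For reference, a minimal sketch of that "monitor the output and exit 1" idea, assuming the Azure CLI task is set to PowerShell; the resource group, VM name and script body are placeholders, and the check simply treats anything after the [stderr] marker in the response message as a failure:

$resourceGroup = 'myGroup'   # placeholder
$vmName = 'myVM'             # placeholder
# Pull only the message field out of the run-command response.
$message = (az vm run-command invoke -g $resourceGroup -n $vmName `
    --command-id RunShellScript --scripts 'cd /nonexistent/path' `
    --query 'value[0].message' -o tsv) -join "`n"
Write-Host $message
# Everything after the [stderr] marker is the remote script's standard error.
$stderr = ($message -split '\[stderr\]')[1]
if ($stderr -and $stderr.Trim()) {
    Write-Error 'Remote script wrote to stderr; failing the task.'
    exit 1
}

This is still only a heuristic, since a remote script can fail without writing anything to stderr.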
I solved this by using the SSH pipeline task - this allowed me to connect to the VM via SSH, and run the given script on the machine "directly" via SSH.
This means from the context of the task, you get the status code from the script itself running on the VM. You also see any console output inside the task logs, which was obscured when using az vm run-command.
Here's an example:
- task: SSH@0
  displayName: My VM script
  timeoutInMinutes: 10
  inputs:
    sshEndpoint: <sshConnectionName>
    runOptions: inline
    inline: |
      echo "Write your script here"
Note that the SSH connection needs to be set up as a service connection using the Azure Pipelines UI. You then reference the name of that service connection in the yaml.

AzCopy Sync command is failing

I'm issuing this command:
azcopy sync "D:\Releases\Test\MyApp" "http://server3:10000/devstoreaccount1/myapp?sv=2019-02-02&st=2020-06-24T03%3A19%3A44Z&se=2020-06-25T03%3A19%3A44Z&sr=c&sp=racwdl&sig=REDACTED"
...and I'm getting this error:
error parsing the input given by the user. Failed with error Unable to infer the source 'D:\Releases\Test\MyApp' / destination 'http://server3:10000/devstoreaccount1/myapp?sv=2019-02-02&st=2020-06-24T03%3A19%3A44Z&se=2020-06-25T03%3A19%3A44Z&sr=c&sp=racwdl&sig=-REDACTED-
I would have thought my source was pretty clear.
Can anyone see anything wrong with my syntax?
I believe you have run into an issue with azcopy where it does not support the local emulator (at least for the sync command). There's an open issue on GitHub for this: https://github.com/Azure/azure-storage-azcopy/issues/554.
Basically the issue comes from the following lines of code, which return the location as Unknown in the case of storage emulator URLs:
func inferArgumentLocation(arg string) common.Location {
    if arg == pipeLocation {
        return common.ELocation.Pipe()
    }
    if startsWith(arg, "http") {
        // Let's try to parse the argument as a URL
        u, err := url.Parse(arg)
        // NOTE: sometimes, a local path can also be parsed as a url. To avoid thinking it's a URL, check Scheme, Host, and Path
        if err == nil && u.Scheme != "" && u.Host != "" {
            // Is the argument a URL to blob storage?
            switch host := strings.ToLower(u.Host); true {
            // Azure Stack does not have the core.windows.net
            case strings.Contains(host, ".blob"):
                return common.ELocation.Blob()
            case strings.Contains(host, ".file"):
                return common.ELocation.File()
            case strings.Contains(host, ".dfs"):
                return common.ELocation.BlobFS()
            case strings.Contains(host, benchmarkSourceHost):
                return common.ELocation.Benchmark()
            // enable targeting an emulator/stack
            case IPv4Regex.MatchString(host):
                return common.ELocation.Unknown() // This is what gets returned in case of storage emulator URLs.
            }
            if common.IsS3URL(*u) {
                return common.ELocation.S3()
            }
        }
    }
    return common.ELocation.Local()
}
I had this same issue when trying to do a sync from my local machine to Azure Blob storage.
This was the command I was running:
azcopy sync "C:\AzureStorageTest\my-app\*" "https://myapptest.z16.web.core.windows.net/$web"
But I got the error below:
INFO: The parameters you supplied were Source: 'c:\AzureStorageTest\my-app' of type Local, and Destination: 'https://myapptest.z16.web.core.windows.net/' of type Local
INFO: Based on the parameters supplied, a valid source-destination combination could not automatically be found. Please check the parameters you supplied. If they are correct, please specify an exact source and destination type using the --from-to switch. Valid values are two-word phases of the form BlobLocal, LocalBlob etc. Use the word 'Blob' for Blob Storage, 'Local' for the local file system, 'File' for Azure Files, and 'BlobFS' for ADLS Gen2. If you need a combination that is not supported yet, please log an issue on the AzCopy GitHub issues list.
error parsing the input given by the user. Failed with error Unable to infer the source 'C:\AzureStorageTest\my-app' / destination 'https://myapptest.z16.web.core.windows.net.z16.web.core.windows.net/'.
PS C:\Users\promise> azcopy sync "C:\AzureStorageTest\my-app" "https://myapptest.z16.web.core.windows.net.z16.web.core.windows.net/$web" -from-to localblob
Error: unknown shorthand flag: 'f' in -from-to
Here's how I solved it:
I was missing the ?[SAS] argument at the end of the Blob storage location. Also, as of this writing, the azcopy sync command does not seem to support the --from-to switch. So instead of this:
azcopy sync "C:\AzureStorageTest\my-app" "https://myapptest.z16.web.core.windows.net/$web"
I had this:
azcopy sync "C:\AzureStorageTest\my-app" "https://myapptest.blob.core.windows.net/%24web?[SAS]" --recursive
Note:
The format is azcopy sync "/path/to/dir" "https://[account].blob.core.windows.net/[container]/[path/to/directory]?[SAS]" --recursive. You only need to modify the "/path/to/dir", [account] and [container]/[path/to/directory]. Every other thing remains the way it is.
My actual Blob storage location is https://myapptest.blob.core.windows.net/$web, but I used https://myapptest.blob.core.windows.net/%24web, since $ throws an error when used, so %24 was used instead.
Don't use the blob website address like I did, which is https://myapptest.z16.web.core.windows.net/ (that got me frustrated with lots of errors); rather, use the blob storage address https://myapptest.blob.core.windows.net.
The --recursive flag is already inferred in this directory sync operation, so you may consider leaving it out. I put it in for clarity's sake.
That's all.
I hope this helps
OK, after much futzing I was finally able to get this to work, using Azurite and PowerShell. It's clear that neither the Azure CLI nor AzCopy is well tested under emulation.
Here's a rough-and-tumble script that can be called from a pipeline:
[CmdletBinding()]
param(
    [Parameter(Mandatory)][string] $Container,
    [Parameter(Mandatory)][string] $Source
)

$Context = New-AzureStorageContext -Local
$BlobNames = Get-AzureStorageBlob -Context $Context -Container $Container | % { $_.Name }
$FilesToSync = gci $Source\* -Include RELEASES, Setup.exe
$Packages = gci $Source -Filter *.nupkg
$Packages | % {
    If (!($BlobNames.Contains($_.Name))) {
        $FilesToSync += $_
    }
}
$FilesToSync | Set-AzureStorageBlobContent -Context $Context -Container $Container -Force
Note that this is highly customized for my Squirrel deployments (*.nupkg, RELEASES, Setup.exe), so a person will want to adjust accordingly for his own environment.
Azurite can be set to always-on using a Scheduled Task to run this command every hour:
powershell -command "Start-Process azurite-blob.cmd -PassThru -ArgumentList '--blobHost 0.0.0.0'"
The argument sets Azurite to listen on any IP so that it can be reached from other computers on the network. I punched a hole in the firewall for ports 10000-10002.
Be careful to set the Task to run under the same account that was used to install Azurite, otherwise the Task won't be able to see azurite-blob.cmd (it's in %AppData%\npm, which is added to PATH during installation).
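If you prefer to script the Scheduled Task creation as well, a rough PowerShell sketch along these lines should work (it launches azurite-blob.cmd directly rather than via Start-Process; the npm path and task name are assumptions):

# Run Azurite hourly, listening on all interfaces, under the current account.
$action  = New-ScheduledTaskAction -Execute "$env:AppData\npm\azurite-blob.cmd" -Argument '--blobHost 0.0.0.0'
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Hours 1)
Register-ScheduledTask -TaskName 'Azurite Blob' -Action $action -Trigger $trigger -User "$env:UserDomain\$env:UserName"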
The command line needs the --recursive option. See https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs?toc=/azure/storage/blobs/toc.json

How to automate Azure P2S VPN connection on Windows 10 with Jenkins pipeline

I have installed an Azure P2S VPN on my Windows computer and I can connect it manually. I also have a PowerShell script to do the job.
Here's the script:
rasphone "Azure-VPN"
$wshell = New-Object -ComObject wscript.shell;
$wshell.AppActivate('Network Connections')
Sleep 2
$wshell.SendKeys('~')
Sleep 2
$wshell.SendKeys('~')
The $wshell.SendKeys('~') is to replace pressing Enter key when I connect manually.
I can run this script to connect VPN successfully from command line:
> powershell C:\myScript.ps1
True
Now I want to run this script on a Jenkins pipeline. But it seems like this cannot be achieved.
stage('VPN') {
    bat "powershell C:\\myScript.ps1"
}
It returns False on the Jenkins console output.
I also tried following the accepted answer here, but still no luck (it runs neither from the command line nor on Jenkins):
> rasdial Azure-VPN /phonebook:%userprofile%\AppData\Roaming\Microsoft\Network\Connections\Cm\<aLongNumber>\<aLongNumber>.pbk
Remote Access error 623 - The system could not find the phone book entry for this connection.
Is there any workaround for this? My purpose is to use Jenkins pipeline to turn on the VPN, send some files over the network and then turn it off.
You could use Jenkins' PowerShell plugin to run PowerShell scripts on Windows directly from Jenkins. You can find more references in this blog.
Alternatively, as described in this SO answer, you could invoke a batch file with Jenkins like this for Windows paths:
stage('build') {
    dir("build_folder") {
        bat "run_build_windows.bat"
    }
}
or
stage('build') {
    bat "c://some/folder/run_build_windows.bat"
}

Clearing Azure Redis Cache using PowerShell during deployment

When deploying new versions of our web application to Azure App Service, I have a requirement to clear out the data in the associated Azure Redis Cache. This is to ensure that we don't return old versions of items which have schema changes in the new version.
We're deploying using Octopus Deploy, and I have previously tried executing the following PowerShell command to Reset the cache:
Reset-AzureRmRedisCache -ResourceGroupName "$ResourceGroup" -Name "$PrimaryCacheName" -RebootType "AllNodes" -Force
This works successfully but it's a bit heavy-handed and we're having intermittent connection issues which I suspect are caused by the fact that we're rebooting Redis and dropping existing connections.
Ideally, I'd just like to execute a FLUSHALL command via PowerShell. Is this a better approach, and is it possible to execute in PowerShell using the StackExchange.Redis library?
The Reset-AzureRmRedisCache cmdlet restarts nodes of an Azure Redis Cache instance, which I agree is a bit overkill for your requirement.
Yes, it is possible to execute a Redis FLUSHALL command in PowerShell.
As a prerequisite, you should install the Redis CLI and set an environment variable to point to the Redis CLI executable/binary path in your environment.
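For example, something along these lines makes redis-cli visible to the current PowerShell session (the install folder is an assumption; adjust it to wherever you placed the Redis CLI):

# Append the Redis CLI folder to this session's PATH so redis-cli resolves.
$env:Path += ';C:\Program Files\Redis'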
Then you can execute the Redis CLI commands from PowerShell as shown below.
Invoke-Command -ScriptBlock { redis-cli -h <hostname>.redis.cache.windows.net -p <redisPort> -a <password> }
Invoke-Command -ScriptBlock { redis-cli flushall }
The way I eventually implemented this is to call the StackExchange.Redis library via PowerShell, so you'll need to have a copy of this DLL somewhere handy. During my deployment, I have access to the connection string, so this function strips out the host and port to connect to the server. This works without the need to open the non-SSL port, and the connection string allows admin access to the cache:
function FlushCache($RedisConnString)
{
    # Extract the Host/Port from the start of the connection string (ignore the remainder)
    # e.g. MyUrl.net:6380,password=abc123,ssl=True,abortConnect=False
    $hostAndPort = $RedisConnString.Substring(0, $RedisConnString.IndexOf(","))

    # Split the Host and Port e.g. "MyUrl.net:6380" --> ["MyUrl.net", "6380"]
    $RedisCacheHost, $RedisCachePort = $hostAndPort.split(':')

    Write-Host "Flushing cache on host - $RedisCacheHost - Port $RedisCachePort" -ForegroundColor Yellow

    # Add the Redis type from the assembly
    $asm = [System.Reflection.Assembly]::LoadFile("StackExchange.Redis.dll")

    # Open a connection
    [object]$redis_cache = [StackExchange.Redis.ConnectionMultiplexer]::Connect("$RedisConnString,allowAdmin=true",$null)

    # Flush the cache
    $redisServer = $redis_cache.GetServer($RedisCacheHost, $RedisCachePort,$null)
    $redisServer.FlushAllDatabases()

    # Dispose connection
    $redis_cache.Dispose()

    Write-Host "Cache flush done" -ForegroundColor Yellow
}
I have used the Windows port of netcat to clear Redis cache remotely from my Windows machine, like so:
$redisCommands = "SELECT $redisDBIndex`r`nFLUSHDB`r`nQUIT`r`n"
$redisCommands | .\nc $redisServer 6379
Where $redisDBIndex is the index of the Redis database you want to clear, or simply use the FLUSHALL command if you want to clear everything. $redisServer is your Redis server. And simply pipe the commands to nc.
I have also documented it here: https://jaeyow.github.io/fullstack-developer/automate-redis-cache-flush-in-powershell/#

How to run Azure VM CustomScriptExtension as domain user? (part 2)

Updated to explain my root problem: given that Azure has extensions for VMs, as they are being provisioned, to join a domain and to run scripts, how can I run such a script as a domain user?
The script needs to be run as a domain user in order to access a file share to retrieve installation files and other scripts that are neither part of the VM template image nor can (reasonably) be uploaded to Azure blob storage and downloaded as part of provisioning.
I split this question in two because the 2nd half (represented here) didn't get solved.
What I have working is a Powershell script that takes a JSON file to create a new VM; the JSON file contains instructions for the VM to join a domain and run a custom script. Both things do happen, but the script runs as the user workgroup\system and therefore doesn't have access to a network drive.
How can I best provide a specific user's credentials for such a script?
I'm trying to have the script spawn a new Powershell session with the credentials of a different user, but I'm having a hard time figuring out the syntax -- I can't even get it to work on my development workstation. Naturally, security is a concern but if I could get this to work using encrypted stored credentials, this might be acceptable.
... but don't limit your answers -- maybe there's an entirely different way to go about this and achieve the same effect?
Param(
    [switch]$sudo, # Indicates we've already tried to elevate to admin
    [switch]$su    # Indicates we've already tried to switch to domain user
)
try {
    # Pseudo-constants
    $DevOrProd=(Get-Item $MyInvocation.MyCommand.Definition).Directory.Parent.Name
    $PsScriptPath = Split-Path -parent $MyInvocation.MyCommand.Definition
    $pathOnPDrive = "\\dkfile01\P\SoftwareTestData\Azure\automation\$DevOrProd\run-once"
    $fileScriptLocal = $MyInvocation.MyCommand.Source
    $fileScriptRemote = "$pathOnPDrive\run-once-from-netdrive.ps1"
    # $filePw = "$pathOnPDrive\cred.txt"
    $fileLog="$PsScriptPath\switch-user.log"
    $Myuser="mohican"
    $Myuserpass="alhambra"
    $Mydomainuser="mydomain\$Myuser"
    $Mydomain="mydomain.com"

    # Check variables
    write-output("SUDO=[$SUDO]")
    write-output("SU=[$SU]")

    # Functions
    function Test-Admin {
        $currentUser = New-Object Security.Principal.WindowsPrincipal $([Security.Principal.WindowsIdentity]::GetCurrent())
        return ($currentUser.IsInRole([Security.Principal.WindowsBuiltinRole]::Administrator))
    }

    # Main
    write-output("Run-once script starting ...")

    # Check admin privilege
    write-output("Checking admin privilege ...")
    if (Test-Admin) {
        write-output("- Is admin.")
    } else {
        write-output("- Not an admin.")
        if ($sudo) {
            write-output(" - Already tried elevating, didn't work.")
            write-output("Run-once script on local VM finished.")
            write-output("")
            exit(0) # Don't return failure exit code because Azure will report it as if the deployment broke...
        } else {
            write-output(" - Attempting to elevate ...")
            $arguments = "-noprofile -file $fileScriptLocal"
            $arguments = $arguments +" -sudo"
            try {
                Start-Process powershell.exe -Verb RunAs -ArgumentList $arguments
                write-output(" - New process started.")
            } catch {
                write-output(" - New process failed to start.")
            }
            write-output("Run-once script on local VM finished.")
            write-output("")
            exit(0) # The action will continue in the spawned process
        }
    }
    write-output("Checked admin privilege ... [OK]")

    # Check current user
    write-output("Checking user account ...")
    $hostname = $([Environment]::MachineName).tolower()
    $domainname = $([Environment]::UserDomainName).tolower()
    $thisuser = $([Environment]::UserName).tolower()
    write-output("- Current user is ""$domainname\$thisuser"" on ""$hostname"".")
    write-output("- Want to be user ""$Myuser"".")
    if ($Myuser -eq $thisuser) {
        write-output(" - Correct user.")
    } else {
        write-output(" - Incorrect user.")
        if ($su) {
            write-output(" - Already tried switching user, didn't work.")
            write-output("Run-once script on local VM finished.")
            write-output("")
            exit(0) # Don't return failure exit code because Azure will report it as if the deployment broke...
        } else {
            write-output(" - Attempting to switch to user ""$Mydomainuser"" with password ""$Myuserpass"" ...")
            # FIXME -- This does not work... :-(
            $MyuserpassSecure = ConvertTo-SecureString $Myuserpass -AsPlainText -Force
            $credential = New-Object System.Management.Automation.PSCredential $Mydomainuser, $MyuserpassSecure
            $arguments = "-noprofile -file $fileScriptLocal"
            $arguments = $arguments +" -sudo -su -Credential $credential -computername $hostname"
            try {
                Start-Process powershell.exe -Verb RunAs -ArgumentList $arguments
                write-output(" - New process started.")
            } catch {
                write-output(" - New process failed to start.")
            }
            write-output("Run-once script on local VM finished.")
            write-output("")
            exit(0) # The action will continue in the spawned process
        }
    }
    write-output("Checked user account ... [OK]")

    # Run script from P: drive (finally!)
    write-output("Attempting to run script from P: drive ...")
    write-output("- Script file: ""$fileScriptRemote""")
    if (test-path $fileScriptRemote) {
        write-output("Running script from P: drive ...")
        $arguments = "-noprofile -file $fileScriptRemote"
        try {
            Start-Process powershell.exe -Verb RunAs -ArgumentList $arguments
            write-output(" - New process started.")
        } catch {
            write-output(" - New process failed to start.")
        }
        write-output("Run-once script on local VM finished.")
        write-output("")
        exit(0) # The action will continue in the spawned process
    } else {
        write-output("- Could not locate/access script file!")
        write-output("Ran script from P: drive ... [ERROR]")
    }
    write-output("Run-once script on local VM finished.")
    write-output("")
} catch {
    write-warning("Unhandled error in line $($_.InvocationInfo.ScriptLineNumber): $($error[0])")
    write-output("ABEND")
    write-output("")
}
There are a couple of parts to this question as well!
Firstly, getting credentials there: at some point you are going to need to pass a credential to the machine, even if it is only a credential used to obtain the real credentials.
My personal solution is to create a certificate to encrypt a PSCredential object, store that object on an HTTP server, then pass the certificate and pfx password in the script. Of course, if you're prebuilding the servers, you can preinstall this certificate. (There is a Code Review question with the code for this.)
Alternatively you might be able to use something like Azure Key Vault to store the pfx password.
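For illustration only (this is not the code from the Code Review question), the CMS cmdlets in PowerShell 5+ can handle the encrypt-with-a-certificate part; the certificate subject, URL, file paths, and the mydomain\mohican account below are placeholders borrowed from the question:

# On the machine doing the deployment: encrypt the password to a document-encryption
# certificate and publish the result somewhere the VM can download it from.
$plain = Read-Host -Prompt 'Password for mydomain\mohican'
Protect-CmsMessage -To 'CN=DeployCredCert' -Content $plain -OutFile 'C:\drop\cred.cms'

# On the new VM (with the certificate and its private key installed from the pfx):
Invoke-WebRequest -Uri 'https://example.com/drop/cred.cms' -OutFile "$env:TEMP\cred.cms"
$secure = Unprotect-CmsMessage -Path "$env:TEMP\cred.cms" | ConvertTo-SecureString -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential('mydomain\mohican', $secure)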
For the runas part, there are a few options.
I've not launched PowerShell as a different user since about v1, so I'll hope somebody else talks about that one.
You could run a scheduled task that logs on as a different user; this should work.
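A hedged sketch of that approach using the ScheduledTasks cmdlets; the task name is made up, and $fileScriptRemote, $Mydomainuser and $Myuserpass are the variables from the question's script:

# Register a task that runs the network-drive script under the domain account,
# start it immediately, and remove it afterwards.
$action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument "-NoProfile -File $fileScriptRemote"
Register-ScheduledTask -TaskName 'RunOnceAsDomainUser' -Action $action `
    -User $Mydomainuser -Password $Myuserpass -RunLevel Highest
Start-ScheduledTask -TaskName 'RunOnceAsDomainUser'
# Later: Unregister-ScheduledTask -TaskName 'RunOnceAsDomainUser' -Confirm:$false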
If you're running under a different context, you can set the auto-logon properties, reboot the machine, let its scripts run, then delete the auto-logon entries and reboot again. This gives the added benefit that you can have a specific, severely limited domain account that only has access to the shares it needs, and that has its admin/login rights stripped from each machine once it is built. This way you can also keep all of your build scripts in Active Directory and let that user automatically pull them down and run them.
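A rough sketch of the auto-logon variant, again with the question's account names; be aware it briefly stores the password in the registry in clear text, which is part of the trade-off:

$winlogon = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon'
# Turn on automatic logon for the limited domain account, then reboot.
Set-ItemProperty $winlogon -Name AutoAdminLogon    -Value '1'
Set-ItemProperty $winlogon -Name DefaultDomainName -Value 'mydomain'
Set-ItemProperty $winlogon -Name DefaultUserName   -Value 'mohican'
Set-ItemProperty $winlogon -Name DefaultPassword   -Value $Myuserpass
Restart-Computer -Force
# After the account's startup scripts have run, clear the entries and reboot again.
Remove-ItemProperty $winlogon -Name DefaultPassword
Set-ItemProperty $winlogon -Name AutoAdminLogon -Value '0'
Restart-Computer -Force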
