How to deploy a shell script to a private Linux VM

I have a private Azure Linux VM, meaning it can be accessed only from a jumpbox (access) VM. I need to deploy a script to this private VM. As this VM cannot even access any storage account/repo, I can't use the Custom Script Extension for script deployment. So I thought of deploying the script using az vm run-command invoke by converting SomeScript.sh to a string and echoing it to the virtual machine. The different pieces of my code are below:
SomeScript.sh
#!/bin/bash
#
# CommandToExecute: ./SomeScript.sh ${CUST_NO}
#
#some more code
Function that converts the .sh file to a string:
function getCommandToExecute()
{
    local scriptName=$1
    local commandToExecute
    local currentLocation=$(dirname "$0")
    local scriptFullPath="$currentLocation/Environment/VmScripts/$scriptName"

    mapfile < $scriptFullPath
    printf -v escapedContents "%q\n" "${MAPFILE[@]}"
    commandToExecute+="echo "$escapedContents" > /usr/myapps/$scriptName"
    echo "$commandToExecute"
}
VM run command:
az vm run-command invoke -g $resourceGroupName \
    -n $vmName --command-id RunShellScript \
    --scripts "#!/bin/bash\n ${commandToExecute}"
If I use the "#!/bin/bash\n ${commandToExecute}" part (with commandToExecute replaced by the script string) in the Run Command window in the Azure portal, the script works fine, but I can't make it work via run-command due to this exception:
\n[stdout]\n\n[stderr]\n/bin/sh: 1: /var/lib/waagent/run-command/download/133/script.sh: not found\n"
Any idea what is missing here? Or if there is a better alternative to handle this.

I think quoting the whole script and its deployment script for use with --scripts is a lot of work and error prone too. Luckily, there are easier alternatives for both quoting steps. The documentation of az vm run-command invoke --scripts states
Use #{file} to load script from a file.
Therefore, you can do
deploymentScript() {
    # Generate a self-extracting deployment script: lines 1-3 (shebang, tail, exit)
    # run on the VM; tail copies everything from line 4 onward (the original script,
    # appended below) into the target path before exit stops execution.
    echo '#! /usr/bin/env bash'
    printf 'tail -n+4 "${BASH_SOURCE[0]}" > %q\n' "$2"
    echo 'exit'
    cat "$1"
}
deploymentScript local.sh remote.sh > tmpDeploy.sh
az vm run-command invoke ... --scripts '#{tmpDeploy.sh}'
rm tmpDeploy.sh
Replace local.sh with the path of the local script you want to deploy and replace remote.sh with the remote path where the script should be deployed.
If you are lucky, then you might not even need tmpDeploy.sh. Try
az vm ... --scripts "#{<(deploymentScript local.sh remote.sh)}"
Some notes on the implementation:
The deployed script is an exact copy of your script file. Even embedded binary data is kept. The trick with tail $BASH_SOURCE is inspired by this answer.
The script can be arbitrarily long. Even huge scripts won't run into the Argument list too long error (the limit reported by getconf ARG_MAX).
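To make the tail trick concrete, here is roughly what the generated tmpDeploy.sh would contain for a hypothetical one-line local.sh deployed to /usr/myapps/remote.sh (the payload below is only an illustration):
#! /usr/bin/env bash
tail -n+4 "${BASH_SOURCE[0]}" > /usr/myapps/remote.sh
exit
#!/bin/bash
echo "hello from the deployed script"
When run on the VM, lines 1-3 execute, copy everything from line 4 onward into /usr/myapps/remote.sh, and exit before the payload itself is executed.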

There was an issue with the escaped string I mentioned above. After correcting the code, it's all good now. My final code:
function getCommandToExecute()
{
    local scriptName=$1
    local commandToExecute
    local currentLocation=$(dirname "$0")
    local scriptFullPath="$currentLocation/Environment/VmScripts/$scriptName"
    local singleStringCommand=""

    mapfile -t < $scriptFullPath
    for line in "${MAPFILE[@]}"; do
        singleStringCommand+="$(printf '%s' "$line" | sed 's/[\"$]/\\&/g')"
        singleStringCommand+="\n"
    done

    commandToExecute+="echo "\"$singleStringCommand\"" > /usr/local/bin/$scriptName;"
    echo "$commandToExecute"
}
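For completeness, a minimal usage sketch (reusing the variable names from the run-command snippet above) of how the generated string is passed to az vm run-command invoke:
commandToExecute=$(getCommandToExecute "SomeScript.sh")
az vm run-command invoke -g $resourceGroupName \
    -n $vmName --command-id RunShellScript \
    --scripts "#!/bin/bash\n ${commandToExecute}"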

Related

Can an Azure Machine Learning Compute Instance shut down itself automatically with a bash script executed by crontab?

I have a compute instance that starts at 12:00 with the Azure ML scheduler and runs a job scheduled in the crontab of the CI at 12:10. The thing is that this job doesn't always take the same time to finish, so I want the CI to shut itself down when done.
The script that the crontab executes is the following:
---------------------------------------------------------
#!/bin/bash
...
# CREATE FOLDER FOR LOGS
foldername=$PROJECT_PATH/$(date '+%d_%m_%Y_%H_%M_%S')
mkdir $foldername
filename=az_login.txt
path=$foldername/$filename
touch $path
az login -u *<USERNAME>* -p *<PASSWORD>* > $path
filename=acr_login.txt
path=$foldername/$filename
touch $path
# Authenticate to ACR
az acr login --name $ACR_NAME > $path
filename=pull_container.txt
path=$foldername/$filename
touch $path
# Pull the container image from ACR
docker pull $ACR_NAME.azurecr.io/$IMAGE_NAME:$IMAGE_TAG > $path
filename=run.txt
path=$foldername/$filename
touch $path
# Run the container image
docker run -v $CREDENTIALS_PATH:/app/config_privilegies $ACR_NAME.azurecr.io/$IMAGE_NAME:$IMAGE_TAG > $path
filename=rm_container.txt
path=$foldername/$filename
touch $path
# Delete the exited containers
docker rm $(docker ps -a -q --filter "status=exited") > $path
az ml compute stop --name *<CI_NAME>* --resource-group *<RESOURCE_NAME>* --workspace-name *<WORKSPACE_NAME>* --subscription *<SUBSCRIPTION_NAME>*
Everything works great until the stop command. In this particular code, it does nothing.
I've tried putting the last command in a separate bash script and changing the last line to "./close_ci.sh". However, this doesn't work either; it restarts the CI instead of stopping it.
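For reference, the crontab entry described above would look something like this (the script path is hypothetical):
# run the job script every day at 12:10; the CI itself is started at 12:00 by the Azure ML scheduler
10 12 * * * /home/azureuser/job_script.sh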

How to catch cli.azure.cli.core.azclierror : ResourceNotFoundError: in PowerShell

When I try to fetch non-existent key from keyvault I get:
msrest.exceptions : (KeyNotFound) A key with (name/id) keyname
was not found in this key vault. If you recently deleted this key you
may be able to recover it using the correct recovery command. For help
resolving this issue, please see
https://go.microsoft.com/fwlink/?linkid=2125182
cli.azure.cli.core.azclierror : ResourceNotFoundError: (KeyNotFound) A
key with (name/id) keyname was not found in this key vault. If
you recently deleted this key you may be able to recover it using the
correct recovery command. For help resolving this issue, please see
https://go.microsoft.com/fwlink/?linkid=2125182
I expect this error, but only this error, so I don't want to create a try-catch that catches everything. However, I cannot find the full identifier of ResourceNotFoundError in the docs, by which I mean including the namespace. Where can I find it so that I can catch this exception:
try {} catch [ResourceNotFoundError]{}
az is not a PowerShell cmdlet (it's an external executable), so I'm not sure try/catch would work at all.
What you could do is catch the output in a variable and then check that for the error before continuing.
Perhaps something like:
$GetKeyResult = az keyvault key show --name NoSuchKey --vault-name MyVault 2>&1
if ($GetKeyResult -like '*ResourceNotFoundError: (KeyNotFound)*') {
"Key wasn't found"
# Do stuff
}
The 2>&1 part is to redirect errors to standard output.
Another option is to skip the az commands and use a PowerShell cmdlet like Get-AzKeyVaultKey; unfortunately, that doesn't error at all on invalid key names, so you'd still need a check for it:
$GetKeyResult = Get-AzKeyVaultKey -VaultName MyVault -Name NoSuchKey
if ($null -eq $GetKeyResult) {
"Key wasn't found"
# Do stuff
}
In my case, I kept getting an error when running az sql db show when the database did not exist. It appears that the syntax for the script varies between a local Windows PC and the pipeline. Perhaps I have a version mismatch. Anyhow, something so simple should not take this long nor be so difficult to figure out. This is not well documented online, according to the references below.
Looks like the trick is to add 2>nul on the local machine and | ConvertFrom-Json in the Azure CLI pipeline. Note: using 2>nul locally actually created a file called nul in the same directory as the .sh script.
Local machine code:
dbStatus=$(az sql db show -g myRGname -s myServerName -n myDBName --query "status" 2>nul)
if [[ $dbStatus == '"Online"' ]]; then
    echo "It is online"
fi
Azure pipeline code:
$dbStatus1 = $(az sql db show -g myRGname -s myServerName -n myDBName --query "status" | ConvertFrom-Json)
if ($LastExitCode -eq 0)
{
    $dbStatus2 = $(az sql db show -g myRGname -s myServerName -n myDBName --query "status")
    if ($dbStatus2 -eq '"Online"')
    {
        echo "It is online"
    }
}
References:
https://devopsjournal.io/blog/2019/07/12/Azure-CLI-PowerShell
What does > nul 2>&1 mean in a batch statement
https://towardsdev.com/how-to-suppress-warnings-and-errors-messages-in-azure-cli-34cece53591c

Azure DevOps VM Scale Set Deployment in Linux VMs

I am trying to deploy to an Azure VM Scale Set using the "Run Custom Script VM extension on VM scale set" task in an Azure DevOps release pipeline. I have a shell script which executes post-deployment tasks.
In the release pipeline I am using a storage account to archive artifacts, and I have also unchecked "Skip Archiving custom scripts". In the VMSS deployment task I am getting the following error:
2020-03-06T22:59:44.7864691Z ##[error]Failed to install VM custom script extension on VMSS.
Error: VM has reported a failure when processing extension 'AzureVmssDeploymentTask'.
Error message: "Enable failed: failed to execute command: command terminated with exit status=126
[stdout]
extracting archive cs.tar.gz
Invoking command: ./"main.sh"
[stderr]
./customScriptInvoker.sh: line 12: ./main.sh: Permission denied
I found customScriptInvoker.sh under the /var/lib/waagent/custom-script/download/1 directory in the scale set VM:
#!/bin/bash
if [ -n "$1" ]; then
    mkdir a
    echo "extracting archive $1"
    tar -xzC ./a -f $1
    cd ./a
fi
command=$2" "$3
echo $command
echo "Invoking command: "$command
eval $command
What would be the way around this issue?
It seems like the shell execute permission is missing.
I am assuming you are running the bash script from the same folder. I would try
chmod +rx main.sh
You can verify the permissions with
ls -l main.sh
Can you also post the ownership of main.sh and customScriptInvoker.sh using
ls -la main.sh
ls -la customScriptInvoker.sh
Check whether they are owned by different accounts that may not be in the same group. If that is the case, you would also get a permission denied error when trying to execute the main.sh script from inside the other script. You can use the chgrp command to change main.sh to be owned by the same group as the other file, or chown to give main.sh the same ownership as the other file. It's hard to tell without seeing the permissions and ownership of the files.
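A minimal sketch of the commands described above, assuming GNU coreutils and that both files sit in the same download directory on the scale set VM:
# compare permissions and ownership of both files
ls -la main.sh customScriptInvoker.sh
# make main.sh readable and executable
chmod +rx main.sh
# if the owner/group differ, align main.sh with customScriptInvoker.sh
sudo chown --reference=customScriptInvoker.sh main.sh
sudo chgrp --reference=customScriptInvoker.sh main.sh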

Azure ACI container deployment fails when launching with Command line arguments

I am trying to launch an ACI container from the Azure CLI. The deployment fails when I send multiple commands from the command line, and it succeeds when I just pass one command, like 'ls'.
Am I passing multiple arguments to the command line in the wrong way?
az container create --resource-group rg-***-Prod-Whse-Containers --name test --image alpine:3.5 --command-line "apt-get update && apt-get install -y && wget https://www.google.com" --restart-policy never --vnet vnet-**-eastus2 --subnet **-ACI
Unfortunately, it seems that you cannot run multiple commands at once. See the restrictions of the exec command for ACI:
Azure Container Instances currently supports launching a single process with az container exec, and you cannot pass command arguments. For example, you cannot chain commands like in sh -c "echo FOO && echo BAR", or execute echo FOO.
You can only execute a single command, such as whoami, ls, etc., in the CLI command.
I suggest that you create an interactive session with the container instance to execute commands continuously after you create the ACI; refer to this similar issue.
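For example, once the container group is created, an interactive session can be opened like this (resource group and container names taken from the question):
az container exec \
    --resource-group rg-***-Prod-Whse-Containers \
    --name test \
    --exec-command "/bin/sh"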

User-data scripts is not running on my custom AMI, but working in standard Amazon linux

I have searched a lot of topics about "user-data script is not working" these past few days, but so far I haven't gotten any idea about my case. Please help me figure out what happened, thanks a lot!
According to AWS User-data explanation:
When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts.
So I tried to pass my own user-data when instance launch, this is my user-data:
#!/bin/bash
echo 'test' > /home/ec2-user/user-script-output.txt
But there is no file in this path: /home/ec2-user/user-script-output.txt
I checked /var/lib/cloud/instance/user-data.txt; the file exists and is the same as my user-data script.
I also checked the log in /var/log/cloud-init.log; there is no error message.
But the user-data script does work if I launch a new instance with Amazon Linux (2014.09.01). I'm not sure what the difference is between my AMI (based on Amazon Linux) and Amazon Linux.
The only difference I saw is when I run this command:
sudo yum list installed | grep cloud-init
My AMI:
cloud-init.noarch 0.7.2-8.33.amzn1 @amzn-main
Amazon linux:
cloud-init.noarch 0.7.2-8.33.amzn1 installed
I'm not sure if this is the reason?
If you need more information, I'm glad to provide it. Please let me know what happened in my own AMI and how to fix it.
Many thanks.
Update
Just found an answer from this post:
if I add #cloud-boothook at the top of the user-data file, it works!
#cloud-boothook
#!/bin/bash
echo 'test' > /home/ec2-user/user-script-output.txt
But still not sure why.
User_data is run only at the first start-up. As your image is a custom one, I suppose it has already been started once, and so user_data is deactivated.
For Windows, it can be done by checking a box in EC2 Service Properties. I'm currently looking at how to do that in an automated way at the end of the custom image creation.
For Linux, I suppose the mechanism is the same, and user_data needs to be re-activated on your custom image.
The #cloud-boothook makes it work because it changes the script from the user_data mechanism to a cloud-boothook one that runs on each start.
EDIT :
Here is the code to reactivate start on windows using powershell:
$configFile = "C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml"
[xml] $xdoc = get-content $configFile
$xdoc.SelectNodes("//Plugin") |?{ $_.Name -eq "Ec2HandleUserData"} |%{ $_.State = "Enabled" }
$xdoc.SelectNodes("//Plugin") |?{ $_.Name -eq "Ec2SetComputerName"} |%{ $_.State = "Enabled" }
$xdoc.OuterXml | Out-File -Encoding UTF8 $configFile
$configFile = "C:\Program Files\Amazon\Ec2ConfigService\Settings\BundleConfig.xml"
[xml] $xdoc = get-content $configFile
$xdoc.SelectNodes("//Property") |?{ $_.Name -eq "AutoSysprep"} |%{ $_.Value = "Yes" }
$xdoc.OuterXml | Out-File -Encoding UTF8 $configFile
(I know the question focus linux, but it could help others ...)
As I tested, there was some bootstrap data in the /var/lib/cloud directory.
After I cleared that directory, the User Data script worked normally.
rm -rf /var/lib/cloud/*
I have also faced the same issue on an Ubuntu 16.04 HVM AMI. I raised the issue with AWS support, but I still couldn't find the exact reason/bug which causes it.
Still, I have something which might help you.
Before taking the AMI, remove the /var/lib/cloud directory (each time). Then, while creating the image, set it to no-reboot.
If these things still aren't working, you can test further by forcing the user-data to run manually. Also tailf /var/log/cloud-init-output.log to watch the cloud-init status. It should end with something like modules:final for your user-data to run. It should not get stuck on modules:config.
sudo rm -rf /var/lib/cloud/*
sudo cloud-init init
sudo cloud-init modules -m final
I don't know whether the above commands will work on CentOS or not; I have tested them on Ubuntu.
In my case, I also tried removing the /var/lib/cloud directory, but it still failed to execute the user-data in our scenario. So I came up with a different solution: we created a script with the above commands and made that script run while the system boots.
I added the line below to /etc/rc.local to make it happen:
sudo bash /home/ubuntu/force-user-data.sh || exit 1
But here is the catch: it will execute the script on each boot, which will make your user-data run on every single boot, just like #cloud-boothook. No worries, you can tweak it by having the script remove force-user-data.sh itself at the end. So your force-user-data.sh will look something like:
#!/bin/bash
sudo rm -rf /var/lib/cloud/*
sudo cloud-init init
sudo cloud-init modules -m final
sudo rm -f /home/ubuntu/force-user-data.sh
exit 0
I would appreciate it if someone could shed some light on why it is unable to execute the user-data.
This is the answer, as an example: ensure that the first line contains only #!/bin/bash.
#!/bin/bash
yum update -y
yum install -y httpd mod_ssl
service httpd start
chkconfig httpd on
I was having a lot of trouble with this. I'll provide a detailed walk-through.
My added wrinkle is that I'm using Terraform to instantiate the hosts via a launch configuration and autoscaling group.
I could NOT get it to work by adding the script inline in lc.tf:
user_data = <<DATA
"
#cloud-boothook
#!/bin/bash
echo 'some crap'\'
"
DATA
I could fetch it from user data,
wget http://169.254.169.254/latest/user-data
but noticed I was getting it with the quotes still in it.
This is how I got it to work: I moved to pulling it from a template instead since what you see is what you get.
user_data = "${data.template_file.bootscript.rendered}"
This means I also need to declare my template file like so:
data "template_file" "bootscript" {
template = "${file("bootscript.tpl")}"
}
But I was still getting an error in the cloud init logs
/var/log/cloud-init.log
[WARNING]: Unhandled non-multipart (text/x-not-multipart) userdata: 'Content-Type: text/cloud...'
Then I found this article about user-data formatting. That makes sense: if user-data can come in multiple parts, maybe cloud-init needs the cloud-init commands in one place and the script in the other.
So my bootscript.tpl looks like this:
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0
--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"
#cloud-config
cloud_final_modules:
- [scripts-user, always]
--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"
#!/bin/bash
echo "some crap"
--//
#!/bin/bash
Do not leave any whitespace at the start of the line. Use the exact command. Otherwise it may run on an Amazon Linux AMI, but it will not run on RHEL.
On Ubuntu 16, removing /var/lib/cloud/* did not work. I removed only the instance entries from the /var/lib/cloud/ folder and then it ran fine for me.
I ran:
sudo rm /var/lib/cloud/instance
sudo rm -r /var/lib/cloud/instances
Then I retried my user data script and it worked fine
I'm using CentOS, and the logic for user-data there is simple:
in the file /etc/rc.local there is a call to an initial.sh script, but it looks for a flag first:
if [ -f /var/tmp/initial ]; then
/var/tmp/initial.sh &
fi
initial.sh is the file that does the execution of the user-data, but at the end it deletes the flag. So, if you want your new AMI to execute user-data again, just create the flag again before creating the image:
touch /var/tmp/initial
The only way I got it to work was to add #cloud-boothook before the #!/bin/bash.
This is a typical user-data script that installs the Apache web server on a newly created instance:
#cloud-boothook
#!/bin/bash
yum update -y
yum install -y httpd.x86_64
systemctl start httpd.service
systemctl enable httpd.service
Without the #cloud-boothook it does not work, but with it, it works. It seems that different users have different experiences; some are able to get it to work without it, and I'm not sure why.
Just add --// at the end of your user-data script, for example:
#!/bin/bash
#Upgrade ec2 instance
sudo yum update -y
#Start docker service
sudo service docker start
--//
User Data should execute fine without using #cloud-boothook (which is used to activate the User Data at the earliest possible time during the boot process).
I started a new Amazon Linux AMI and used your User Data, plus a bit extra:
#!/bin/bash
echo 'bar' > /tmp/bar
echo 'test' > /home/ec2-user/user-script-output.txt
echo 'foo' > /tmp/foo
This successfully created three files.
User Data scripts are executed as root, so it should have permission to create files in any location.
I notice that in your supplied code, one example refers to /home/ec2-user/user-script/output.txt (with a subdirectory) and one example refers to /home/ec2-user/user-script-output.txt (no subdirectory). The command would understandably fail if you attempt to create a file in a non-existent directory, but your "Update" example seems to show that it did actually work.
Also, if we are using interactive commands like
sudo yum install java-1.8.0-devel
then we need to use flags like -y:
sudo yum install java-1.8.0-devel -y
You can find this in the EC2 documentation under Run commands at launch.
Ref: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html#user-data-shell-scripts
