I am having trouble with the container options in Azure Batch.
To change the hostname of the container being started, I set --hostname="test" in the task's containerRunOptions.
However, this results in an error:
ContainerSettings: --hostname="test" Message: create_container() got an unexpected keyword argument 'hostname '
Using -h test results in a similar error.
Other options (--volume, etc.) work fine.
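For readers who want to see where the option goes, here is a minimal sketch of a task request body with containerRunOptions set. It assumes the field names used by the Batch REST API (id, commandLine, containerSettings, imageName, containerRunOptions); the task id and command line are made up for illustration, and the helper name is hypothetical.

```python
import json


def build_task_body(task_id, command_line, image_name, run_options):
    """Return a dict shaped like a Batch 'add task' request body.

    Only the fields relevant to this question are included.
    """
    return {
        "id": task_id,
        "commandLine": command_line,
        "containerSettings": {
            "imageName": image_name,
            # Extra arguments passed through to the container create step.
            "containerRunOptions": run_options,
        },
    }


# "hostname-test" and "hostname" are hypothetical values for illustration.
body = build_task_body(
    task_id="hostname-test",
    command_line="hostname",
    image_name="centos:latest",
    run_options='--hostname="test"',
)
print(json.dumps(body, indent=2))
```

This only builds the payload; it does not submit anything to the service.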
Pool information:
Publisher: microsoft-azure-batch
OS: centos-container
sku: 7-4
image: centos:latest (Docker Hub)
Is this a bug in Azure Batch?
Or am I specifying the option incorrectly?
Updated answer (2018-08-23):
The fix for this issue has been rolled out.
Previous answer:
This was identified as a service defect and will be addressed in a future version. You can track the Azure Batch Node Agent release notes for when the fix is released.
If you are using Batch to execute tasks without performing deeper integration, e.g., you are using the Azure CLI or similar tooling, you can use Batch Shipyard in "non-native mode" to work around this problem in the meantime. (Disclaimer: I'm a contributor to this code.)
I'm deploying a container to a Container Optimized OS or COS on Google Compute.
I want to specify Logging and Monitoring for the VM. There are 2 ways to do this:
Specify metadata flags:
Mark the checkboxes
But when I then click on "Equivalent command line", there's no indication of these options.
Am I just misinterpreting something here or am I not allowed to specify these flags in the command?
I tried the same with a non-COS VM instance, and the metadata flag showed up in the generated command as expected. It does not show up in the command generated for the COS instance.
gcloud compute instances create instance-1 \
...
--metadata=MY_TEST_FLAG=test_value
Yes, this issue occurs when you create a VM with a Container-Optimized OS image, but it affects only the generated command line; the REST equivalent is generated properly. As a workaround, you can add the metadata flag to the generated command yourself, as shown below.
--metadata=google-logging-enabled=true,google-monitoring-enabled=true
I have raised a request for this issue. Please monitor the Google Public Issue Tracker for updates on the fix.
If you find similar issues in the future, you can report them to Google by following "Report issues and request features with issue trackers".
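If you script instance creation, the workaround above can be sketched as follows. This is only an illustration: the helper name is hypothetical, and the base command is the truncated one from the question, so the elided arguments are simply not included.

```python
def metadata_flag(entries):
    """Build a single --metadata flag from key/value pairs."""
    pairs = ",".join(f"{key}={value}" for key, value in entries.items())
    return f"--metadata={pairs}"


# Keys taken from the workaround above.
flag = metadata_flag({
    "google-logging-enabled": "true",
    "google-monitoring-enabled": "true",
})

# Base command from the question; its other arguments were elided there.
command = ["gcloud", "compute", "instances", "create", "instance-1", flag]
print(" ".join(command))
```

Keeping the pairs in a dict makes it easy to add further metadata keys without hand-editing the comma-separated string.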
I am trying to use azcopy to copy from Google Cloud to Azure.
I'm following the instructions here, and the generated logs show that connectivity to GCP is fine, the SAS token is fine, and the container is created (I can see it appear in Azure Storage Explorer). But then it just hangs. The output is:
INFO: Scanning...
INFO: Authenticating to source using GoogleAppCredentials
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
If I look at the log it shows:
2022/06/01 07:43:25 AzcopyVersion 10.15.0
2022/06/01 07:43:25 OS-Environment windows
2022/06/01 07:43:25 OS-Architecture amd64
2022/06/01 07:43:25 Log times are in UTC. Local time is 1 Jun 2022 08:43:25
2022/06/01 07:43:25 ISO 8601 START TIME: to copy files that changed before or after this job started, use the parameter --include-before=2022-06-01T07:43:20Z or --include-after=2022-06-01T07:43:20Z
2022/06/01 07:43:25 Authenticating to source using GoogleAppCredentials
2022/06/01 07:43:26 Any empty folders will not be processed, because source and/or destination doesn't have full folder support
As I say, there are no errors about the SAS token being out of date, the GCP credentials not being found, or anything like that.
It just hangs.
It does this whether I try to copy a single named file or do a recursive directory copy. Anything.
Any ideas, please?
• I would suggest checking the logs of these AzCopy transactions for more details. On Windows, the logs are stored in the ‘%USERPROFILE%\.azcopy’ directory. AzCopy creates log and plan files for every job, so you can investigate and troubleshoot potential problems by analyzing them.
• Since AzCopy hangs during a file transfer job, the cause might be network fluctuation, a timeout, or a busy server. Remember that AzCopy retries up to 20 times in these cases, and the retry usually succeeds. Look for errors in the logs near ‘UPLOADFAILED’, ‘COPYFAILED’, or ‘DOWNLOADFAILED’.
• The following command gets all the errors with ‘UPLOADFAILED’ status from the relevant log file:
Select-String UPLOADFAILED .\<CONCERNEDLOGFILE GUID>.log
To show a job's transfers filtered by status, run the following command with the job ID:
azcopy jobs show <job-id> --with-status=Failed
• Run the AzCopy command from your local system with the ‘--no-check-certificate’ argument, which skips the checks for system certificates at the receiving end. Also ensure that the root certificates for the network client device or software are correctly installed on your local system, since certificate problems can block jobs that transfer files from on-premises to Azure.
Also, when a job started without any extra parameters hangs, press CTRL+C to kill the process and then immediately check the AzCopy logs as well as the Event Viewer for any system issues. This should help you identify why the process failed and hung.
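If you would rather search for all three failure statuses at once, the log search suggested above can be sketched in Python. This is a rough sketch, not an official tool: the function names are hypothetical, and the sample lines are invented placeholders rather than real AzCopy log output.

```python
import re

# Status markers named in the answer above.
FAILED_STATUSES = ("UPLOADFAILED", "COPYFAILED", "DOWNLOADFAILED")
_PATTERN = re.compile("|".join(FAILED_STATUSES))


def find_failed_lines(lines):
    """Return (line_number, text) pairs for lines containing a failure status."""
    matches = []
    for number, line in enumerate(lines, start=1):
        if _PATTERN.search(line):
            matches.append((number, line.rstrip("\n")))
    return matches


def scan_log(path):
    """Scan a log file on disk; the path is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_failed_lines(handle)


# Invented placeholder lines, NOT real AzCopy log output.
sample = [
    "placeholder: job started\n",
    "placeholder: transfer one UPLOADFAILED\n",
    "placeholder: transfer two COPYFAILED\n",
    "placeholder: job finished\n",
]
for number, text in find_failed_lines(sample):
    print(number, text)
```

To use it on a real job, pass the path of the relevant .log file from the AzCopy log directory to scan_log.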
For more information, see the links below:
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-configure
https://github.com/Azure/azure-storage-azcopy/issues/517
Frustratingly, after many calls with Microsoft support, the exact same command with the exact same SAS token, etc., that had previously been failing just started to work while I was demoing the problem to another person.
I hate problems that 'fix themselves', as it means they will likely occur again.
Thanks to KartikBhiwapurkar-MT for a detailed response too.
Whenever I try to trigger a job that depends on that EC2 slave, it just sits in the queue. I looked at the logs and saw this exception:
com.amazonaws.services.ec2.model.AmazonEC2Exception: Network interfaces and an instance-level security groups may not be specified on the same request
Whenever I click on build executor status on the left, there is a button that says "provision via ". I click on it and see the correct amazon linux image name that I entered under cloud on Jenkins' System Configuration, but when I click on that, I see that same exception as well... I just don't know how to fix this and cannot find any helpful information on this.
Any help would be much appreciated.
OK, I'm not exactly sure what was causing the error, since I don't really know how the Jenkins plugin interfaces with the AWS API. But after a good amount of trial and error, I was able to provision the On-Demand worker by adding more details/parameters in Configuration, under Cloud.
Adding a subnet ID for the VPC and an IAM instance profile did the trick (I already had everything else, including security groups, availability zone, instance type, etc.). So it seems like you either leave out security groups, or go all in and fill in pretty much everything.
As an FYI if you see this with Jenkins EC2 Plugin v1.46 it looks like a genuine bug:
https://issues.jenkins-ci.org/browse/JENKINS-59543
The solution is to use 1.45 until it's fixed (see link above for more details).
I have an Azure Cloud Service Load Balanced over 2 Roles with a recent addition of a custom LoadBalancerProbe.
The issue I am having is that my site has a threaded task that was throwing unhandled exceptions, and five of these unhandled exceptions caused the role to report itself as unhealthy. I have fixed this by handling the exceptions; however, we have a large team, and I am worried that this may happen again as a result of a future release.
I would like to set up an Azure notification (preferably email) to let me know when one of my roles reports itself as unhealthy. I have looked around on the web but cannot find any help. Has someone done this before? It's more of a precaution, but I am keen to get this implemented.
Thanks in advance for any help you can give me
There isn't an alert baked into the platform for this that I know of, but you can certainly do some coding/scripting to get this done. You can use any of the SDKs, Microsoft Azure Management Libraries or command line tools that let you get at the Azure Management API, but for my example here I'll use PowerShell. I don't have a script to do this completely, but the main part is getting the instance statuses, such as:
$NonReadyInstances = (Get-AzureDeployment $serviceName -Slot Production -verbose:$false).RoleInstanceList | Where-Object { $_.InstanceStatus -ne "ReadyRole" }
This gets all the instances that aren't in a ready state; however, you'd probably want to filter out several of the other valid instance states that occur as your system scales, etc. There is a list of valid instance states you can use to decide which ones you want to be notified about. For example, in your case I'd think CyclingRole or RestartingRole would be the ones to watch for.
Check the count of how many items are returned in the $NonReadyInstances variable and if it is greater than zero send your email. Put that in a script and run it regularly.
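The filtering and "send only if non-empty" logic described above can be sketched as follows. This is a sketch under assumptions, not a complete monitor: it takes instance records that have already been fetched (for example, exported from the PowerShell call above), the function names and sample data are hypothetical, and actually sending the email is left to whatever mail mechanism you use.

```python
# Statuses suggested above as the ones to watch for.
WATCHED_STATUSES = frozenset({"CyclingRole", "RestartingRole"})


def instances_to_report(instances, watched=WATCHED_STATUSES):
    """Return the instance records whose status is in the watched set."""
    return [item for item in instances if item["InstanceStatus"] in watched]


def build_alert(service_name, unhealthy):
    """Return an alert message, or None when there is nothing to report."""
    if not unhealthy:
        return None
    details = "\n".join(
        f"{item['InstanceName']}: {item['InstanceStatus']}" for item in unhealthy
    )
    return f"Unhealthy role instances in {service_name}:\n{details}"


# Hypothetical sample data standing in for the fetched instance list.
sample = [
    {"InstanceName": "WebRole_IN_0", "InstanceStatus": "ReadyRole"},
    {"InstanceName": "WebRole_IN_1", "InstanceStatus": "CyclingRole"},
]
message = build_alert("my-service", instances_to_report(sample))
if message is not None:
    print(message)  # Replace with your email-sending step.
```

Watching an explicit set of statuses, rather than everything that is not ReadyRole, avoids alerts for the transient states that occur during normal scaling.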
Here's a link for getting started with the Azure PowerShell cmdlets, if you aren't familiar with them: http://msdn.microsoft.com/en-us/library/jj156055.aspx
Also, here is a link to the Azure Automation preview service, if you want to run this without using your own machines: http://azure.microsoft.com/en-us/services/automation/
Whenever we get the error "Role Instances are taking longer than expected", the only options seem to be:
Shutdown the emulators and try again.
Restart the machine and see if that helps.
Uninstall the Azure Tools for that version.
Sometimes the uninstall takes a long time, sometimes even days. It appears that some process or service is blocking it. Has anyone faced this before? If so, does anyone know which process would be blocking it?
When an instance starts it will run the OnStart method on the worker/web role (depending on your service type). The more stuff you have in there, the more time it will take to start up the role. Common caveats are the Cache as mentioned and blob/table storage (if you do read/write/create when you start the role).
Try minimizing the OnStart workload and moving any storage work into async tasks.
I have had similar problems in the past as well:
IISConfigurator could not map the web roles in IIS. In my case it was due to corrupted file system ACLs on the code directory. See logs under C:\Users\YOUR_USER_NAME\AppData\Local\dftmp\IISConfiguratorLogs\
Another cause might be that something else has tied up the Port Numbers that Azure is trying to bind your web role on. Or that the ports that the local storage needs for tables/blobs and queues (10000-10002) have been taken by another app. Open a command prompt and run netstat -anb
Try running Visual Studio with the "Run as Administrator" option.
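As a quick check for the port conflict mentioned above, here is a small sketch that tests whether anything is already listening on the local storage ports (10000-10002). It assumes the services listen on 127.0.0.1, and the function name is hypothetical. Unlike netstat -anb, it only tells you that a port is taken, not which process holds it.

```python
import socket

# Ports the local storage needs for blobs, queues and tables (from the answer above).
STORAGE_PORTS = (10000, 10001, 10002)


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


for port in STORAGE_PORTS:
    state = "in use" if port_in_use(port) else "free"
    print(f"port {port}: {state}")
```

Run it while the emulator is shut down; any port reported as in use is held by another application.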