Why doesn't Compute VM COS metadata get carried over to "Equivalent command line"? - google-container-os

I'm deploying a container to a Container-Optimized OS (COS) VM on Google Compute Engine.
I want to specify Logging and Monitoring for the VM. There are two ways to do this:
• Specify metadata flags
• Mark the checkboxes
But when I then click on "Equivalent command line", there's no indication of these options.
Am I just misinterpreting something here or am I not allowed to specify these flags in the command?
I tried with a non-COS VM instance, and the expected metadata flag showed up in the generated command, as shown below. It does not show up in the command generated for the COS instance.
gcloud compute instances create instance-1 \
...
--metadata=MY_TEST_FLAG=test_value

Yes. This issue occurs when a VM is created with a Container-Optimized OS image, but it affects only the generated command line; the REST equivalent is generated properly. As a workaround, you can add the metadata flag to the generated command, as shown below.
--metadata=google-logging-enabled=true,google-monitoring-enabled=true
I have raised a request for this issue. Please monitor the Google Public Issue Tracker for updates on the fix.
If you find similar issues in the future, you can report them to Google as described in "Report issues and request features with issue trackers".
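The workaround above can also be applied programmatically when the generated command is reused in a script. The following is a minimal sketch, not part of the original answer: the helper name `with_metadata` is hypothetical, and it only assumes that the command is held as a list of arguments and that all metadata should end up in a single `--metadata` flag.

```python
def with_metadata(args, metadata):
    """Return a copy of a gcloud argument list with `metadata` merged
    into one --metadata=KEY=VALUE,... flag (hypothetical helper)."""
    prefix = "--metadata="
    merged = {}
    kept = []
    for arg in args:
        if arg.startswith(prefix):
            # Collect key/value pairs from any existing --metadata flag.
            for pair in arg[len(prefix):].split(","):
                if not pair:
                    continue
                key, _, value = pair.partition("=")
                merged[key] = value
        else:
            kept.append(arg)
    # Values passed in explicitly override existing ones.
    merged.update(metadata)
    flag = prefix + ",".join(f"{key}={value}" for key, value in merged.items())
    return kept + [flag]


if __name__ == "__main__":
    generated = ["gcloud", "compute", "instances", "create", "instance-1"]
    fixed = with_metadata(
        generated,
        {"google-logging-enabled": "true", "google-monitoring-enabled": "true"},
    )
    print(" ".join(fixed))
```

The remaining flags of the generated command are passed through unchanged, so the only difference is the appended `--metadata` flag.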

AKS configured Container Insights does capture excluded namespaces

I have an AKS cluster running on which I enabled Container Insights.
The Log Analytics workspace has a decent amount of logs in there.
Now I do have my applications running on a separate namespace, and one namespace which has some Grafana containers running (which I also don't want in my captured logs).
So, I searched on how I could reduce the amount of captured logs and came across this Microsoft docs article.
I deployed the template ConfigMap to my cluster and for [log_collection_settings.stdout] and [log_collection_settings.stderr] I excluded the namespaces which I don't want to capture.
When calling kubectl edit configmap container-azm-ms-agentconfig -n kube-system I get the following:
Which means that my config is actually in there.
Now when I open a query window in Log Analytics workspace and execute the following query:
KubePodInventory
| where Namespace == "kube-system"
I get plenty of results whose TimeGenerated column contains values from about 5 minutes ago, even though I set up the ConfigMap a week ago.
In the logs of one of the pods omsagent-... I see logs like the following:
Both stdout & stderr log collection are turned off for namespaces: '*.csv2,*_kube-system_*.log,*_grafana-namespace_*.log'
****************End Config Processing********************
****************Start Config Processing********************
config::configmap container-azm-ms-agentconfig for agent settings mounted, parsing values
config::Successfully parsed mounted config map
While searching Stack Overflow, I found the following answers, which make me believe that I did the right thing:
https://stackoverflow.com/a/63838009
https://stackoverflow.com/a/63058387
https://stackoverflow.com/a/72288551
So I'm not sure what I am doing wrong here. Does anyone have an idea?
Since I hate it myself that some people don't post an answer even if they already have one, here it is (although not the answer you want, at least for now).
I posted the issue on GitHub where the repository is maintained for Container Insights.
The issue can be seen here on GitHub.
If you don't want to click the link, here is the answer from Microsoft:
We are working on adding support for namespace filtering for inventory and perf metrics tables and will update you as soon this feature available.
So, currently, this ConfigMap can only exclude data from the ContainerLog table; the inventory and perf metrics tables are not affected by it.

azcopy from Google Cloud to Azure just hangs

I am trying to use azcopy to copy from Google Cloud to Azure.
I'm following instructions here and I can see in the logs generated that the connectivity to GCP seems fine, the SAS token is fine and it creates the container fine (see it appear in Azure Storage Explorer) but then it just hangs. Output is:
INFO: Scanning...
INFO: Authenticating to source using GoogleAppCredentials
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
If I look at the log it shows:
2022/06/01 07:43:25 AzcopyVersion 10.15.0
2022/06/01 07:43:25 OS-Environment windows
2022/06/01 07:43:25 OS-Architecture amd64
2022/06/01 07:43:25 Log times are in UTC. Local time is 1 Jun 2022 08:43:25
2022/06/01 07:43:25 ISO 8601 START TIME: to copy files that changed before or after this job started, use the parameter --include-before=2022-06-01T07:43:20Z or --include-after=2022-06-01T07:43:20Z
2022/06/01 07:43:25 Authenticating to source using GoogleAppCredentials
2022/06/01 07:43:26 Any empty folders will not be processed, because source and/or destination doesn't have full folder support
As I say, no errors around SAS token being out of date, or can't find the GCP credentials, or anything like that.
It just hangs.
It does this if I try and copy a single named file or a recursive directory copy. Anything.
Any ideas, please?
• I suggest checking the logs of these AzCopy transactions for more details. On Windows, the logs are stored in the ‘%USERPROFILE%\.azcopy’ directory. AzCopy creates log and plan files for every job, so you can investigate and troubleshoot potential problems by analyzing them.
• Since AzCopy hangs while executing a transfer job, the cause might be a network fluctuation, a timeout, or a busy server. Note that AzCopy retries up to 20 times in these cases, and the retry usually succeeds. Look for errors in the logs near ‘UPLOADFAILED’, ‘COPYFAILED’, or ‘DOWNLOADFAILED’.
• The following command gets all the errors with ‘UPLOADFAILED’ status from the relevant log file:
Select-String UPLOADFAILED .\<CONCERNEDLOGFILE GUID>.log
To show the failed transfers for a given job ID, run the following command:
azcopy jobs show <job-id> --with-status=Failed
• Run the AzCopy command from your local system with the ‘--no-check-certificate’ argument, which skips the checks of the system certificates at the receiving end. Also ensure that the root certificates for the network client device or software are correctly installed on your local system, as these can block your jobs while transferring files from on-premises to Azure.
Also, if the job is started without any extra parameters and then hangs, press CTRL+C to kill the process and immediately check the AzCopy logs as well as the Event Viewer for any system issues. This should help identify why the process failed and hung.
For more information, refer to the links below:
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-configure
https://github.com/Azure/azure-storage-azcopy/issues/517
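The log search suggested above (the Select-String command) can also be scripted so that all three failure markers are checked in one pass. This is a minimal sketch under the assumption that the log is a plain-text file with one entry per line; the function names are hypothetical and the marker strings are the ones quoted in the answer.

```python
import sys

# Status markers named in the answer above.
FAILURE_MARKERS = ("UPLOADFAILED", "COPYFAILED", "DOWNLOADFAILED")


def find_failures(lines):
    """Yield (line_number, text) for each line containing a failure marker."""
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            yield number, line.rstrip("\n")


def scan_log(path):
    """Return all failure lines found in the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_failures(handle))


if __name__ == "__main__":
    # Pass one or more log file paths on the command line.
    for log_path in sys.argv[1:]:
        for number, text in scan_log(log_path):
            print(f"{log_path}:{number}: {text}")
```

If the script prints nothing for a job that hung, the log contains none of these markers, which matches the symptom described in the question (no errors, just a stall after scanning starts).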
Frustratingly, after many calls with Microsoft support, the exact same command with the exact same SAS token etc. that had previously been failing just started to work while I was demoing the problem to another person.
I hate problems that 'fix themselves' as it means it will likely occur again.
Thanks to KartikBhiwapurkar-MT for a detailed response too.

About container options of Azure Batch

I am having trouble with the container options of Azure Batch.
To change the hostname of the container being started, I set --hostname="test" in the containerRunOptions of the Task.
However, this results in an error:
ContainerSettings: --hostname="test" Message: create_container() got an unexpected keyword argument 'hostname '
Even -h test results in a similar error.
Other options (--volume etc.) work fine.
Pool Information:
Publisher:microsoft-azure-batch
OS:centos-container
sku:7-4
image:centos:latest(docker hub)
Is this a bug in Azure Batch?
Or am I specifying the option incorrectly?
Updated answer (2018-08-23):
The fix for this issue has been rolled out.
Previous answer:
This was identified as a service defect and will be addressed in a future version. You can track the Azure Batch Node Agent release notes for when the fix is released.
If you are using Batch to execute tasks without performing deeper integration, e.g., you are using the Azure CLI or similar tooling, you can use Batch Shipyard in "non-native mode" to work around this problem in the meantime. (Disclaimer: I'm a contributor for this code).

Jenkins error trying to raise on-demand linux ec2 slave

Whenever I try to trigger a job that depends on that ec2 slave, it just sits in the queue. I looked at the logs and saw this exception:
com.amazonaws.services.ec2.model.AmazonEC2Exception: Network interfaces and an instance-level security groups may not be specified on the same request
Whenever I click on build executor status on the left, there is a button that says "provision via ". I click on it and see the correct amazon linux image name that I entered under cloud on Jenkins' System Configuration, but when I click on that, I see that same exception as well... I just don't know how to fix this and cannot find any helpful information on this.
Any help would be much appreciated.
Ok, I'm not exactly sure what was causing the error since I don't really know how the Jenkins plugin interfaces with the aws api. But after a good amount of trial and error, I was able to provision the On Demand worker by adding more details/parameters in Configuration, under Cloud.
Adding a subnet ID for the VPC and an IAM Instance profile did the trick (I already had everything else, including security groups, availability zone, instance type, etc.). So it seems like you either leave out security groups, or go all in and fill in pretty much everything.
As an FYI if you see this with Jenkins EC2 Plugin v1.46 it looks like a genuine bug:
https://issues.jenkins-ci.org/browse/JENKINS-59543
The solution is to use 1.45 until it's fixed (see link above for more details).

Unable to copy RDS parameter group across regions

I am using the RDS command line tool from here and am having trouble copying the parameter group to a different region. Running the rds-copy-db-parameter-group fails with the following error:
rds-copy-db-parameter-group: Could not find the resource you requested: DB ParameterGroup not found, not allowed to do cross region copy.
The command I am using is:
rds-copy-db-parameter-group arn:aws:rds:ap-southeast-1:myAccntId:pg:myParamGroup-utf8mb4 -t copyOfMyParam -td testcopy
I'm pretty sure the ARN is correct and the parameter group does exist. Is this a problem with the tool or with AWS? Is anyone else encountering a similar issue?
I ran into this same issue recently and opened a support ticket with AWS. The response I got was that the RDS team added this feature to the documentation but hasn't yet built the actual support for it.
This bothered me a lot and ate up a couple of hours so I put this simple script together. There's loads of room for improvement so please share if you improve upon it or find issues!
https://gist.github.com/phill-tornroth/f0ef50f9402c7c94cbafd8c94bbec9c9
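For readers who cannot open the gist, the general idea of a manual cross-region copy can be sketched as follows. This is an independent sketch and not the contents of the linked script: it assumes boto3-style RDS clients (one per region) are passed in, and the function names, target group name, and description are hypothetical. It reads the user-modified parameters from the source group, creates a group of the same family in the target region, and applies the parameters in batches of 20, the per-request limit of `modify_db_parameter_group`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def copy_parameter_group(source_rds, target_rds, source_name, target_name, description):
    """Recreate a DB parameter group in another region (sketch).

    `source_rds` and `target_rds` are assumed to be boto3 RDS clients
    created for the source and target regions respectively.
    Returns the number of parameters copied.
    """
    group = source_rds.describe_db_parameter_groups(
        DBParameterGroupName=source_name
    )["DBParameterGroups"][0]

    # Collect only the parameters that were changed from engine defaults.
    overrides = []
    paginator = source_rds.get_paginator("describe_db_parameters")
    for page in paginator.paginate(DBParameterGroupName=source_name, Source="user"):
        for param in page["Parameters"]:
            if "ParameterValue" not in param:
                continue
            # Static parameters can only be applied at the next reboot.
            method = "immediate" if param.get("ApplyType") == "dynamic" else "pending-reboot"
            overrides.append({
                "ParameterName": param["ParameterName"],
                "ParameterValue": param["ParameterValue"],
                "ApplyMethod": method,
            })

    target_rds.create_db_parameter_group(
        DBParameterGroupName=target_name,
        DBParameterGroupFamily=group["DBParameterGroupFamily"],
        Description=description,
    )
    # The API accepts at most 20 parameters per modify request.
    for batch in chunked(overrides, 20):
        target_rds.modify_db_parameter_group(
            DBParameterGroupName=target_name, Parameters=batch
        )
    return len(overrides)
```

With boto3 installed, the clients would typically be created with `boto3.client("rds", region_name=...)`, using ap-southeast-1 for the source as in the question and whichever region is the copy target.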