TeamCity Build Agent name is not picked up from config file - azure

I have recently setup a VM on azure to use as my build agent.
When the agent is started its name is calculated based on the azure instance name (_myservername) and the name I provide in the buildAgent.properties file is ignored completely.
This is particularly problematic when I have a second agent and the same name is chosen which will result in name conflict.
looking at the teamcity-agent.log I can see the following lines:
[2016-07-14 15:33:04,745] WARN - ds.azure.AzurePropertiesReader - Unable to set self port. Azure integration will experience problems
[2016-07-14 15:33:04,745] INFO - ds.azure.AzurePropertiesReader - Added alternative address is set to
[2016-07-14 15:33:04,745] INFO - ds.azure.AzurePropertiesReader - Instance name and agent name are set to _myservername
...
Question is:
Why is the name I provide via config file is not taking precedence over any other place it reads the name from? -- should it?
How can I possibly force a name on it?

OK, the came to find the answer to this and would share it here in case it would be useful for the humans of the future!
The issue was caused by the the azure-plugin where it was setting a configuration-parameter on the agent called instance name.
https://github.com/JetBrains/teamcity-azure-plugin/issues/17
The issue is fixed in the latest version of the plugin so upgrading it solved my problem. :)

Related

Airflow can't reach logs from webserver due to 403 error

I use Apache Airflow for daily ETL jobs. I installed it in Azure Kubernetes Service using the provided Helm chart. It's been running fine for half a year, but since recently I'm unable to access the logs in the webserver (this used to always work fine).
I'm getting the following error:
*** Log file does not exist: /opt/airflow/logs/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log
*** Fetching from: http://airflow-worker-0.airflow-worker.default.svc.cluster.local:8793/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log
*** !!!! Please make sure that all your Airflow components (e.g. schedulers, webservers and workers) have the same 'secret_key' configured in 'webserver' section and time is synchronized on all your machines (for example with ntpd) !!!!!
****** See more at https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#secret-key
****** Failed to fetch log file from worker. Client error '403 FORBIDDEN' for url 'http://airflow-worker-0.airflow-worker.default.svc.cluster.local:8793/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log'
For more information check: https://httpstatuses.com/403
What have I tried:
I've made sure that the log file exists (I can exec into the airflow-worker-0 pod and read the file on command line in the location specified in the error).
I've rolled back my deployment to an earlier commit from when I know for sure it was still working, but it made no difference.
I was using webserverSecretKeySecretName in the values.yaml configuration. I changed the secret to which that name was pointing (deleted it and created a new one, as described here: https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key) but it didn't work (no difference, same error).
I changed the config to use a webserverSecretKey instead (in plain text), no difference.
My thoughts/observations:
The error states that the log file doesn't exist, but that's not true. It probably just can't access it.
The time is the same in all pods (I double checked be exec-ing into them and typing date in the command line)
The webserver secret is the same in the worker, the scheduler, and the webserver (I double checked by exec-ing into them and finding the corresponding env variable)
Any ideas?
Turns out this was a known bug with the latest release (2.4.0) of the official Airflow Helm chart, reported here:
https://github.com/apache/airflow/discussions/26490
Should be resolved in version 2.4.1 which should be available in the next couple of days.

InvalidResourceName error on learn/modules/connect-iot-edge-device-to-iot-central

I'm just running through this learning module and seem to be stuck on unit 5/8 (Exercise - Deploy an IoT Edge device and manage it from IoT Central)
https://learn.microsoft.com/en-us/learn/modules/connect-iot-edge-device-to-iot-central/
See screenshot below for the error I'm getting. I've followed the steps up to this point which have been fairly straight forward. Wondering if this is a bug? Can anyone shed any light on this? Thanks!
Most likely, the value of $APP_NAME is empty. That would result in a value of ip- for the dnsLabelPrefix parameter in the script. To verify this, try reading the value echo $APP_NAME.
This variable was set in module 3 Exercise - Create an IoT Central application. It's likely that if you took a break and started a new sandbox, this value is no longer set. Try setting the value explicitly before running the step resulting in the error. Simply set the value to your IoT Central app name.
APP_NAME="store-manager-$RANDOM"
echo "Your application name is: $APP_NAME"

Invalid resource name creating a connection to azure storage

I'm trying to create a project of the labeling tool from the Azure form recognizer. I have successfully deployed the web app, but I'm unable to start a project. I get this error all every time I try:
I have tried with several app instances and changing the project name and connection name, none of those work. The only common factor and finding here as that it is related to the connection.
As I see it:
1) I can either start a new project or use one on the cloud:
First I tried to create a new project:
I filled the fields with these values:
Display Name: Test-form
Source Connection: <previuosly created connection>
Folder Path: None
Form Recognizer Service Uri: https://XXX-test.cognitiveservices.azure.com/
API Key: XXXXX
Description: None
And got the error from the question's title:
"Invalid resource name creating a connection to azure storage "
I tried several combinations of names, none of those work.
Then I tried with the option: "Open a cloud project"
Got the same error instantly, hence I deduce the issue is with the connection settings.
Now, In the connection settings I have this:
At first glance, since the values are accepted and the connection is created. I guess it is correct but it is the only point I failure I can think of.
Regarding the storage container settings, I added the required CORS configuration and I have used it to train models with the Forms Recognizer, So that part does works.
At this point at pretty much stuck, since I error message does not give me many clues on where is the error.
I was facing a similar error today
You have to add container name before "?sv..." in your SAS URI in connection settings
https://****.blob.core.windows.net/**trainingdata**?sv=2019-10-10..

stackdriver logging agent not showing logs read from a custom log file in stackdriver logging viewer on Google cloud platform

I decided to post this question because, I have ran out of debugging ideas, just ideas are golden since I know it can be difficult to help debugging a virtual instance through here (debugging code is hard enough jaja). Anyway, I have created a virtual machine in Compute engine , I created a logs file that I populate, for example, with this command in a python script, let's call it logging.py:
import logging
logging.basicConfig(filename= 'app.log' , level = logging.INFO , format = ' %(asctime)s - %(name) - %(levelname)s - %(message)s')
logging.info('Some message ' + str(type(variable)))
everytime I use python3 logging.py , the app.log is effectively populated. ( Logging.py and app.log are in the same directory the /home/username/ folder )
I want stackdriver to show this log in the logging viewer everytime it's written, so , I installed the stackdriver agent as follows, in the virtual machine command line:
$ curl -sSO https://dl.google.com/cloudagents/install-logging-agent.sh
$ sudo bash install-logging-agent.sh
No errors that I see are delivered here, in fact, you can see here the messages obtained
Messags on the stackdriver viewer:
After this, I proceed to create a .conf file that I create in /etc/google-fluentd/config.d/app.conf
with this parameters
<source>
type tail
format none
path /home/username/app.log
pos_file /var/lib/google-fluentd/pos/app.pos
read_from_head true
tag whatever-tag
</source>
After that is created, I launch sudo service google-fluentd restart.
Aftert I execute, python3 logging.py , no logs are added to stack drivers logging viewer.
So, where might Have I gone wrong?
Things I have tried/checked:
-Have more than 13 gygabytes of RAM available
-If I run logger "some message" on the command line, I effectively add a log with "some message" to the log viewer
-If I run
ps ax | grep fluentd
I obtain :
3033 ? Sl 0:09 /opt/google-fluentd/embedded/bin/ruby /usr/sbin/google-fluentd --log /var/log/google-fluentd/google-fluentd.log --no-supervisor
3309 pts/0 S+ 0:00 grep --color=auto fluentd
-Both my user, and the service account I use, have logger admin permission in IAM roles.
-This is the documentation I have based myself on:
https://cloud.google.com/logging/docs/agent/troubleshooting?hl=es-419
https://cloud.google.com/logging/docs/reference/v2/rest/v2/entries/list?hl=es-419
https://cloud.google.com/logging/docs/agent/configuration?hl=es-419
https://medium.com/google-cloud/how-to-log-your-application-on-google-compute-engine-6600d81e70e3
https://cloud.google.com/logging/docs/agent/installation
-If I run sudo service google-fluentd status , the agent appears active.
-My instance hass access, to all the apis. It's an n1-standard-4 (4 vCPUs, 15 GB of memory) using ubuntu linux 18:04
So, what else can I check to debug this? I'm out of ideas here , hope I'm not being an idiot here :(
Based on my understanding, I think that you looking for the following fluentd resource types:
generic_node
“A generic node identifies a machine or other computational resource for which no more specific resource type is applicable. The label values must uniquely identify the node.”
generic_task
“A generic task identifies an application process for which no more specific resource is applicable, such as a process scheduled by a custom orchestration system. The label values must uniquely identify the task.”
The source of my information has been found here
This document explain how to send logs from your application in different ways:
Cloud Logging API
Cloud Logging Agent
Generic fluentd
As you mentioned having installed fluentd, let me provide more focused documentation about Cloud Logging Agent. I also found some python Client Library documentation that you may be interested.
Finally, I found a nginx/apache use-case guide that you may use as reference.
For some reason, if I change the directory to which both the .conf file points, and the directory where the logg is to /var/logs/ , being the final path as /var/logs/app.logs, it does work correctly. Possibly there is a configuration issue, causing the logging agent to only capture logs in specific predetermined folders, or a permissions issue that stops it from working if the log is in the username directory.
I found this solution, however, by chance(random testing basically.
). Did not find anything in the main articles that are supposed to teach me how to configure the logging agent, that could point me in the right direction, being those articles this ones,
https://cloud.google.com/logging/docs/agent/troubleshooting?hl=es-419 https://cloud.google.com/logging/docs/reference/v2/rest/v2/entries/list?hl=es-419 https://cloud.google.com/logging/docs/agent/configuration?hl=es-419 https://medium.com/google-cloud/how-to-log-your-application-on-google-compute-engine-6600d81e70e3 https://cloud.google.com/logging/docs/agent/installation
If I needed it to work in my username directory, it's not clear just by checking this articles how to do it,what configuration file I would need to change or where to start, so I recommend to google to improve that aspect of the docs.
This documentation you have sent https://docs.fluentd.org/quickstart is pretty interesting, maybe I can find the explanation there, thank you for your help.

Azure: what could be the cause of the error "Unable to edit or replace deployment"?

When I recreate my VM I got the following error:
Problem occurred during request to Azure services. Cloud provider details: Unable to edit or replace deployment 'VM-Name': previous deployment from '8/20/2019 6:20:33 AM' is still active (expiration time is '8/27/2019 5:17:41 AM'). Please see https://aka.ms/arm-deploy for usage details.
Help me please to understand.
What could be the cause of the error ?
UPDATED:
This deployment has not been started previously.
Prior to this, errors were received during creation:
Azure is not available now. Please Try again later
There were several such errors one at a time and then I got that error related to:
Unable to edit or replace deployment
My assumptions about this.
Tell me, am I right or not ?
I launched the image, then after some time I recreated it.
Creation began, but at that moment the connection with Azure was lost.
Then, when the connection was restored, we tried to make a deployment that was not removed in the previous attempt (because there was no connection with Azure).
As a result, we got such an error.
Does this theory make sense?
exactly what it says, there is another deployment with the same name going on at this time, either change the name of the deployment you are trying to queue or wait for the other deployment to finish\fail
This can also occur if you use Bicep templates for your ARM deployement and multiple modules or resources in the template have the same name:
module fooModule '../modules/foo.bicep' = {
name: 'foo'
}
module barModule '../modules/bar.bicep' = {
name: 'foo'
}
I got the same error initially pipeline was working but when retriggered pipeline took more time so i canceled the deployment and made a fresh rerun it encounters. i think i need wait until that deployment filed.

Resources