CloudWatch - CPUUtilization metric - autoscaling

I can work with the GroupInServiceInstances metric without problems. I got it after enabling collection with euscale-enable-metrics-collection groupname -g 1minute
However, I cannot get the AWS/EC2 CPUUtilization metric to work: the alarm never picks up the metric, and its state stays at INSUFFICIENT_DATA. I noticed that running euscale-describe-metric-collection-types shows nothing from AWS/EC2, only the Auto Scaling group metrics, see:
METRIC-COLLECTION-TYPE GroupDesiredCapacity
METRIC-COLLECTION-TYPE GroupInServiceInstances
METRIC-COLLECTION-TYPE GroupMaxSize
METRIC-COLLECTION-TYPE GroupMinSize
METRIC-COLLECTION-TYPE GroupPendingInstances
METRIC-COLLECTION-TYPE GroupTerminatingInstances
METRIC-COLLECTION-TYPE GroupTotalInstances
GRANULARITY-METRIC-TYPE 1minute
I expected it to also display:
METRIC-COLLECTION-TYPE CPUUtilization
GRANULARITY-METRIC-TYPE Percent
So, what do I need to do to make an alarm on the AWS/EC2 CPUUtilization metric work?

The euca2ools euscale-* commands are for use with the Auto Scaling service. The euscale-describe-metric-collection-types command maps to the DescribeMetricCollectionTypes action and only returns metrics for Auto Scaling.
To enable EC2 metrics in Eucalyptus, you have to enable metrics collection for the instance (euca-monitor-instances). When working with the CPUUtilization CloudWatch metric, you should also make sure that you specify the unit type Percent.
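As a rough illustration of the point about the unit, here is a minimal Python sketch that assembles the parameters for a CloudWatch PutMetricAlarm call with Unit set to Percent. The function name, alarm name, threshold, and period are hypothetical values chosen for the example. Passing the result to a boto3 CloudWatch client pointed at a Eucalyptus endpoint is an assumption, not something confirmed in this thread.

```python
def cpu_alarm_params(dimension_name: str, dimension_value: str,
                     threshold: float = 80.0) -> dict:
    """Build keyword arguments for a CloudWatch PutMetricAlarm request.

    The key point is the explicit Unit: without "Percent" the alarm may
    never match the CPUUtilization data points and stay in INSUFFICIENT_DATA.
    """
    return {
        "AlarmName": f"cpu-high-{dimension_value}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": dimension_name, "Value": dimension_value}],
        "Statistic": "Average",
        "Unit": "Percent",
        "Period": 60,              # seconds, illustrative
        "EvaluationPeriods": 2,    # illustrative
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = cpu_alarm_params("AutoScalingGroupName", "groupname")
```

With boto3 installed, something like client.put_metric_alarm(**params) would be the natural next step, assuming the client is configured for the target cloud.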

Can you please try EucaLobo https://github.com/viglesiasce/EucaLobo?
I remember I had trouble with command line arguments at first.

I found the reason. The alarm had been created from the graphical interface; when I created it from the command line, it worked. It turns out that the GUI does not assign the unit (Percent) to the alarm, so the alarm never picks up the percentage metric data.


How to exclude a health check resource from Datadog metric alert monitor query?

We are setting up a metric alert monitor and other monitors using Terraform. The query looks like this:
query = "max(last_10m):p95:trace.netty.request{env:${var.env},service:${local.service_name}} >= 4"
We would like to exclude health checks from this particular metric only, e.g. GET /healthcheck
How can this be achieved? Are there some examples?
Resources like the health check have a resource_name tag. This tag can be used to exclude them, e.g. !resource_name:get_/health
Here is an example of a query excluding the health check resources:
query = "max(last_10m):p95:trace.netty.request{env:${var.env},service:${local.service_name},!resource_name:get_/health} >= 4"
Visit DataDog documentation for more information.
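If several resources need to be excluded, the query string can get unwieldy. The following is a small Python sketch of how such a query could be assembled. The function name and parameters are hypothetical and not part of any Datadog or Terraform API; only the query shape and the !resource_name tag syntax come from the answer above.

```python
def build_monitor_query(metric, env, service, threshold, exclude_resources=()):
    """Build a metric monitor query, negating each given resource_name tag."""
    tags = [f"env:{env}", f"service:{service}"]
    # A leading "!" excludes series carrying that tag value.
    tags += [f"!resource_name:{name}" for name in exclude_resources]
    scope = ",".join(tags)
    return f"max(last_10m):p95:{metric}{{{scope}}} >= {threshold}"


query = build_monitor_query(
    "trace.netty.request", "prod", "api", 4,
    exclude_resources=["get_/health"],
)
```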

How to avoid "Objects have changed outside of Terraform"?

Recently upgraded my Terraform project to AWS provider 3.74.0 and TF 1.1.4 (from much older versions).
I'm suddenly getting this autoscaling schedule reporting external changes:
resource "aws_autoscaling_schedule" "api-svc-tst-down-schedule" {
  scheduled_action_name  = "api-svc-tst-down-schedule"
  min_size               = 0
  max_size               = 1
  desired_capacity       = 0
  // Minute Hour DayOfMonth Month DayOfWeek
  recurrence             = "0 13 * * *"
  autoscaling_group_name = aws_autoscaling_group.api-svc-tst-asg.name
  lifecycle {
    ignore_changes = [start_time]
  }
}
The plan command is now reporting:
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the
last "terraform apply":
# aws_autoscaling_schedule.api-svc-tst-down-schedule has changed
~ resource "aws_autoscaling_schedule" "api-svc-tst-down-schedule" {
id = "api-svc-tst-down-schedule"
~ start_time = "2022-01-31T13:00:00Z" -> "2022-02-01T13:00:00Z"
# (7 unchanged attributes hidden)
}
If I apply the plan, it doesn't appear that TF changes the ASG (I'm assuming it just updates its state file) and the notification goes away until the next day.
I note that the AWS console does show that the Scheduled action has a Start time, which seems to be being set by AWS.
I tried adding start_time to ignore_changes, but it didn't seem to make a difference; the resource is still reported as externally changed.
Is this a known issue with Terraform (I'm not seeing anything via googling)?
How can I prevent the resource from being marked as externally changed?
Edit: I also tried setting the start_time attribute as suggested in the comments. But the detected changes warning came back the next day.
Edit 2: I also tried deleting and re-adding the resource via Terraform, but it still gets marked as changed the next day.
This undesirable behavior was an intentional change introduced in Terraform version 0.15.4.
It cannot currently be avoided. The only workaround is that all team members (and tooling) must be educated to ignore "expected drift".
Note that this "expected drift" behavior is not limited to just aws_autoscaling_schedule resources, or even just the AWS provider. The issue happens on many different platforms/types for any resource where the cloud vendor updates the attribute after the resource is created.
Many resources will report drift immediately after being created - often you can get rid of the report by immediately doing an apply or refresh to update the TF state and as long as AWS doesn't make changes to those attributes, you won't see the resource reported as changed again.
Other resource attributes (like aws_autoscaling_schedule.start_time) get updated by the cloud vendor regularly. These types of resources will intermittently report "Objects have changed outside of Terraform", whenever you run plan.
There is an open (but locked) issue tracking this: https://github.com/hashicorp/terraform/issues/28803.
Note that the issue is locked because HashiCorp got tired of people telling them how negatively this affects their teams.
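For the "tooling must be educated" part, one possible approach is to post-process the machine-readable plan. The sketch below is an assumption-heavy illustration: it assumes the JSON produced by terraform show -json for a saved plan contains a resource_drift list whose entries carry address, type, and change.before/change.after, and it uses a hand-maintained allow-list (EXPECTED_DRIFT, a hypothetical name) of attributes the team has agreed to ignore.

```python
# Attributes treated as "expected drift", keyed by resource type.
# This allow-list is a team convention, not a Terraform feature.
EXPECTED_DRIFT = {"aws_autoscaling_schedule": {"start_time"}}


def unexpected_drift(plan: dict) -> list:
    """Return addresses of drifted resources with changes outside the allow-list."""
    flagged = []
    for rc in plan.get("resource_drift", []):
        change = rc.get("change", {})
        before = change.get("before") or {}
        after = change.get("after") or {}
        # Top-level attributes whose values differ between the two snapshots.
        changed = {k for k in set(before) | set(after)
                   if before.get(k) != after.get(k)}
        allowed = EXPECTED_DRIFT.get(rc.get("type"), set())
        if changed - allowed:
            flagged.append(rc["address"])
    return flagged
```

A CI job could load the plan JSON, call unexpected_drift, and only alert when the returned list is non-empty. This does not silence the message in Terraform itself; it only filters what gets surfaced to people.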

'Delay until' finish time of 'Queue a new build' not working in Azure Logic App

I'm triggering an Azure Logic App from an https webhook for a docker image in Azure Container Registry.
The workflow is roughly:
When a HTTP request is received
Queue a new build
Delay until
FinishTime of Queue a new build
The Delay until action doesn't work because the queried FinishTime is 0001-01-01T00:00:00.
It complains about the wrong format, so I manually added a Z after the FinishTime keyword.
Now the time stamp is in the right format, however, the timestamp 0001-01-01T00:00:00Z obviously doesn't make sense and subsequent steps are executed without delay.
Anything that I am missing?
edit: Queue a new build queues an Azure pipeline build. I.e. the FinishTime property comes from the pipeline.
You need to set a timestamp in the future. The timestamp 0001-01-01T00:00:00Z that you passed to the "Delay until" action is not a future time. If you set a timestamp such as 2020-04-02T07:30:00Z, the "Delay until" action will take effect.
Update:
I don't think "Delay until" can do what you expect, but you could try the approach below. Add a "Condition" action that checks whether the FinishTime is later than the current time.
The expression in the "Condition" (compared against 0) is:
sub(ticks(variables('FinishTime')), ticks(utcNow()))
In short: if the FinishTime is later than the current time, run the "Delay until" action; otherwise, do whatever else you want. (Also pay attention to the time zone of your timestamps; you may need to convert all of them to UTC.)
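To see why the placeholder timestamp skips the delay, here is a small Python sketch of the same comparison. It mirrors ticks() as the number of 100-nanosecond intervals since 0001-01-01T00:00:00Z; the function names are illustrative and the code is not part of Logic Apps.

```python
from datetime import datetime, timezone

_TICK_ORIGIN = datetime(1, 1, 1, tzinfo=timezone.utc)


def ticks(ts: datetime) -> int:
    """Number of 100-nanosecond intervals since 0001-01-01T00:00:00Z."""
    delta = ts - _TICK_ORIGIN
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


def should_delay(finish_time: datetime, now: datetime) -> bool:
    """Same test as sub(ticks(FinishTime), ticks(utcNow())) > 0."""
    return ticks(finish_time) - ticks(now) > 0


placeholder = datetime(1, 1, 1, tzinfo=timezone.utc)  # 0001-01-01T00:00:00Z
now = datetime.now(timezone.utc)
skip_delay = not should_delay(placeholder, now)
```

The placeholder value has zero ticks, so the difference is negative for any real current time and the delay branch is never taken.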
I've been in touch with an Azure support engineer, who confirmed that the Delay until action should work the way I intended to use it, but that the FinishTime property will not hold a value that I can use.
In the meantime, I have found a workaround, where I'm using some logic and quite a few additional steps. Inconvenient but at least it does what I want.
Here are the most important steps that are executed after the workflow gets triggered from a webhook (docker base image update in Azure Container Registry).
Essentially, I'm initializing the following variables and queuing a new build:
buildStatusCompleted: String value containing the target value completed
jarsBuildStatus: String value containing the initial value notStarted
jarsBuildResult: String value containing the default value failed
Then, I'm using an Until action to monitor when the jarsBuildStatus's value is switching to completed.
In the Until action, I'm repeating the following steps until jarsBuildStatus changes its value to buildStatusCompleted:
Delay for 15 seconds
HTTP request to Azure DevOps build, authenticating with personal access token
Parse JSON body of previous raw HTTP output for status and result keywords
Set jarsBuildStatus = status
After breaking out of the Until action (loop), the jarsBuildResult is set to the parsed result.
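The polling loop described above could look roughly like this in Python. It is only a sketch of the control flow, not of the Logic App itself: fetch_build is a hypothetical callable standing in for the HTTP request plus JSON parsing, and max_polls is an added safety limit that the original workflow does not mention.

```python
import time


def wait_for_build(fetch_build, poll_seconds=15, max_polls=240, sleep=time.sleep):
    """Poll a build until its status is 'completed', then return its result."""
    build_status_completed = "completed"
    jars_build_status = "notStarted"   # initial value
    jars_build_result = "failed"       # default if the build never completes
    last_build = {}
    polls = 0
    while jars_build_status != build_status_completed and polls < max_polls:
        sleep(poll_seconds)                      # "Delay for 15 seconds"
        last_build = fetch_build()               # HTTP request + Parse JSON
        jars_build_status = last_build.get("status", jars_build_status)
        polls += 1
    # After leaving the loop, take over the parsed result.
    if jars_build_status == build_status_completed:
        jars_build_result = last_build.get("result", jars_build_result)
    return jars_build_result
```

Passing sleep as a parameter keeps the loop easy to exercise without waiting, for example by supplying a no-op function together with a fake fetch_build.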
All these steps are part of a larger build orchestration workflow, where I'm repeating the given steps multiple times for several different Azure DevOps build pipelines.
The final action in the workflow is sending all the status, result and other relevant data as a build summary to Azure DevOps.
To me, this is only a workaround and I'll leave this question open to see if others have suggestions as well or in case the Azure support engineers can give more insight into the Delay until action.
Here's an image of the final workflow (at least, the part where I implemented the Delay until action):
edit: Turns out, I can simplify the workflow because there's a dedicated Azure DevOps action in the Logic App called Send an HTTP request to Azure DevOps, which omits the need for manual authentication (Azure support engineer pointed this out).
The workflow now looks like this:
That is, I can query the build status directly and set the jarsBuildStatus as
@{body('Send_an_HTTP_request_to_Azure_DevOps:_jar''s')['status']}
The code snippet above is automagically converted to a value for the Set variable action. Thus, no need to use an additional Parse JSON action.

OpenFaas Autoscaling from 0

I am trying out the OpenFaaS feature for auto scaling from 0 instances.
I tried this with the nodeinfo function: I scaled it down to zero and then tried invoking it.
kubectl scale deployment --replicas=0 nodeinfo -n openfaas-fn
Once the replicas were down to 0, I wanted to invoke the function from the Gateway UI to make it auto scale from 0, but the status is not ready and the Invoke button stays inactive until I bump the replica count above 0. It seems that it is not possible to invoke a function that has 0 instances.
It doesn't look like auto scaling from 0 is working, unless I am missing something.
Any guidance or help is appreciated.
You can make the first function invocation with the command line.
Example:
echo -n "google.com" | faas-cli invoke curl --gateway 127.0.0.1:31112
Here, curl is the function's name
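The same first invocation can also be made over plain HTTP, since the gateway exposes functions under /function/<name>. Below is a minimal Python sketch that only builds the request; the helper name is hypothetical, and whether the call actually triggers scale-up depends on scale-from-zero being enabled on the gateway, which is assumed here.

```python
from urllib.request import Request


def build_invoke_request(gateway: str, function_name: str, payload: bytes) -> Request:
    """Build a POST request for a function behind the gateway."""
    url = f"http://{gateway}/function/{function_name}"
    return Request(url, data=payload, method="POST")


# Mirrors the faas-cli example: function "curl", payload "google.com".
req = build_invoke_request("127.0.0.1:31112", "curl", b"google.com")
```

Sending it with urllib.request.urlopen(req) would perform the invocation against a reachable gateway.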

Errbit keeps spamming emails

I'm using Errbit 0-3-stable and it's working really well.
The problem is that it sometimes starts spamming me with emails for the same error, but with different hashes, like the following:
Mongo::Error::NoServerAvailable: No server is available matching preference: #<Mongo::ServerSelector::Primary:0x007fdba42891f0 @tag_sets=[], @options={:database=>"db_test", :max_pool_size=>200, :wait_queue_timeout=>5, :write=>{"w"=>0}}, @server_selection_timeout=30>
Mongo::Error::NoServerAvailable: No server is available matching preference: #<Mongo::ServerSelector::Primary:0x007fdbb8148e30 @tag_sets=[], @options={:database=>"db_test", :max_pool_size=>200, :wait_queue_timeout=>5, :write=>{"w"=>0}}, @server_selection_timeout=30>
How can I filter them so that they are grouped into a single error?
There are two ways to deal with this.
Option 1) Catch the errors in your application and scrub the uniqueness out of the error messages before sending them to Errbit.
Option 2) Errbit supports configurable "fingerprinting" so you can actually tell Errbit what attributes contribute to the uniqueness of error notifications. This can be done system-wide or on individual Errbit apps. In your case, you could toggle off the error message as part of the Error fingerprint.
From the Errbit README:
The way Errbit arranges notices into error groups is configurable. By
default, Errbit uses the notice's error class, error message, complete
backtrace, component (or controller), action and environment name to
generate a unique fingerprint for every notice. Notices with identical
fingerprints appear in the UI as different occurences of the same
error and notices with differing fingerprints are displayed as
separate errors.
Changing the fingerprinter (under the 'config' menu) applies to all
apps and the change affects only notices that arrive after the change.
If you want to refingerprint old notices, you can run rake
errbit:notice_refingerprint.
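For Option 1, the only thing that differs between the two messages in the question is the object address (0x007f...). A minimal sketch of scrubbing it is shown below in Python for illustration; in a Ruby application the same regular expression idea would apply before the notice is sent. The function name and placeholder text are hypothetical.

```python
import re

# Matches hexadecimal object addresses such as 0x007fdba42891f0.
_OBJECT_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def scrub_message(message: str) -> str:
    """Replace per-object addresses so identical errors share one message."""
    return _OBJECT_ADDRESS.sub("0xXXXX", message)


first = "No server is available: #<Mongo::ServerSelector::Primary:0x007fdba42891f0 @tag_sets=[]>"
second = "No server is available: #<Mongo::ServerSelector::Primary:0x007fdbb8148e30 @tag_sets=[]>"
same_group = scrub_message(first) == scrub_message(second)
```

Once the address is normalized, both messages are identical, so the default fingerprint would place them in the same error group.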

Resources