Best practice to create a CloudWatch alarm to monitor Amazon RDS

I know Amazon provides great metrics for monitoring an RDS instance, but if I only want to monitor whether it is reachable or not, like a Zabbix ping, which metric should I use when creating an alarm?

On the RDS console you can create event subscriptions, select events (like availability and failure) and assign notification groups.
I didn't find an equivalent option in CloudWatch.
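The console steps above can also be scripted. Here is a minimal sketch, assuming boto3 is installed and credentials are configured; the subscription name, topic ARN, and instance identifier are placeholders, not values from the question. The builder only assembles the arguments for the RDS `create_event_subscription` call, so it can be inspected without touching AWS.

```python
def build_event_subscription(name, topic_arn, instance_ids):
    """Assemble keyword arguments for the RDS CreateEventSubscription call."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": topic_arn,
        "SourceType": "db-instance",
        # The two categories mentioned in the answer above.
        "EventCategories": ["availability", "failure"],
        "SourceIds": list(instance_ids),
        "Enabled": True,
    }


def create_event_subscription(params):
    """Send the request. boto3 is imported lazily so the builder works without it."""
    import boto3

    return boto3.client("rds").create_event_subscription(**params)


# Placeholder values, for illustration only.
params = build_event_subscription(
    "rds-availability",
    "arn:aws:sns:us-east-1:123456789012:rds-alerts",
    ["mydb"],
)
```

Calling `create_event_subscription(params)` would then register the subscription; notifications go to whoever is subscribed to the SNS topic.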

Related

What is the best way to monitor services/applications running on Azure Virtual machines

I'm looking for a way to constantly monitor my virtual machines on Azure and check whether specific services (database, application) in the VM are up and running. I also want to be alerted (by email) when those services are down. Since the services running on the VM are not publicly available, I am looking for a solution that will monitor their status and report from inside the VMs (if that is possible).
I am open to any suggestions.
Thanks in advance
You can integrate with Azure Monitor to collect various metrics related to the application as well as database and use alerts to send notifications.

Monitoring & Detecting Exceptions in Applications using Cloud Monitoring

I am new to GCP and come from an Azure background. Is there an equivalent of "Azure Application Insights" on the GCP side for Monitoring Applications?
Let me explain my use case more clearly with an example: if I have a .NET-based web application running on a Windows VM on GCP, can Google Cloud Monitoring help detect exceptions raised by the running application and send out alerts?
Any pointers/links to further explore this type of monitoring capability would be helpful.
Cloud Monitoring will provide you with many statistics, most probably including what you need. And if there aren't any metrics that suit your needs, you may create your own based on the logs collected from the VM.
By default a number of logs are ingested, but if you want the full range and want to experiment with various ones, you may want to install a monitoring agent. Go through the documentation and have a look.
You can then use the metrics to create charts and have a live view of a number of things such as CPU utilisation, disk IO/s, dropped/sent/received packets etc. Here's the Cloud Monitoring documentation.
And finally, you can create alerts based on the metrics (set thresholds, time periods etc). They can be simple e-mail alerts, for example, but they can also be sent via Pub/Sub and trigger functions or apps.
Since you're new to GCP there's a lot of reading ahead of you, but you will easily find documentation for most of GCP's services.
If you provide more details I can update my answer and give you a more precise one.

Custom Cloudwatch Metrics

I am using AWS RDS for SQL Server and I need to do enhanced monitoring via CloudWatch. By default some basic monitoring is available, but I want to use custom metrics as well.
In my scenario I need to raise an alarm whenever the number of deadlocks in SQL Server increases. We are able to fetch the deadlock details via a script, and I need to prepare a custom metric from them.
Can anyone help with this or suggest an alternative solution?
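One way to approach this is to have the existing script push its deadlock count to CloudWatch as a custom metric with `put_metric_data`, then alarm on that metric. A minimal sketch, assuming boto3 and configured credentials; the namespace `Custom/SQLServer` and the metric name `Deadlocks` are made-up names for illustration, and how the count is obtained is left to your script.

```python
from datetime import datetime, timezone


def build_deadlock_metric(instance_id, deadlock_count, when=None):
    """Assemble keyword arguments for CloudWatch put_metric_data."""
    when = when or datetime.now(timezone.utc)
    return {
        "Namespace": "Custom/SQLServer",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "Deadlocks",  # hypothetical metric name
                "Dimensions": [
                    {"Name": "DBInstanceIdentifier", "Value": instance_id}
                ],
                "Timestamp": when,
                "Value": float(deadlock_count),
                "Unit": "Count",
            }
        ],
    }


def publish(payload):
    """Send the data point. boto3 is imported lazily so the builder works without it."""
    import boto3

    boto3.client("cloudwatch").put_metric_data(**payload)


# Example: the script found 3 deadlocks since its last run on instance "mydb".
payload = build_deadlock_metric("mydb", 3)
```

Run the script on a schedule and call `publish(payload)` each time; a regular CloudWatch alarm can then be created on the custom metric with whatever threshold counts as too many deadlocks for you.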

How is the Multi-AZ deployment of Amazon RDS realized?

I've recently been considering using an Amazon RDS Multi-AZ deployment for a service in a production environment, and I've read the related documents.
However, I have a question about the failover. In the FAQ of Amazon RDS, failover is described as follows:
Q: What happens during Multi-AZ failover and how long does it take?
Failover is automatically handled by Amazon RDS so that you can resume
database operations as quickly as possible without administrative
intervention. When failing over, Amazon RDS simply flips the canonical
name record (CNAME) for your DB Instance to point at the standby,
which is in turn promoted to become the new primary. We encourage you
to follow best practices and implement database connection retry at
the application layer. Failover times are a function of the time it
takes crash recovery to complete. Start-to-finish, failover typically
completes within three minutes.
From the above description, I guess there must be a monitoring service that detects failure of the primary instance and does the flipping.
My question is, which AZ is this monitoring service hosted in? There are 3 possibilities:
1. Same AZ as the primary
2. Same AZ as the standby
3. Another AZ
Apparently 1 and 2 can't be the case, since neither could handle an entire AZ becoming unavailable. So, if 3 is the case, what if the AZ of the monitoring service goes down? Is there another service to monitor this monitoring service? It seems to be an endless chain of dominoes.
So, how is Amazon ensuring the availability of RDS in Multi-AZ deployment?
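As an aside, the FAQ excerpt quoted above recommends implementing connection retry at the application layer. A minimal sketch of what that could look like, using exponential backoff; `connect` stands for whatever zero-argument function opens your database connection (an assumption, not part of any AWS API), and the attempt count and delays are arbitrary defaults.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            # Connecting from scratch each time gives the driver a chance to
            # resolve the endpoint again, which matters after a CNAME flip.
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Pass your driver's connection exception type in `retry_on` if it does not derive from `OSError`. Since the FAQ says failover typically completes within three minutes, the total retry window should probably be at least that long.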
I think the "how" in this case is abstracted away from the user by design, given that RDS is a PaaS service. A Multi-AZ deployment hides a great deal; however, the following are true:
You don't have any access to the secondary instance, unless a failover occurs
You are guaranteed that a secondary instance is located in a separate AZ from the primary
In his blog post, John Gemignani mentions the notion of an observer managing which RDS instance is active in the multi-AZ architecture. But to your point, what is the observer? And where is it observing from?
Here's my guess, based upon my experience with AWS:
The observer in an RDS multi-AZ deployment is a highly available service that is deployed throughout every AZ in every region that RDS multi-AZ is available, and makes use of existing AWS platform services to monitor the health and state of all of the infrastructure that may affect an RDS instance. Some of the services that make up the observer may be part of the AWS platform itself, and otherwise hidden from the user.
I would be willing to bet that the same underlying services that comprise CloudWatch Events are used in some capacity for the RDS multi-AZ observer. In his blog post announcing CloudWatch Events, Jeff Barr describes the service this way:
You can think of CloudWatch Events as the central nervous system for your AWS environment. It is wired in to every nook and cranny of the supported services, and becomes aware of operational changes as they happen. Then, driven by your rules, it activates functions and sends messages (activating muscles, if you will) to respond to the environment, making changes, capturing state information, or taking corrective action.
Think of the observer the same way - it's a component of the AWS platform that provides a function that we, as the users of the platform do not need to think about. It's part of AWS's responsibility in the Shared Responsibility Model.
Educated guess: the monitoring service runs in all the AZs and refers to a shared list of running instances (which is synchronously replicated across the AZs). As soon as the monitoring service in one AZ notices that another AZ is down, it flips the CNAMEs of all the affected instances to an AZ that is currently up.
We did not get to determine where the fail-over instance resides, but our primary is in US-West-2c and secondary is in US-West-2b.
Using PostgreSQL, our data became corrupted because of a physical problem with the Amazon volume (as near as we could tell). We did not have a Multi-AZ setup at the time, so to recover we had to perform a point-in-time restore as close in time to the event as we could. Amazon support assured us that had we gone ahead with Multi-AZ, they would have automatically rolled over to the other AZ. This raises the questions of how they could have determined that, and whether the data corruption would have propagated to the other AZ.
Because of that disaster, we also added a read-only replica, which seems to make a lot more sense to me. We also use the RO replica for reads and other functions. My understanding from my Amazon rep is that one can think of the Multi-AZ setting as more like a RAID situation.
From the docs, failover occurs if any of the following conditions is met:
Loss of availability in primary Availability Zone
Loss of network connectivity to primary
Compute unit failure on primary
Storage failure on primary
This implies that the monitoring is not located in the same AZ. Most likely, the read replica is using MySQL functions (https://dev.mysql.com/doc/refman/5.7/en/replication-administration-status.html) to monitor the status of the master, and takes action if the master becomes unreachable.
Of course, this raises the question of what happens if the replica's AZ fails. Amazon most likely has checks in the replica's failure detection to figure out whether it is the one failing or the primary is.

Allocated storage has been exhausted - scale storage to resolve

I get the above message in my Amazon RDS instance alerts section. How do I get notified by email when such violations are reported by RDS?
The most straightforward option for monitoring Amazon RDS (and any other AWS service for that matter) is Amazon CloudWatch, which provides a reliable, scalable, and flexible monitoring solution that you can start using within minutes and specifically includes Alarms:
[...] Alarms can automatically initiate actions on your behalf, based
on parameters you specify. An alarm watches a single metric over a
time period you specify, and performs one or more actions based on the
value of the metric relative to a given threshold over a number of
time periods. The action is a notification sent to an Amazon SNS topic
or Auto Scaling policy. [...] [emphasis mine]
Amazon SNS supports notifications over multiple transport protocols in turn, amongst those Email and Email-JSON, see the respective FAQ What are the different delivery formats/transports for receiving notifications?:
[...] Customers can select one the following transports as part
of the subscription requests:
[...]
“Email”, “Email-JSON” – Messages are
sent to registered addresses as email. Email-JSON sends notifications
as a JSON object, while Email sends text-based email.
The metric in question is the FreeStorageSpace RDS metric (see Amazon RDS Dimensions and Metrics for details on the available ones) as discussed in Scaling DB Instance Storage:
Important
We highly recommend that you constantly monitor the
FreeStorageSpace RDS metric published in CloudWatch to ensure that
your DB Instance has enough free storage space. For more information
on monitoring RDS DB Instances, see Viewing DB Instance Metrics.
Accordingly, you'll need to create an alarm mirroring or approximating the threshold reported to you by AWS in the console, publish it to an SNS topic and subscribe to this topic via an email address of your choice.
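Put together, the steps above might look like the following sketch, assuming boto3 and configured credentials. The alarm name, topic name, instance identifier, and the 2 GiB threshold are placeholders for illustration; pick a threshold that matches what the console reports for your instance. Note that FreeStorageSpace is reported in bytes.

```python
def build_free_storage_alarm(instance_id, topic_arn, threshold_bytes):
    """Assemble keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{instance_id}-low-free-storage",  # hypothetical name
        "AlarmDescription": "Free storage space on the DB instance is low",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds per evaluation period
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],  # notify this SNS topic when alarming
    }


def create_alarm_with_email(instance_id, email, threshold_bytes,
                            topic_name="rds-storage-alerts"):
    """Create the SNS topic, subscribe an email address, and create the alarm."""
    import boto3  # imported lazily so the builder above works without it

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    boto3.client("cloudwatch").put_metric_alarm(
        **build_free_storage_alarm(instance_id, topic_arn, threshold_bytes)
    )
    return topic_arn


# Placeholder values: alarm when less than 2 GiB is free on instance "mydb".
alarm = build_free_storage_alarm(
    "mydb",
    "arn:aws:sns:us-east-1:123456789012:rds-storage-alerts",
    2 * 1024**3,
)
```

The email address receives a confirmation message from SNS first; notifications are only delivered once the subscription has been confirmed.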
