Pricing for AWS RDS cluster snapshots

I'm exploring the possibility of backing up an Aurora Serverless cluster with AWS Backup, in order to cover a much longer period than the 35 days of automated backups RDS offers. The aim is to keep 6 months to 1 year of daily cluster backups, depending on how much it will cost.
So far, I have an idea of how to set it up using CDK; what I'm missing is the cost.
I still have no clue how the billing for cluster backups is calculated. From what I've seen, backup storage is $0.021 per GB-month in my region, and on the last bill I got from AWS, the cost for cluster backups totals around $14.
That means I'm paying for around 660 GB of "additional backup storage", but that doesn't seem right. Our daily snapshots are around 80-90 GB each, so a quick calculation (35 daily snapshots of ~90 GB) gives around 3,100 GB, which is far more than 660 GB. So where does this discrepancy in price come from?

It's a good question. My guess is that RDS snapshots store only diffs, the same as EBS snapshots do, but I'm not sure. Today I read the Percona blog post https://www.percona.com/blog/aws-rds-backups-whats-the-true-cost/, and their calculation is, IMHO, an overestimate.
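To see how the diff hypothesis could explain the bill, here is a back-of-the-envelope sketch in Python. The daily churn figure is purely an assumption for illustration; only the snapshot size, retention window, and price come from the question above.

```python
# Rough cost model for "additional backup storage", assuming (as guessed above)
# that snapshots after the first one only store changed blocks.

PRICE_PER_GB_MONTH = 0.021   # backup storage price used in the question
RETENTION_DAYS     = 35      # automated backup retention
SNAPSHOT_SIZE_GB   = 88      # ~80-90 GB per daily snapshot
DAILY_CHURN_GB     = 17      # hypothetical amount of changed data per day

full_copies_gb = RETENTION_DAYS * SNAPSHOT_SIZE_GB                            # ~3,080 GB
incremental_gb = SNAPSHOT_SIZE_GB + (RETENTION_DAYS - 1) * DAILY_CHURN_GB     # ~666 GB

print(f"if every snapshot were a full copy: {full_copies_gb} GB "
      f"-> ${full_copies_gb * PRICE_PER_GB_MONTH:.2f}/month")
print(f"if snapshots only store diffs     : {incremental_gb} GB "
      f"-> ${incremental_gb * PRICE_PER_GB_MONTH:.2f}/month")
```

With a churn of roughly 17 GB/day, the incremental model lands almost exactly on the ~660 GB / ~$14 figures from the question, while full daily copies would cost several times more.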

Related

Copying large files (DB backups) 50GB+ from prod to dev environments

We would like to copy DB backups from prod to dev environments on a monthly basis, but the file sizes are around 50GB+. Is there any way we can do this quickly?
You can try using the 'AzCopy' utility in this case, as it can transfer/copy files from your server's drives to an Azure Blob storage container or an ADLS (Azure Data Lake Storage) Gen2 container. It transfers data at up to roughly 100 GB per hour, in chunks of 4 GB at a time, when there is no cap on Internet bandwidth.
To tune it for optimum performance, you can adjust the 'NC' parameter, which controls the number of concurrent operations AzCopy makes during a transfer. Also, ensure that the amount of data in a single transfer job stays below 1 TB.
Please refer to the links below for more details on configuring the AzCopy utility for file transfers and for optimal performance parameters:
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-optimize#increase-concurrency
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-files
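If you end up scripting the monthly transfer, here is a minimal sketch of driving AzCopy v10 from Python. The local path, storage account, and SAS token are placeholders; AZCOPY_CONCURRENCY_VALUE is the environment variable AzCopy uses to control how many parallel connections it opens.

```python
import os
import subprocess

# Local backup file and destination container URL + SAS token are placeholders.
src = r"D:\backups\prod_full.bak"
dst = "https://devstorage.blob.core.windows.net/backups?<SAS-token>"

# Raise the number of parallel connections AzCopy uses for the transfer.
env = dict(os.environ, AZCOPY_CONCURRENCY_VALUE="16")

subprocess.run(["azcopy", "copy", src, dst], env=env, check=True)
```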

Too many data disks attached to an AKS node

I read that there is a limit on the number of data disks that can be bound to a node in a cluster. Right now I'm using a small node which can only hold up to 4 data disks. If I exceed this amount I get this error: 0/1 nodes are available: 1 node(s) exceed max volume count.
The question I mainly have is how to handle this. I have some apps that just need a small amount of persistent storage in my cluster, but I can only attach a few data disks. If I bind 4 data disks of 100M each, I have already reached the limit.
Could someone advise me on how to handle these scenarios? I can easily scale up the machines, and I will have more power and more disks, but the ratio of disks to server power is completely off at that point.
Best
Pim
You should look at using Azure File instead of Azure Disk. With Azure File, you can use ReadWriteMany access, so a single mount on the VM (node) can be shared by multiple pods.
https://github.com/kubernetes/examples/blob/master/staging/volumes/azure_file/README.md
https://kubernetes.io/docs/concepts/storage/storage-classes/#azure-file
https://learn.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
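As a minimal sketch (using the Python Kubernetes client rather than a YAML manifest), requesting an Azure File backed ReadWriteMany volume looks roughly like this; the claim name, namespace, and size are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl context already points at the AKS cluster

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="shared-data"),   # placeholder name
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],                  # possible with Azure File, not Azure Disk
        storage_class_name="azurefile",                  # built-in AKS storage class
        resources=client.V1ResourceRequirements(requests={"storage": "5Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```

Several pods can then mount this one claim instead of each consuming an Azure Disk attachment slot on the node.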
4 PV per node
30 pods per node
Those are the limits on AKS nodes right now.
You can handle it by adding more nodes (and more money), or by finding a provider with different limits.
On one of those, as an example, the limits are 127 volumes and 110 pods for the same node size.

Planning Graphite components for big Cassandra cluster monitoring

I am planning to set up an 80-node Cassandra cluster (currently version 2.1, but it will be upgraded to 3 in the future).
I have gone through http://graphite.readthedocs.io/en/latest/tools.html, which lists the tools that Graphite supports.
I want to decide which tools to choose as the listener and storage components so that the setup can scale.
As the listener, should I use the default carbon, or should I choose graphite-ng?
As for the storage component, I am not sure whether the default whisper is enough, or whether I should look at other options (like InfluxData, Cyanite, or some RDBMS such as Postgres/MySQL).
As the GUI component, I have settled on Grafana for better visualization.
I think Datadog + Grafana would work fine, but Datadog is not open source, so please suggest an open-source alternative that scales up to 100 Cassandra nodes.
I have 35 Cassandra nodes (across different clusters) monitored without any problems with graphite + carbon + whisper + grafana. But I have to say that reconfiguring collection and aggregation windows with whisper is a pain.
There are many alternatives for this job today; you can use the InfluxDB (+ Telegraf) stack, for example.
Also, with Datadog you don't need Grafana, since it is a visualization platform as well. I worked with it some time ago; it has some misleading names for certain metrics in its plugin, and some metrics were just missing. As a pro, the platform is really easy to install and use.
We have a Cassandra cluster of 36 nodes in production right now (we had 51 but migrated the instance type since then, so we need fewer C* servers now), monitored using a single Graphite server. We also keep data for 30 days, but at a 60s resolution. We excluded the internode metrics (e.g. open connections from a to b) because of how the metric count scales, but keep all the others. This totals ~510k metrics, each whisper file being ~500 KB in size => ~250 GB. iostat tells me that we have write peaks of ~70k writes/s. This all runs on a single AWS i3.2xlarge instance, which includes 1.9 TB of NVMe instance storage and 61 GB of RAM. To fully utilize the power of this disk type we increased the number of carbon caches. The CPU usage is very low (<20%) and so is the iowait (<1%).
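As a sanity check of those numbers, whisper stores roughly 12 bytes per datapoint, so the quoted ~500 KB per file and ~250 GB total can be reproduced with a quick back-of-the-envelope calculation:

```python
# Estimate whisper disk usage from retention, resolution and metric count.
BYTES_PER_POINT = 12          # approximate whisper storage per datapoint
RETENTION_DAYS = 30
RESOLUTION_S = 60
METRIC_COUNT = 510_000

points_per_metric = RETENTION_DAYS * 24 * 3600 // RESOLUTION_S   # 43,200 points
file_size_kb = points_per_metric * BYTES_PER_POINT / 1024        # ~506 KB per file (plus a small header)
total_gb = METRIC_COUNT * file_size_kb / (1024 * 1024)           # ~246 GB overall

print(f"{points_per_metric} points/metric, ~{file_size_kb:.0f} KB/file, ~{total_gb:.0f} GB total")
```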
I guess we could get away with a less beefy machine, but this gives us a lot of headroom for growing the cluster, and we are constantly adding new servers. For the monitoring: be prepared for AWS to terminate these machines more often than others, so backup and restore are likely to become a regular operation.
I hope this little insight helped you.

Yearly disaster recovery exercise in Cassandra

In order to address the business requirement of a yearly disaster recovery exercise, are there any good suggestions for a Cassandra setup in a (3-node DC1)(3-node DC2) configuration?
The exercise is to simulate DR activation, while the production workload is still served from DC1.
In peacetime, DC1 is the main DC handling the workload; DC2 only runs Spark analytics against its Cassandra nodes, with no other workload.
Are you using the cloud (like AWS or Google Cloud) or are you running the database on dedicated hardware?
You mentioned two datacenters; are they part of the same cluster?
Rather than a special configuration to comply with your annual DR exercise, it would be better to be prepared for any contingency:
have periodic and automated backups,
in our case, we take full daily snapshots, stored on S3, with expiration policies (only the latest 7 daily backups, the last 4 weekly backups, and the last 3 monthly backups)
verify that the backups can be restored; this is usually done on temporary AWS EC2 instances
tests or research on the restored instances do not communicate with the production cluster, and once the test is done, the instances are terminated
A coworker gave a talk at Cassandra Summit 2016 with more detail about our process.
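As a rough sketch of how such an expiration policy could be wired up with S3 lifecycle rules, assuming snapshots are uploaded under separate daily/, weekly/, and monthly/ prefixes (the bucket name is a placeholder, and the Days values only approximate the 7 daily / 4 weekly / 3 monthly scheme):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-cassandra-backups",   # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {"ID": "daily", "Status": "Enabled",
             "Filter": {"Prefix": "daily/"}, "Expiration": {"Days": 7}},
            {"ID": "weekly", "Status": "Enabled",
             "Filter": {"Prefix": "weekly/"}, "Expiration": {"Days": 28}},
            {"ID": "monthly", "Status": "Enabled",
             "Filter": {"Prefix": "monthly/"}, "Expiration": {"Days": 90}},
        ]
    },
)
```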

Are Amazon RDS instances upgradable?

Will I be able to switch (I mean upgrade or downgrade) an Amazon RDS instance on an as-needed basis, or do I have to create a new one afresh and go through a migration?
Yes, Amazon RDS instances are upgradeable via the modify-db-instance command. There is no need for data migration.
From the Amazon RDS Documentation:
"If you're unsure how much CPU you need, we recommend starting with the db.m1.small DB Instance class and monitoring CPU utilization with Amazon's CloudWatch service. If your DB Instance is CPU bound, you can easily upgrade to a larger DB Instance class using the rds-modify-db-instance command.
Amazon RDS will perform the upgrade during the next maintenance window. If you want the upgrade to be performed now, rather than waiting for the maintenance window, specify the --apply-immediately option. Warning: changing the DB Instance class requires a brief outage for your DB Instance."
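For reference, the same modification can be scripted with boto3; a minimal sketch is below, where the region, instance identifier, and target class are placeholders rather than values from the question.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")   # placeholder region

rds.modify_db_instance(
    DBInstanceIdentifier="my-db-instance",   # placeholder instance identifier
    DBInstanceClass="db.m5.large",           # target instance class
    ApplyImmediately=True,                   # otherwise the change waits for the maintenance window
)
```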
RE: Outage Time: we have a SQL Server 2012 RDS instance (1 TB non-IOPS drive), and going from a db.m1.xlarge to a db.m3.xlarge (more CPU, less $$) incurred just over 4 minutes of downtime.
NOTE: We did the upgrade from the AWS console GUI and selected "Apply Immediately", but it was 10 minutes before the outage actually began. The RDS status indicated "Modifying" immediately after we initiated the update, and it stayed this way through the wait time and the outage time.
Hope this helps!
Greg
I just did an upgrade from a medium RDS instance to a large when we were hit with unexpected traffic (good, right? :) ). Since we have a multi-AZ instance, we were down for 2-3 minutes. In Amazon's documentation, they say that the downtime will be brief if you have a multi-AZ instance.
For anybody interested, we just modified an RDS instance (MySQL, 15 GB HD, rest of standard parameters) changing it from micro to small. The downtime period was 5 minutes.
RE: Outage Time: we just upgraded PostgreSQL 9.3 by requesting the following changes, applied immediately:
upgrading PostgreSQL 9.3.3 to 9.3.6
instance resize from m3.large to m3.2xlarge
changing the storage type to provisioned IOPS
extending storage from 200 GB to 500 GB (the most expensive operation in terms of time)
It took us almost 5 hours to complete the whole operation. The database contained around 100 GB of data at the moment of the upgrade. You can monitor the progress of your upgrade under the Events section in the RDS console. During the upgrade, RDS takes a couple of backup snapshots, whose progress can be monitored under the Snapshots section.
We just did an upgrade from db.m3.large to db.m3.xlarge with 200GB of non-IOPS data running SQL Server 2012. The downtime was roughly 5 minutes.
Upgrading a MySQL RDS instance from db.t2.small to db.t2.medium with 25 GB of data took 6 minutes.
On Multi-AZ, there will be a failover, but otherwise it will be smooth.
Here's the timeline data from my most recent DB instance type downgrade from r3.4xlarge to r3.2xlarge on a Multi-AZ configured Postgres 9.3 with 3 TB of disk (actual data is only ~800 GB):
time (utc-8) event
Mar 11 10:28 AM Finished applying modification to DB instance class
Mar 11 10:09 AM Multi-AZ instance failover completed
Mar 11 10:08 AM DB instance restarted
Mar 11 10:08 AM Multi-AZ instance failover started
We had an ALTER statement on a big table (around 53 million records), and it was not able to complete.
The existing storage usage was 48 GB.
We decided to increase the allocated storage on the AWS RDS instance.
The whole operation took 2 hours to complete:
MySQL
db.r3.8xlarge
from 100 GB to 200 GB
The ALTER statement then took around 40 minutes, but it worked.
Yes, they're upgradable. I upgraded an RDS instance from SQL Server 2008 to SQL Server 2012, with a database of about 36 GB, class db.m1.small, 200 GB of storage, and no IOPS or Multi-AZ. There was no downtime; the process barely took 10 minutes.
