RDS Cluster and DB Instance concept

I need to create an RDS Aurora 5.7 database, and I don't think I'm clear on the RDS concepts. Is this the correct hierarchy: aws_rds_cluster -> aws_rds_cluster_instance -> aws_db_instance? Do I need to define all of the above? I'm stuck on the configuration, so I'm trying to clarify the concept first.

A "classic" RDS instance is defined in Terraform as an aws_db_instance. It can be single-AZ or multi-AZ, but that single resource defines the entire deployment and the instances that comprise it. Since you want Aurora, this is not what you need.
You want an aws_rds_cluster, which defines the entire cluster, plus at least one aws_rds_cluster_instance, which defines an instance. Each aws_rds_cluster_instance declares which cluster it belongs to via the cluster_identifier argument.
Clusters provide the storage backend where your live data and automated backups reside. The cluster parameter group (parameters that must be the same across all instances using that storage backend) is set at this level as well.
Instances are servers running a copy of MySQL with access to the storage backend. They have instance parameter groups, which define parameters that are allowed to differ between instances. Right now you can only have one writer instance per cluster plus multiple reader instances, although Amazon is working on multi-master, which would allow multiple writer instances.
You can add and remove instances at will, but once you delete the cluster itself, your storage (and all automatic snapshots!) goes away. Take manual snapshots to keep copies of your data that will not disappear if the cluster is deleted.
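For illustration, a minimal Terraform sketch of that hierarchy (identifiers, credentials, parameter group names, and instance class below are placeholders, not values taken from the question):

    variable "db_password" {
      type      = string
      sensitive = true
    }

    # The cluster: the shared storage backend plus cluster-level settings.
    # All names, credentials, and sizes below are placeholders.
    resource "aws_rds_cluster" "example" {
      cluster_identifier              = "aurora-example"
      engine                          = "aurora-mysql"            # MySQL 5.7-compatible Aurora
      master_username                 = "admin"
      master_password                 = var.db_password
      db_cluster_parameter_group_name = "default.aurora-mysql5.7" # parameters shared by all instances
      skip_final_snapshot             = false
      final_snapshot_identifier       = "aurora-example-final"    # keep a final snapshot on deletion
    }

    # The instances: attached to the cluster via cluster_identifier.
    resource "aws_rds_cluster_instance" "example" {
      count                   = 2 # one becomes the writer, the rest become readers
      identifier              = "aurora-example-${count.index}"
      cluster_identifier      = aws_rds_cluster.example.id
      engine                  = aws_rds_cluster.example.engine
      instance_class          = "db.r5.large"
      db_parameter_group_name = "default.aurora-mysql5.7" # instance-level parameters
    }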

Terraform conditional creation

I have a Terraform file that creates a resource group, storage account, storage shares, database and VMs on Azure. My proposed use case is that in production, once the resource group, storage account, storage shares and database are created, they should stay in place. However, there are cases where the VMs may need to be destroyed and re-created with different specs. I know that I can run the file once to create everything and then taint the VMs and re-create them with an apply, but that doesn't seem like the ideal method.
In regular use, changes to infrastructure should be made by changes to configuration, rather than by running imperative commands like terraform taint. If you change something about the configuration of a VM that requires it to be replaced, then the underlying provider should produce a plan to replace that object automatically, leaving others unchanged.
When you have different objects that need to change on different rhythms -- particularly when some of them are stateful objects like databases -- a good way to model this in Terraform is to decompose the problem into multiple separate Terraform configurations. You can use data sources in one configuration to retrieve information about objects created in another.
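For illustration, here is a rough sketch of the VM-only configuration reading values from objects managed by a separate "core" configuration via data sources; the resource group and storage account names here are placeholders, not taken from the question:

    provider "azurerm" {
      features {}
    }

    # Look up objects created by the separate "core" configuration
    # (resource group, storage account, database) instead of managing them here.
    # The names are illustrative placeholders.
    data "azurerm_resource_group" "core" {
      name = "example-core-rg"
    }

    data "azurerm_storage_account" "core" {
      name                = "examplecorestorage"
      resource_group_name = data.azurerm_resource_group.core.name
    }

    # VM resources defined in this configuration can then reference these values, e.g.:
    #   resource_group_name = data.azurerm_resource_group.core.name
    #   location            = data.azurerm_resource_group.core.location
    # Applying this configuration can replace VMs freely, but it can never modify
    # the core objects, because they are only read here, never managed.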
Splitting into at least two separate configurations reduces the risk of running terraform apply on either one, because the scope of actions it can take is limited to the objects managed in that particular configuration. Although in principle you can carefully review the Terraform plan to see when it's planning to make a change that would be harmful, splitting into multiple configurations is extra insurance that many teams use to reduce the possible impact of human error.

Amazon RDS Aurora clone/restore point-in-time API

When I use clone/restore to a point in time from the Amazon console, it clones the cluster as well as all the instances that belong to it. But when I use the same functionality via the Amazon API, it clones only the cluster.
Is there another API that clones the cluster along with its instances, security/parameter groups, and other settings?
The console adds a convenience layer: it internally makes multiple API calls to improve the experience. Restoring from a snapshot or to a point in time is done in two steps:
RestoreDBClusterFromSnapshot or RestoreDBClusterToPointInTime API - To create a new cluster, backed by a new distributed Aurora volume. No DB instances are added when this API is issued.
CreateDBInstance API - To add instances to the cluster.
So in short, if you want to do it via the CLI, you need to issue both of these API calls. The same is true when creating a cluster with instances: the console creates the cluster and adds instances in the same UX workflow, but behind the scenes it is issuing a CreateDBCluster API call followed by one or more CreateDBInstance API call(s).
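For reference, if you manage the clone with Terraform rather than calling the APIs directly, the same two-step shape applies; this is a rough sketch assuming the AWS provider's restore_to_point_in_time block on aws_rds_cluster (identifiers and instance class are placeholders):

    # Step 1: a new cluster restored/cloned from the source cluster's volume.
    # No DB instances exist yet at this point. Identifiers are placeholders.
    resource "aws_rds_cluster" "clone" {
      cluster_identifier  = "aurora-clone"
      engine              = "aurora-mysql"
      skip_final_snapshot = true

      restore_to_point_in_time {
        source_cluster_identifier  = "aurora-source"
        restore_type               = "copy-on-write" # clone; use "full-copy" for a full restore
        use_latest_restorable_time = true
      }
    }

    # Step 2: add instances explicitly, mirroring the CreateDBInstance
    # call(s) the console issues for you behind the scenes.
    resource "aws_rds_cluster_instance" "clone" {
      identifier         = "aurora-clone-1"
      cluster_identifier = aws_rds_cluster.clone.id
      engine             = aws_rds_cluster.clone.engine
      instance_class     = "db.r5.large"
    }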
Hope this helps.

Do Service Fabric singleton services have downtime when scaling?

We use a Service Fabric cluster to deploy stateless microservices. One of the microservices is designed as a singleton. This means it is designed to be deployed on a single node only.
But does this mean that when we scale the VM scale set up or down (horizontal scaling), the service will be down? Or does the Service Fabric cluster take care of it?
There are two main concepts to keep in mind about services in Service Fabric, mainly (but not only) for stateful services: partitions and replicas.
Partitions define the approach used to split the data into groups. The partitioning schemes are:
Ranged partitioning (otherwise known as UniformInt64Partition). Used to split the data by a range of integer values.
Named partitioning. Applications using this model usually have data that can be bucketed, within a bounded set. Some common examples of data fields used as named partition keys would be regions, postal codes, customer groups, or other business boundaries.
Singleton partitioning. Singleton partitions are typically used when the service does not require any additional routing. For example, stateless services use this partitioning scheme by default.
When you use Singleton partitioning for stateful services, the data is managed as a single group; no actual data partitioning is used.
Replicas define the number of copies of a partition kept around the cluster, in order to prevent data loss when a primary replica fails.
In summary,
If you use a Singleton partition, scaling shouldn't be a problem as long as the number of replicas is at least 3.
That means that when a node gets updated, the replica hosted on that node is moved to another node. If the replica being moved is a primary replica, it is demoted to secondary, a secondary is promoted to primary, and the demoted replica is then shut down and replicated onto another node.
The third replica is needed in case a replica fails during an upgrade; the third replica can then be promoted to primary.

Turning off ServiceFabric clusters overnight

We are working on an application that processes Excel files and spits out output. Availability is not a big requirement.
Can we turn the VM scale sets off at night and turn them on again in the morning? Will this kind of setup work with Service Fabric? If so, is there a way to schedule it?
Thank you all for replying. I got a chance to talk to a Microsoft Azure rep and have documented the conversation here for the community's sake.
Response to the initial question
A Service Fabric cluster must maintain a minimum number of nodes in the Primary node type in order for the system services to maintain a quorum and ensure the health of the cluster. You can see more about the reliability level and instance count at https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-cluster-capacity/. As such, stopping all of the VMs will cause the Service Fabric cluster to go into quorum loss. Frequently it is possible to bring the nodes back up and Service Fabric will automatically recover from this quorum loss; however, this is not guaranteed and the cluster may never be able to recover.
However, if you do not need to save state in your cluster then it may be easier to just delete and recreate the entire cluster (the entire Azure resource group) every day. Creating a new cluster from scratch by deploying a new resource group generally takes less than a half hour, and this can be automated by using PowerShell to deploy an ARM template. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-via-arm/ shows how to set up the ARM template and deploy using PowerShell. You can additionally use a fixed domain name or static IP address so that clients don’t have to be reconfigured to connect to the cluster. If you need to maintain other resources such as the storage account then you could also configure the ARM template to only delete the VM Scale Set and the SF Cluster resource while keeping the network, load balancer, storage accounts, etc.
Q) Is there a better way to stop/start the VMs rather than directly from the scale set?
If you want to stop the VMs in order to save cost, then starting/stopping the VMs directly from the scale set is the only option.
Q) Can we run a primary set with the cheapest VMs we can find and add a secondary set with powerful VMs that we can turn on and off?
Yes, it is definitely possible to create two node types (a Primary that is small/cheap, and a ‘Worker’ that is a larger size) and set placement constraints on your application to only deploy to those larger VMs. However, if your Service Fabric service is storing state, you will still run into a similar problem: once you lose quorum (below 3 replicas/nodes) on your worker node type, there is no guarantee that your SF service itself will come back with all of its state maintained. In this case your cluster itself would still be fine since the Primary nodes are running, but your service’s state may be in an unknown replication state.
I think you have a few options:
Instead of storing state within Service Fabric’s reliable collections, store your state externally in something like Azure Storage or SQL Azure. You can optionally use something like Redis cache or Service Fabric’s reliable collections to maintain a faster read cache; just make sure all writes are persisted to an external store. This way you can freely delete and recreate your cluster at any time you want.
Use the Service Fabric backup/restore in order to maintain your state, and delete the entire resource group or cluster overnight and then recreate it and restore state in the morning. The backup/restore duration will depend entirely on how much data you are storing and where you export the backup.
Utilize something such as Azure Batch. Service Fabric is not really designed to be a temporary high-capacity compute platform that can be started and stopped regularly, so if this is your goal you may want to look at an HPC platform such as Azure Batch, which offers native capabilities to quickly burst up compute capacity.
No. You would have to delete the cluster, then recreate it and redeploy the application in the morning.
Turning off the cluster is, as Todd said, not an option. However, you can scale down the number of VMs in the cluster.
During the day you would run the number of VMs required. At night you can scale down to the minimum of five. Check this page on how to scale VM scale sets: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-scale-up-down/
For development purposes, you can create a Dev/Test Lab Service Fabric cluster which you can start and stop at will.
I have also been able to start and stop SF clusters on Azure by starting and stopping the VM scale sets associated with these clusters. But upon restart, all your applications (and their state with them) are gone and must be redeployed.

Is Azure Resource Manager equivalent to what Kubernetes is for Docker?

Can you think of Azure Resource Manager as the equivalent of what Kubernetes is for Docker?
I think that the two are slightly different (caveat: I have only cursory knowledge of Resource Manager).
Azure Resource Manager lets you think about a collection of separate resources as a single composite application. Much like Google's Deployment Manager. It makes it easier to create repeatable deployments, and make sense of a big collection of heterogeneous resources as belonging to a single app.
Kubernetes, on the other hand, turns a collection of virtual machines into a new resource type (a cluster). It goes beyond configuration and deployment of resources and acts as a runtime environment for distributed apps. It has an API that can be used at runtime to deploy and wire up your containers and to dynamically scale your cluster up or down, and it will make sure that your intent is being met (if you ask for three running containers of a certain type, it will make sure that there are always three healthy containers of that type running).
