Cassandra auto scaling

I want to deploy a DataStax Cassandra cluster (multiple DCs in multiple regions) on Google Compute Engine with auto scaling.
I have deployed the DataStax Cassandra cluster on Google Cloud, but I am not sure how to configure auto scaling.
On Google Compute Engine, auto scaling is only possible with managed instance groups, i.e. all the instances must be in a particular region. Since the Cassandra cluster is deployed in multiple regions, do we need to maintain a separate auto scaling setup (managed instance group and instance template) for each region?
Do we need to use Google Cloud auto scaling, or does DataStax have its own built-in one?
How do we add new VMs to the cluster if we provision them through a Google Cloud managed instance group?
Thanks,

You need to maintain separate auto scaling for each region.
Let Google Cloud handle the auto scaling, because the machines that need to be added are provisioned by Google Cloud. DataStax Enterprise has no way of doing it automatically.
I don't know each and every step (for Google Cloud), but when you spin up a machine, you can have it run a startup script that starts Cassandra with a set of 'seeds'.
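As a rough illustration of the startup-script idea, here is a minimal Python sketch that rewrites the `seeds` entry in the text of a `cassandra.yaml` file before Cassandra is started. The function name, the sample config, and the IP addresses are assumptions made for illustration; how the script is attached to the instance template and how Cassandra is launched afterwards are not shown.

```python
import re

# Sample fragment in the shape of the seed_provider section of cassandra.yaml.
SAMPLE = '''seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
'''

def set_seeds(cassandra_yaml: str, seed_ips: list) -> str:
    """Return the config text with the seeds entry replaced by seed_ips."""
    seeds_value = ",".join(seed_ips)
    # Match a line such as:   - seeds: "127.0.0.1"
    pattern = re.compile(r'^([ \t]*-[ \t]*seeds:[ \t]*).*$', re.MULTILINE)
    updated, count = pattern.subn(
        lambda m: f'{m.group(1)}"{seeds_value}"', cassandra_yaml
    )
    if count == 0:
        raise ValueError("no 'seeds:' entry found in config text")
    return updated

# Hypothetical seed nodes, e.g. one or two per data center.
print(set_seeds(SAMPLE, ["10.0.0.1", "10.1.0.1"]))
```

Seed nodes are normally a small, stable set of existing nodes, so newly scaled-out VMs would point at them rather than list themselves.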

Related

Options for setting up and starting cassandra for production

We have a business case for which we chose Cassandra as our NoSQL database, but we are stuck on how to set it up. Any insight into the available setup options, and which one to choose, is appreciated.
As of now, the options I know of are:
1) Installing Cassandra on an EC2 instance (which I believe is not a production-ready option)
2) Using an AWS managed Cassandra service
Are there any other ways? Please shed some light on this.

Not sure where you got that information from, but it's not correct.
Thousands of companies have Cassandra deployed in production not just on EC2 instances but also GCP, Azure and other public clouds. It is also possible to deploy Cassandra on your own premises, private clouds and even hybrids -- any combination of on-premise + public cloud + private cloud.
If you don't have experience with installing/managing a Cassandra cluster, you can try Astra DB which is a Cassandra-as-a-service running on AWS, GCP and/or Azure. There's a tier that's free forever with no credit card required. It only takes a few clicks to launch a cluster. Cheers!

ElasticSearch on Azure - VM Availability Set vs Scale Set

We've deployed an ElasticSearch cluster via Azure Marketplace (the "Self-Managed" flavor), which deploys the cluster as a VM Availability Set. However, we want to be able to scale up the number of data nodes when needed, similar to what we can do with our other VM Scale Sets (very easy to accomplish in Azure). The article here: https://www.elastic.co/blog/deploying-elasticsearch-on-microsoft-azure (see the last paragraph of the "Availability" section) even mentions taking this approach, although it doesn't give instructions on how to accomplish it (the Scale Sets link just leads to a general description of VM scale sets).
Does anyone know how to get ElasticSearch set up in Azure as a Scale Set instead of an Availability Set?

You are looking to run ElasticSearch on an Azure VM Scale Set (VMSS).
It is not a trivial task, as you need to gracefully add and remove nodes in the ElasticSearch cluster.
There is a template that installs an Elasticsearch cluster on a Virtual Machine Scale Set:
https://azure.microsoft.com/en-us/resources/templates/elasticsearch-vmss/
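One common way to remove a node gracefully is to exclude it from shard allocation first, wait for its shards to relocate, and only then let the scale set delete the VM. The sketch below is a minimal illustration that only builds the JSON bodies for the cluster update settings API (`PUT _cluster/settings`); the helper names and the IP address are placeholders, and sending the request and waiting for relocation are left out.

```python
import json

def drain_node_settings(node_ips: list) -> dict:
    """Body for PUT _cluster/settings that moves shards off the given IPs."""
    return {
        "persistent": {
            "cluster.routing.allocation.exclude._ip": ",".join(node_ips)
        }
    }

def clear_drain_settings() -> dict:
    """Body that resets the exclusion once the node is gone (null resets it)."""
    return {"persistent": {"cluster.routing.allocation.exclude._ip": None}}

# Hypothetical private IP of the scale set instance about to be removed.
print(json.dumps(drain_node_settings(["10.0.0.4"]), indent=2))
print(json.dumps(clear_drain_settings()))
```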

Highly available, redundant Redis-cluster over kubernetes

The objective is to create a highly available Redis cluster on Kubernetes for a Node.js client. I have already created the architecture as follows:
Created a Kubernetes cluster of one Kmaster with 3 worker nodes.
Then I created StatefulSets and persistent volumes (6, one for each pod).
Then I created 2 Redis pods on each node (3 masters, plus 3 replicas of the respective masters).
I need to understand the role of Redis Sentinel from here on: how does it manage monitoring, scaling, and HA for the Redis cluster pods across the nodes? I understand Sentinel should run on each node, but what is the right architecture here?
P.S. I have created a local setup for now, but ultimately this goes on Azure, so any suggestions w.r.t. Azure are also welcome.
Thanks!

From an Azure perspective, you have two options. If you are set on option two and are only looking for the Sentinel architecture piece, note that there are business continuity and high availability options in both IaaS (Linux VM scale sets) and PaaS services that go beyond the Sentinel component.
The first option is Azure Cache for Redis (PaaS), where you choose and deploy your desired service tier (Premium tier required for HA) and connect your client applications. Please see: Azure Cache for Redis FAQ and Caching Best Practice.
The second option is to deploy a solution (as you have detailed) as an IaaS solution built from Azure VMs. There are a number of Redis Linux VM images to choose from in the Azure Marketplace, or you can create a Linux VM OS image from your on-premise solution and migrate that to Azure. The Sentinel component is enabled on each server (master, slavea, slaveb, ...). There are networking and other considerations too. For building a system from scratch, please see: How to Setup Redis Replication (with Cluster-Mode Disabled) in CentOS 8 – Part 1 and How to Setup Redis For High Availability with Sentinel in CentOS 8 – Part 2
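To make "Sentinel enabled on each server" a little more concrete, here is a minimal Python sketch that renders the few `sentinel.conf` directives each Sentinel instance would typically need to watch one master. The master name, IP, port, quorum, and timeout values are illustrative assumptions, not recommendations.

```python
def sentinel_config(master_name: str, master_ip: str,
                    master_port: int, quorum: int) -> str:
    """Render sentinel.conf lines for monitoring a single master."""
    lines = [
        # quorum = number of Sentinels that must agree the master is down
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} 5000",
        f"sentinel failover-timeout {master_name} 60000",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical values: three Sentinels, two of which must agree.
print(sentinel_config("mymaster", "10.0.0.10", 6379, 2))
```

The same file would be placed on every server running Sentinel; the Sentinels discover each other and the replicas through the monitored master.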

Turning off ServiceFabric clusters overnight

We are working on an application that processes excel files and spits off output. Availability is not a big requirement.
Can we turn the VM sets off during night and turn them on again in the morning? Will this kind of setup work with service fabric? If so, is there a way to schedule it?

Thank you all for replying. I got a chance to talk to a Microsoft Azure rep and have documented the conversation here for the community's sake.
Response to the initial question:
A Service Fabric cluster must maintain a minimum number of Primary node types in order for the system services to maintain a quorum and ensure health of the cluster. You can see more about the reliability level and instance count at https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-cluster-capacity/. As such, stopping all of the VMs will cause the Service Fabric cluster to go into quorum loss. Frequently it is possible to bring the nodes back up and Service Fabric will automatically recover from this quorum loss, however this is not guaranteed and the cluster may never be able to recover.
However, if you do not need to save state in your cluster, it may be easier to just delete and recreate the entire cluster (the entire Azure resource group) every day. Creating a new cluster from scratch by deploying a new resource group generally takes less than half an hour, and this can be automated by using PowerShell to deploy an ARM template. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-via-arm/ shows how to set up the ARM template and deploy it using PowerShell. You can additionally use a fixed domain name or static IP address so that clients don’t have to be reconfigured to connect to the cluster. If you need to maintain other resources such as the storage account, you could also configure the ARM template to only delete the VM Scale Set and the SF Cluster resource while keeping the network, load balancer, storage accounts, etc.
Q) Is there a better way to stop/start the VMs rather than directly from the scale set?
If you want to stop the VMs in order to save cost, then starting/stopping the VMs directly from the scale set is the only option.
Q) Can we do a primary set with cheapest VMs we can find and add a secondary set with powerful VMs that we can turn on and off?
Yes, it is definitely possible to create two node types – a Primary that is small/cheap, and a ‘Worker’ that is a larger size – and set placement constraints on your application to only deploy to those larger size VMs. However, if your Service Fabric service is storing state then you will still run into a similar problem that once you lose quorum (below 3 replicas/nodes) of your worker VM then there is no guarantee that your SF service itself will come back with all of the state maintained. In this case your cluster itself would still be fine since the Primary nodes are running, but your service’s state may be in an unknown replication state.
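The "below 3 replicas/nodes" figure can be read as a majority quorum over a 5-replica set. The following is a minimal sketch of that arithmetic, assuming a simple majority rule; the function names are hypothetical and this is not Service Fabric API code.

```python
def majority_quorum(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replica_count < 1:
        raise ValueError("replica_count must be positive")
    return replica_count // 2 + 1

def has_quorum(replicas_up: int, replica_count: int) -> bool:
    """True while enough replicas are up to form a majority."""
    return replicas_up >= majority_quorum(replica_count)

# With 5 replicas the quorum is 3, so stopping 3 of the 5 VMs loses quorum.
print(majority_quorum(5), has_quorum(2, 5))
```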
I think you have a few options:
Instead of storing state within Service Fabric’s reliable collections, store your state externally in something like Azure Storage or SQL Azure. You can optionally use something like Redis cache or Service Fabric’s reliable collections to maintain a faster read-cache; just make sure all writes are persisted to an external store. This way you can freely delete and recreate your cluster at any time you want.
Use the Service Fabric backup/restore in order to maintain your state, and delete the entire resource group or cluster overnight and then recreate it and restore state in the morning. The backup/restore duration will depend entirely on how much data you are storing and where you export the backup.
Utilize something such as Azure Batch. Service Fabric is not really designed to be a temporary high capacity compute platform that can be started and stopped regularly, so if this is your goal you may want to look at an HPC platform such as Azure Batch which offers native capabilities to quickly burst up compute capacity.
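The first option above (external store plus an optional read-cache) is essentially a write-through cache. Here is a minimal sketch of that pattern, where a plain dict stands in for the external store (Azure Storage or SQL Azure in the answer) and another dict stands in for the in-cluster cache; the class and method names are hypothetical.

```python
class WriteThroughStore:
    """Read-cache in front of a durable store; every write hits the store."""

    def __init__(self, durable_store: dict):
        self._durable = durable_store  # stand-in for the external store
        self._cache = {}               # stand-in for the in-cluster cache

    def put(self, key, value):
        # Persist externally first, so losing the cluster never loses a write.
        self._durable[key] = value
        self._cache[key] = value

    def get(self, key):
        # Serve from the cache, falling back to the durable store on a miss.
        if key not in self._cache:
            self._cache[key] = self._durable[key]
        return self._cache[key]

external = {}
day_one = WriteThroughStore(external)
day_one.put("report-42", "processed")

# Simulate deleting and recreating the cluster: the cache is gone,
# but the state is still readable from the external store.
day_two = WriteThroughStore(external)
print(day_two.get("report-42"))
```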

No. You would have to delete the cluster, then recreate it and redeploy the application in the morning.

Turning off the cluster is, as Todd said, not an option. However, you can scale down the number of VMs in the cluster.
During the day you would run the number of VMs required. At night you can scale down to the minimum of 5. Check this page on how to scale VM sets: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-scale-up-down/
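A scheduled job could decide the scale set capacity with logic along these lines. This is only a sketch of the day/night rule described above: the function name and the working hours (07:00 to 19:00) are assumptions, and the call that actually resizes the scale set is not shown.

```python
def target_vm_count(hour: int, daytime_count: int, minimum: int = 5,
                    day_start: int = 7, day_end: int = 19) -> int:
    """Return the VM count to run at the given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if daytime_count < minimum:
        raise ValueError("daytime_count must not be below the minimum")
    # Full capacity during working hours, the cluster minimum otherwise.
    return daytime_count if day_start <= hour < day_end else minimum

for hour in (3, 9, 22):
    print(hour, target_vm_count(hour, daytime_count=12))
```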

For development purposes, you can create a Dev/Test Lab Service Fabric cluster which you can start and stop at will.
I have also been able to start and stop SF clusters on Azure by starting and stopping the VM scale sets associated with these clusters. But upon restart all your applications (and with them their state) are gone and must be redeployed.

Is it possible to deploy an application using cassandra database on Windows Azure?

I recently got a trial version of Windows Azure and wanted to know if there is any way I can deploy an application using Cassandra.

I can't speak specifically to whether Cassandra works in Azure, unfortunately. That's likely a question for that product's development team.
But the challenge you'll face with this, MySQL, or any other role-hosted database is persistence. Azure Roles are not in and of themselves persistent, so whatever back-end store Cassandra is using would need to be placed onto something like an Azure Drive (which is persisted to Azure Blob Storage). However, this would limit the scalability of the solution.

Basically, you run Cassandra as a worker role in Azure. Then, you can mount an Azure drive when a worker starts up and unmount when it shuts down.
This provides some insight re: how to use Cassandra on Azure: http://things.smarx.com/#Run Cassandra
Some help w/ Azure drives: http://azurescope.cloudapp.net/CodeSamples/cs/792ce345-256b-4230-a62f-903f79c63a67/
This should not limit your scalability at all. Just spin up another Cassandra instance whenever processing throughput or contiguous storage become an issue.

You might want to check out AppHarbor. AppHarbor is a .NET PaaS built on top of Amazon. It gives users the portability and infrastructure of Amazon, and it provides a number of the rich services that Azure offers, such as background tasks and load balancing, plus some that Azure doesn't, like 3rd-party add-ons, dead-simple deployment and more. They already have add-ons for CouchDB, MongoDB and Redis; if Cassandra got high enough on the requested features, I'm sure they could set it up.
