I have a question about Azure HDInsight. How should I design an Azure HDInsight cluster based on my on-premises infrastructure?
What are the major parameters I need to consider before designing the cluster?
For example, if I have 100 servers running on-premises, how many nodes should I select for my cloud cluster? In AWS there is an EMR sizing calculator and a cluster planner/advisor. Is there a similar planning mechanism in Azure apart from the Pricing Calculator? Please clarify and share your inputs; any example would be really helpful. Thanks.
Before deploying an HDInsight cluster, plan for the desired cluster capacity by determining the needed performance and scale. This planning helps optimize both usability and costs. Some cluster capacity decisions cannot be changed after deployment. If the performance parameters change, a cluster can be dismantled and re-created without losing stored data.
The key questions to ask for capacity planning are:
In which geographic region should you deploy your cluster?
How much storage do you need?
What cluster type should you deploy?
What size and type of virtual machine (VM) should your cluster nodes use?
How many worker nodes should your cluster have?
Each of these questions is addressed in "Capacity planning for HDInsight clusters".
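To make these questions concrete, here is a rough PowerShell sketch (assuming the Az.HDInsight module; every name, size, and count below is a placeholder for illustration, not a recommendation) showing where each capacity decision lands when you create a cluster with New-AzHDInsightCluster:

# Sketch only: map the capacity-planning questions to New-AzHDInsightCluster parameters.
$httpCred   = Get-Credential -Message "Cluster login (HTTP) credentials"
$sshCred    = Get-Credential -Message "SSH credentials"
$storageKey = "<default storage account key>"

$cluster = @{
    ResourceGroupName         = "my-rg"                                # placeholder
    ClusterName               = "my-hdi-cluster"                       # placeholder
    Location                  = "East US"                              # 1. geographic region
    DefaultStorageAccountName = "mystorageacct.blob.core.windows.net"  # 2. storage
    DefaultStorageAccountKey  = $storageKey
    ClusterType               = "Spark"                                # 3. cluster type
    HeadNodeSize              = "Standard_D12_v2"                      # 4. VM size and type
    WorkerNodeSize            = "Standard_D13_v2"
    ClusterSizeInNodes        = 4                                      # 5. number of worker nodes
    HttpCredential            = $httpCred
    SshCredential             = $sshCred
}
New-AzHDInsightCluster @cluster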
I have 3 Hazelcast server members and 3 Hazelcast client members running in a Kubernetes cluster in the east region.
I have 3 Hazelcast server members and 3 Hazelcast client members running in a Kubernetes cluster in the west region.
My use case is to store data in both the east and west region Kubernetes clusters, so that if either region goes down we can still get the data from the other.
I am using Azure Kubernetes Service, and the namespace name is the same in both regions' clusters.
Any help will be appreciated. Thanks
This is the use case that the WAN replication feature was designed for; it provides disaster recovery capabilities by replicating the content of a cluster (either completely, or just selected IMaps) to a remote cluster. WAN replication is an Enterprise feature, so it isn't available in the Open Source distribution.
If you're trying to do something similar in Open Source, you could write MapListeners that observe all changes made on one cluster and forward them to the remote cluster. The advantage of WAN replication (other than not having to write it yourself) is that it has optimizations to batch writes together, configuration options to help trade off performance vs. consistency, and efficient delta-sync logic to resynchronize clusters after a network or other outage.
I'm deciding whether I should use vanilla Kubernetes or Azure Kubernetes Service for my CI build agents.
What control will I lose if I use AKS: SSH into the cluster? Turning the VMs on and off? How about cost? I see that AKS uses VM pricing; is there something beyond that?
There are several limitations that come to mind, but none of them should restrict your use case:
You lose control over the master nodes (control plane). That shouldn't be an issue in your use case, and I can hardly imagine where it would be a limitation. You can still SSH into worker nodes in AKS.
You lose fine-grained control over the size of worker nodes. Node pools become the abstraction that controls VM size. In a self-managed cluster you can attach VMs of completely different sizes to the cluster; in AKS all the nodes in the same pool must be the same size (but you can create several node pools with different VM sizes; see the sketch after this list).
It's not possible to choose the node OS in AKS (nodes are Ubuntu-based).
You're not flexible in choosing network plugins for Kubernetes; it's either kubenet or Azure CNI. That's fine as long as you're not running unusual applications that require L2 networking (more info here).
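To illustrate the node-pool point above, here is a hedged PowerShell sketch (assuming the Az.Aks module; the resource names and VM size are placeholders) of adding a second pool with a different VM size to an existing cluster:

# Sketch only: add a dedicated node pool for build agents with a larger VM size.
New-AzAksNodePool `
    -ResourceGroupName "my-rg" `
    -ClusterName "my-aks" `
    -Name "buildpool" `
    -VmSize "Standard_D8s_v3" `
    -Count 3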
There are definitely benefits of AKS:
You're not managing the control plane, which is a real pain reliever.
AKS can scale its nodes dynamically, which may be a good option for bursty workloads like build agents, but it also adds some delay while nodes are being scaled.
Cluster upgrades (control and data planes) are just a couple of clicks in the Azure portal.
The control plane is free in AKS (in contrast to, e.g., EKS on AWS); you pay only for the worker nodes, and you can calculate your price here.
I have read about Event Hubs, HDInsight, and deploying Kafka on IaaS in an availability set.
I need to know the requirements for implementing Kafka on AKS.
How can I know how many nodes are needed? I also want to know how to calculate billing.
Finally, I want to compare the three options I mentioned with Kafka on AKS.
The number of nodes depends on your requirements in terms of load, throughput, and so on. If you choose Apache Kafka on AKS, I would suggest using the Strimzi project (https://strimzi.io/) to deploy it fairly easily. I also wrote a simple demo for a session I gave about this (https://github.com/ppatierno/strimzi-aks).
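For a sense of what that looks like in practice, here is a rough sketch run from PowerShell (the Helm release name, namespace, replica counts, and storage sizes are placeholders; check the Strimzi docs for current manifests) that installs the Strimzi operator and declares a small Kafka cluster:

# Sketch only: install the Strimzi operator via Helm, then create a Kafka cluster
# through a Kafka custom resource. Sizes and counts are illustrative placeholders.
helm repo add strimzi https://strimzi.io/charts/
helm install strimzi-operator strimzi/strimzi-kafka-operator --namespace kafka --create-namespace

@"
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka
  namespace: kafka
spec:
  kafka:
    replicas: 3                 # broker count; size this to your load and throughput
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 20Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
"@ | kubectl apply -f -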
We use Spark 2.2 on Azure HDInsight for ad hoc exploration and batch jobs.
The jobs should run fine on a cluster of five medium VMs. They are:
1. notebooks (Zeppelin with Livy.spark2 magics)
2. compiled jars being run with Livy.
I have to remember to scale this cluster down to 1 worker when it isn't in use, to save money (0 workers would be nice, if that were possible).
I'd like Spark to manage this for me: when a job starts, scale the cluster up to a minimum size first, then wait ~10 minutes while that completes. After an idle period with no jobs, scale down again.
You can use PowerShell or the Azure classic CLI to scale the cluster up or down, but you would need to write a script that tracks cluster resource usage and scales down automatically.
Here is the PowerShell syntax:
Set-AzureRmHDInsightClusterSize -ClusterName <Cluster Name> -TargetInstanceCount <NewSize>
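If you're on the newer Az module, the equivalent cmdlet is Set-AzHDInsightClusterSize. Below is a minimal sketch (resource names are placeholders) that you could run, for example, from two Azure Automation schedules to drop to 1 worker at night and restore daytime capacity in the morning:

# Sketch only: scale the cluster with the Az.HDInsight module. The resource group and
# cluster names are placeholders; schedule one run with 1 and another with your daytime count.
Set-AzHDInsightClusterSize `
    -ResourceGroupName "my-rg" `
    -ClusterName "my-hdi-cluster" `
    -TargetInstanceCount 1    # e.g. 5 during working hours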
Here is a PowerShell workflow runbook that helps you automate the process of scaling your HDInsight clusters in or out depending on your needs:
https://gallery.technet.microsoft.com/scriptcenter/Scale-your-HDInsight-f57bb4d8
Alternatively, you can use the option below to scale manually (your question is about scaling up/down automatically, but this may be useful to someone who wants to scale manually).
Below is a link to an article explaining different methods to scale the cluster using PowerShell or the classic CLI (note that the latest CLI doesn't support the scaling feature):
https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-scaling-best-practices
If you want Spark to handle this dynamically, Azure Databricks is the best choice (but it is a Spark-only cluster, with no Hadoop components except Hive). HDInsight Spark does not manage this scaling for you, so it will not solve your use case.
When creating a new cluster in Azure Databricks, there is an "Enable autoscaling" option that allows the cluster to scale dynamically while jobs are executed.
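As a rough illustration (the workspace URL, token, runtime version, and node type below are placeholders; the JSON fields follow the Databricks Clusters REST API), an autoscaling cluster with auto-termination can also be created programmatically:

# Sketch only: create an autoscaling Databricks cluster through the Clusters API.
# autoscale plus autotermination_minutes covers "scale up for jobs, stop when idle".
$workspaceUrl = "https://adb-1234567890123456.7.azuredatabricks.net"   # placeholder
$token        = "<personal access token>"

$body = @{
    cluster_name            = "adhoc-spark"
    spark_version           = "7.3.x-scala2.12"       # placeholder runtime version
    node_type_id            = "Standard_DS3_v2"       # placeholder worker VM size
    autoscale               = @{ min_workers = 1; max_workers = 5 }
    autotermination_minutes = 30                      # shut down after 30 idle minutes
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Method Post -Uri "$workspaceUrl/api/2.0/clusters/create" `
    -Headers @{ Authorization = "Bearer $token" } -Body $body -ContentType "application/json"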
I'm told that Azure Databricks may be a better solution for this use case.
We are working on an application that processes Excel files and produces output. Availability is not a big requirement.
Can we turn the VM scale sets off at night and turn them back on in the morning? Will this kind of setup work with Service Fabric? If so, is there a way to schedule it?
Thank you all for replying. I got a chance to talk to a Microsoft Azure rep and have documented the conversation here for the community's sake.
Response to the initial question:
A Service Fabric cluster must maintain a minimum number of nodes of the primary node type in order for the system services to maintain quorum and ensure the health of the cluster. You can read more about the reliability level and instance count at https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-cluster-capacity/. As such, stopping all of the VMs will cause the Service Fabric cluster to go into quorum loss. Frequently it is possible to bring the nodes back up and Service Fabric will automatically recover from this quorum loss; however, this is not guaranteed and the cluster may never be able to recover.
However, if you do not need to save state in your cluster, then it may be easier to just delete and recreate the entire cluster (the entire Azure resource group) every day. Creating a new cluster from scratch by deploying a new resource group generally takes less than half an hour, and this can be automated by using PowerShell to deploy an ARM template. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-via-arm/ shows how to set up the ARM template and deploy using PowerShell. You can additionally use a fixed domain name or static IP address so that clients don’t have to be reconfigured to connect to the cluster. If you need to maintain other resources, such as the storage account, then you could also configure the ARM template to only delete the VM scale set and the SF cluster resource while keeping the network, load balancer, storage accounts, etc.
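To make the delete-and-recreate flow concrete, here is a minimal sketch (the resource group name, location, and template paths are placeholders) of what the evening and morning steps could look like with the Az PowerShell module:

# Evening: tear down the whole resource group (cluster, scale set, and so on).
Remove-AzResourceGroup -Name "sf-cluster-rg" -Force

# Morning: recreate the resource group and redeploy the cluster from the ARM template.
New-AzResourceGroup -Name "sf-cluster-rg" -Location "East US"
New-AzResourceGroupDeployment `
    -ResourceGroupName "sf-cluster-rg" `
    -TemplateFile ".\sfcluster.template.json" `
    -TemplateParameterFile ".\sfcluster.parameters.json"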
Q) Is there a better way to stop/start the VMs rather than directly from the scale set?
If you want to stop the VMs in order to save cost, then starting/stopping the VMs directly from the scale set is the only option.
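For reference, here is a minimal sketch of doing that with the Az.Compute cmdlets (the resource group and scale set names are placeholders; keep the quorum-loss caveat above in mind):

# Evening: deallocate the scale set VMs to stop paying for compute.
Stop-AzVmss -ResourceGroupName "sf-cluster-rg" -VMScaleSetName "nt1vm" -Force

# Morning: start them again and let Service Fabric try to recover.
Start-AzVmss -ResourceGroupName "sf-cluster-rg" -VMScaleSetName "nt1vm"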
Q) Can we do a primary node type with the cheapest VMs we can find and add a secondary node type with powerful VMs that we can turn on and off?
Yes, it is definitely possible to create two node types – a Primary that is small/cheap, and a ‘Worker’ that is a larger size – and set placement constraints on your application to only deploy to those larger VMs. However, if your Service Fabric service is storing state, then you will still run into a similar problem: once you lose quorum (fewer than 3 replicas/nodes) on your worker node type, there is no guarantee that your SF service will come back with all of its state intact. In that case the cluster itself would still be fine, since the Primary nodes are running, but your service’s state may be in an unknown replication state.
I think you have a few options:
Instead of storing state within Service Fabric’s reliable collections, store your state externally in something like Azure Storage or SQL Azure. You can optionally use something like Redis Cache or Service Fabric’s reliable collections to maintain a faster read cache; just make sure all writes are persisted to an external store. This way you can freely delete and recreate your cluster at any time you want.
Use the Service Fabric backup/restore in order to maintain your state, and delete the entire resource group or cluster overnight and then recreate it and restore state in the morning. The backup/restore duration will depend entirely on how much data you are storing and where you export the backup.
Utilize something such as Azure Batch. Service Fabric is not really designed to be a temporary high capacity compute platform that can be started and stopped regularly, so if this is your goal you may want to look at an HPC platform such as Azure Batch which offers native capabilities to quickly burst up compute capacity.
No. You would have to delete the cluster, then recreate it and redeploy the application in the morning.
Turning off the cluster is, as Todd said, not an option. However, you can scale down the number of VMs in the cluster.
During the day you would run the number of VMs required; at night you can scale down to the minimum of 5. Check this page on how to scale VM scale sets: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-scale-up-down/
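A minimal sketch of that scale-down (resource names are placeholders; the Az.Compute module is assumed):

# Set the scale set capacity to the 5-node minimum overnight, and back up during the day.
$vmss = Get-AzVmss -ResourceGroupName "sf-cluster-rg" -VMScaleSetName "nt1vm"
$vmss.Sku.Capacity = 5
Update-AzVmss -ResourceGroupName "sf-cluster-rg" -VMScaleSetName "nt1vm" -VirtualMachineScaleSet $vmss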
For development purposes, you can create a Dev/Test Lab Service Fabric cluster which you can start and stop at will.
I have also been able to start and stop SF clusters on Azure by starting and stopping the VM scale sets associated with those clusters. But upon restart, all your applications (and with them their state) are gone and must be redeployed.