Fixed internal IPs in Azure

We are currently evaluating Azure to see if we can use it for our stress and production environments.
Our environment is pretty complex, including web servers, MySQL servers, Hadoop and Cassandra servers, as well as monitoring and deployment servers.
To set up the stress environment, we need to install everything and then load large amounts of data into it before we can run a stress test. This takes time and effort, and since we pay by the hour, we would like to be able to completely shut down the environment and start it up again, ready to go, whenever we want to run additional stress tests.
Here's our issue: we could not find a way to set a fixed internal IP address for a VM in Azure. In AWS it is possible with VPC, but in Azure, even if you define a virtual network, there seems to be no way to set a fixed internal IP (at least none that we can find).
This creates several issues for us:
1. Hadoop relies on all nodes in the cluster being able to resolve all the nodes' hostnames to IP addresses.
2. A Cassandra cluster in which all of the nodes' IP addresses change at once gets badly confused. We actually lost data in a test Cassandra cluster because of this.
Our questions are:
1. Is there a way to set a fixed internal IP for a VM in Azure?
2. If not, has anyone had experience running Hadoop and Cassandra on Azure? How did you handle the changing IP addresses when the cluster was shut down?
Any advice on these issues will be much appreciated,
Thanks
Amir

Please note that the portal doesn't always expose all the capabilities of Azure. Some of the features in Azure are only possible through the REST API and PowerShell.
If you take a look at the new release of the PowerShell Cmdlets, you'll notice there is a new option for Static IPs in VNets.
https://github.com/WindowsAzure/azure-sdk-tools
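For example, with those Service Management cmdlets, pinning a static internal IP on an existing VM looks roughly like this (a minimal sketch; the VNet, cloud service, and VM names are placeholders):

    # Check that the address is still free in the virtual network
    Test-AzureStaticVNetIP -VNetName "stress-vnet" -IPAddress "10.0.0.10"

    # Assign the static internal IP to the VM and push the update
    Get-AzureVM -ServiceName "stress-svc" -Name "hadoop-node-1" |
        Set-AzureStaticVNetIP -IPAddress "10.0.0.10" |
        Update-AzureVM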

Related

How to create and manage floating IP for a local highly available cluster?

I currently have a highly available cluster for multiple services of my application. The cluster is working without any problem on AWS, and now I want to replicate and adapt the whole structure within a local network.
I use Pacemaker/Corosync to share an AWS Elastic IP between two HAProxy instances. But I'm not sure whether it is possible to create the same flow within my local network, since I don't know how to share a single local IP between two of the computers.
Is it possible to manage a single local IP as a floating IP within a local network?
Have a look at the HAProxy with VRRP and Keepalived setup; a sketch follows below. I can run a test in my homelab if you need full configs.
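A minimal keepalived sketch (the interface name, router ID, and addresses are placeholders; the same file goes on the second node with state BACKUP and a lower priority):

    # /etc/keepalived/keepalived.conf on the primary HAProxy node
    vrrp_instance VI_1 {
        state MASTER              # BACKUP on the second node
        interface eth0
        virtual_router_id 51
        priority 101              # e.g. 100 on the second node
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass changeme
        }
        virtual_ipaddress {
            192.168.1.100/24      # the floating IP your domain points at
        }
    }

Whichever node currently holds MASTER answers for 192.168.1.100; if it dies, VRRP promotes the other node and the IP moves over within a few seconds.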

Clarification on how availability sets make a single VM more available

I am having difficulty understanding Azure availability sets; specifically, what exactly I need to do to ensure my app running on my VM is utilizing availability sets to be more available.
Let's say I am creating an application that runs on a single VM and I want to make it more resistant to hardware failure.
Option 1:
I create an availability set with 2 fault domains and then create a VM in this availability set.
Is that it?
If there is a hardware failure on the rack hosting my VM, does Azure now take care of ensuring the VM stays up and running?
Option 2:
I have to have two servers, VM1 and VM2, both in the availability set but one in fault domain 1 and one in fault domain 2.
I then have to set up a cluster of sorts for my application. In this case the availability set simply ensures that the two servers in my cluster are not on the same hardware, but the plumbing to ensure the application can take advantage of two servers and is highly available is still down to me.
Is option 1 or option 2 the correct way in which Availability Sets work in relation to fault domains?
Appreciate any clarity that can be provided.
Azure deals with hardware failure in two ways: availability sets and availability zones. An availability set (AS) is about making sure that your app does not go down even if hardware fails within a data center (zone) itself. Availability zones (AZs) are about making sure your app does not go down even if a whole data center goes down. More details here.
To understand best practices around availability, take a look at the best-practices documentation; guidance specifically for VMs can be found here.
A Single VM instance is defined as follows, reference:
"Single Instance" is defined as any single Microsoft Azure Virtual Machine that either is not deployed in an Availability Set or has only one instance deployed in an Availability Set.
So one VM, in or out of an availability set, makes no difference. For this you need at least two VMs in an AS, spread across fault domains (FDs) and update domains (UDs); Azure then makes sure the VMs run on separate hardware so that a single failure does not take your app down.
One VM in an Availability set is nearly as good as a VM with no Availability set.
If you place two or more identical VMs in an AS, you can add a load balancer to distribute traffic.
You can also use an AS without a load balancer if you are not interested in traffic distribution. One scenario would be switching to a secondary VM only when the primary is unavailable.
Also, do understand that it is not required to have identical VMs in an AS.
A virtual machine scale set is a good option if you are looking for a high-availability solution with VMs.
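As a rough sketch with the Az PowerShell module (the resource group, names, location, and image alias are placeholders), the availability-set part of option 2 looks like this:

    # Create an availability set with two fault domains
    New-AzAvailabilitySet -ResourceGroupName "my-rg" -Name "app-avset" `
        -Location "westeurope" -Sku Aligned `
        -PlatformFaultDomainCount 2 -PlatformUpdateDomainCount 5

    # Create both VMs inside it; Azure spreads them across the fault domains
    # (New-AzVM prompts for admin credentials; the image alias may vary by region/version)
    New-AzVM -ResourceGroupName "my-rg" -Name "vm1" -Image "UbuntuLTS" -AvailabilitySetName "app-avset"
    New-AzVM -ResourceGroupName "my-rg" -Name "vm2" -Image "UbuntuLTS" -AvailabilitySetName "app-avset"

The clustering or load-balancing layer on top of the two VMs is still up to you, as described above.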

Docker: How to deal with restarted nodes?

If a Docker-enabled VM is restarted, e.g. due to Azure patching the VM or for whatever reason, the node can get a new IP address (VirtualBox can cause this, and Azure too),
which in turn results in the certificate no longer being valid, and Docker fails to start on that machine.
If I use Docker Swarm, the result is that the restarted node will be stuck in status Pending indefinitely.
If I then do docker-machine regenerate-certs mymachine, it starts working again.
How should I reason around this?
I guess there is no way around having nodes being restarted, so how do you deal with this?
Regarding Azure, you can ensure your VM keeps its public IP address after a restart by using "Reserved IP" addresses. Please note that using reserved IPs on Azure (as with other cloud providers) may incur additional charges. https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-reserved-public-ip/
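A minimal sketch with the classic Service Management cmdlets (the name, location, and label are placeholders):

    # Reserve a public IP that survives VM restarts
    New-AzureReservedIP -ReservedIPName "docker-node-ip" `
        -Location "West Europe" -Label "keep across restarts"

    # The reservation is then supplied when the VM's cloud service is created, e.g.
    # New-AzureVM ... -ReservedIPName "docker-node-ip"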
Another way to handle this is to use discovery. Swarm offers a discovery mechanism which supports etcd, Consul and ZooKeeper. Find more details here:
https://docs.docker.com/swarm/discovery/
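For example, with a Consul backend (a sketch; addresses and ports are placeholders, and it assumes a Consul agent reachable at consul-host:8500), each node re-registers its current address on every start, so a changed IP just updates the registration:

    # On each node: join the cluster, advertising the node's current address
    docker run -d swarm join --advertise=<node-ip>:2375 consul://consul-host:8500/swarm

    # On the manager:
    docker run -d -p 4000:4000 swarm manage -H :4000 consul://consul-host:8500/swarm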

Recommended replica set config in Azure

We're running MongoDB on Azure and are in the process of setting up a production replica set (no shards) and I'm looking at the recommendations here:
http://docs.mongodb.org/ecosystem/tutorial/install-mongodb-on-linux-in-azure/
And I see the replica set config is such that the members talk to each other via external IP addresses. Isn't this going to 1) incur additional Azure costs, since the replication traffic goes through the external IPs, and 2) incur replication latency for the same reason?
At least one of our applications that will talk to Mongo will be running outside of Azure.
AWS has a feature where external DNS names resolve to internal IPs when looked up from the VMs, and to the external IP when resolved from outside, which makes things significantly easier :) In my previous job, I ran a fairly large sharded MongoDB in AWS...
I'm curious what you folks' recommendations are. I had two ideas...
1) Configure each Mongo host with an external IP (not entirely sure how to do this in Azure, but I'm sure it's possible...) and configure DNS to point to those IPs externally. Then give each VM an /etc/hosts file that points those same names to internal IP addresses. Run Mongo on port 27017 in all cases (or really whatever port). This means that the set does replication traffic over internal IPs, but external clients can talk to it using the same DNS names. (See the sketch after this list.)
2) Similar to #1, but run Mongo on 3 different ports with only one external IP address, and point all three external DNS names to this external IP address. We achieve the same results, but it's cleaner, I think.
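To make option 1 concrete, a minimal sketch (the hostnames and addresses are made up for illustration). Each VM gets an /etc/hosts mapping the public names to internal IPs:

    # /etc/hosts on every Azure VM in the set
    10.0.0.4   mongo1.example.com
    10.0.0.5   mongo2.example.com
    10.0.0.6   mongo3.example.com

Then the replica set is initiated with those names, so members replicate over the internal network while external clients resolve the same names to the public IPs:

    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "mongo1.example.com:27017" },
        { _id: 1, host: "mongo2.example.com:27017" },
        { _id: 2, host: "mongo3.example.com:27017" }
      ]
    })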
Thanks!
Jerry
There is no best way, but let me clarify a few of the "objective" points:
There is no charge for any traffic moving between services / VMs / storage in the same region. Even if you connect from one VM to the other using servicename.cloudapp.net:port. No charge.
It is your choice whether you make the mongod instances externally accessible. If you do create external endpoints, you'll need to worry about securing those endpoints (e.g. with Access Control Lists). Since your app is running outside of Azure, this is an option you'll need to consider. You'll also need to think about how to encrypt the database traffic (MongoDB Enterprise edition supports SSL; otherwise you need to build mongod yourself).
Again, if you expose your mongod instances externally, you need to consider whether to place them within the same cloud service (sharing an IP address, with a separate port per mongod instance) or in multiple cloud services (a unique IP address per cloud service). If the mongod instances are within the same cloud service, they can then be grouped into an availability set, which reduces downtime by avoiding simultaneous host OS updates across all VMs and splits the VMs across multiple fault domains.
In the case where your app/web tier lives within Azure, you can use internal IP addresses, with both your app and MongoDB VMs in the same virtual network.
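If you do expose mongod externally, a sketch of locking an endpoint down with the classic Service Management cmdlets (the service/VM names and the permitted subnet are placeholders):

    # Build an ACL that only admits the app servers' subnet
    $acl = New-AzureAclConfig
    Set-AzureAclConfig -AddRule -Action Permit -RemoteSubnet "203.0.113.0/24" `
        -Order 1 -Description "app servers only" -ACL $acl

    # Attach the ACL to a new mongod endpoint on the VM
    Get-AzureVM -ServiceName "mongo-svc" -Name "mongo1" |
        Add-AzureEndpoint -Name "mongod" -Protocol tcp `
            -LocalPort 27017 -PublicPort 27017 -ACL $acl |
        Update-AzureVM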

automatic failover if webserver is down (SRV / additional A-record / ?)

I am starting to develop a webservice that will be hosted in the cloud but needs higher availability than typical cloud SLAs provide.
Typical SLAs, e.g. Windows Azure's, promise an availability of 99.9%, i.e. up to 43 min of downtime per month. I am looking for an order of magnitude better availability (<5 min of downtime per month). While I can configure several load-balanced database back-ends to resolve that part of the issue, I see a bottleneck at the web server: if the web server fails, the whole service is unavailable to the customer. What are the options for reducing that risk without introducing another possible single point of failure? I see the following solutions, and drawbacks to each:
SRV-record:
I duplicate the whole infrastructure (making sure the databases stay in sync) and add additional SRV records for the domain, so that a user trying to access www.example.com will automatically get forwarded to example.cloud1.com, or, if that one is offline, to example.cloud2.com. Googling around, it seems that SRV records are not supported by any major browser; is that true?
second A-record:
Add an additional A-record as an alternative. Drawbacks:
a) at my hosting provider I do not see any possibility to add a second A-record, just one... is that normal?
b) if one of the two servers is down, I am not sure whether users get automatically redirected to the other one, or whether 50% of all users get a 404 or some other error.
Any clues for a best-practice would be appreciated
Cheers,
Sebastian
The availability (i.e. the SLA) specified by the cloud provider refers to the instance's health as seen by the hypervisor or fabric controller, i.e. whether the server is running. With that said, you need to make an effort to ensure the instance is not failing because of your app, the OS, or pretty much anything else running inside it. There are a few things which devops tend to miss and that hit back hard, for instance forgetting to configure OS updates and patches.
The fundamental axiom of availability is redundancy: the more redundant your application and infrastructure are, the more available your app is.
I recommend you look into Azure Traffic Manager and then rework your architecture. You need not worry about SRV records or A-records; just a CNAME pointing at Traffic Manager will do the trick.
The idea of Traffic Manager is simple: you put it behind the domain name (it handles the DNS resolution for the app), and it then decides where to send each request based on the configured policy, e.g. round-robin or failover for disaster recovery.
With the combination of Traffic Manager and a multi-region infrastructure setup, you will march towards the high-availability goal.
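The DNS side is then a single record, for example (zone-file syntax; the names are placeholders):

    ; point the public hostname at the Traffic Manager profile
    www.example.com.    IN  CNAME   myapp.trafficmanager.net.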
Links
Azure Traffic Manager Overview
Cloud Power: How to scale Azure Websites globally with Traffic Manager
Maybe you should configure a Corosync cluster with DRBD?
DRBD will ensure that the data on both nodes is replicated (for example website files and DB files).
Apache as the web server will be available under a virtual IP to which the domain is pointed. If one server goes down, Corosync will move all services to the second server within a few seconds.
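A minimal sketch of that with Pacemaker's pcs tool (the IP, netmask, and resource names are placeholders):

    # Floating IP that follows the healthy node
    pcs resource create cluster-vip ocf:heartbeat:IPaddr2 \
        ip=192.168.1.100 cidr_netmask=24 op monitor interval=10s

    # Apache, managed by the cluster
    pcs resource create webserver ocf:heartbeat:apache \
        configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s

    # Keep Apache on whichever node currently holds the virtual IP
    pcs constraint colocation add webserver with cluster-vip INFINITY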
