We are working on MarkLogic 9.0.8.2.
We are setting up a MarkLogic cluster (3 VMs) on Azure and, as per our failover design, want to have 3 forests (one per node) in Azure Blob storage.
I have finished the setup, and when I started ingestion I found that documents are distributed across the 3 forests rather than all being stored in each forest.
For example, I ingested 30000 records and each forest contains 10000 records.
What I need is for every forest to hold all 30000 records.
Is there any configuration (at the database or forest level) I need to achieve this?
MarkLogic does not handle failover the way some other NoSQL document databases do, which may keep a copy of every document on each host.
The clustered nature of MarkLogic distributes the documents across the hosts to provide a balance of availability and resource consumption. For failover protection, you must create additional forests on each host and attach them to your existing forests as replicas. This ensures availability should any 1 of the 3 hosts fail.
Here is a sample forest layout:
Host 1: primary_forest_01 replica_forest_03
Host 2: primary_forest_02 replica_forest_01
Host 3: primary_forest_03 replica_forest_02
The replica forest must be on a different host than the primary forest, and if there are multiple forests per host, they should be striped across hosts to best balance out resource consumption when failed over.
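The striping rule above can be sketched in a few lines of Python. This is only an illustration of the layout logic, not MarkLogic configuration code: the function name `striped_layout` and the forest naming scheme are assumptions taken from the sample layout, and the replica of each forest is simply placed on the next host, wrapping around at the end.

```python
def striped_layout(hosts):
    """Put one primary forest on each host and its replica on the next host (wrapping)."""
    n = len(hosts)
    if n < 2:
        # With a single host the replica would land next to its primary.
        raise ValueError("need at least two hosts to separate primary and replica")
    layout = {host: [] for host in hosts}
    for i, host in enumerate(hosts):
        forest_id = f"{i + 1:02d}"
        layout[host].append(f"primary_forest_{forest_id}")
        # The replica goes on the following host, so losing one host never
        # takes out both copies of the same forest.
        replica_host = hosts[(i + 1) % n]
        layout[replica_host].append(f"replica_forest_{forest_id}")
    return layout


layout = striped_layout(["Host 1", "Host 2", "Host 3"])
for host, forests in layout.items():
    print(host, sorted(forests))
```

For three hosts this reproduces the sample layout shown above.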
It's also important to note that for HA, you need replicas configured for the system databases as well.
So there is no database setting to put all the documents on every host, because that is not the way MarkLogic is designed to work. The Scalability, Availability and Failover Guide is very informative, and in this case, the High Availability of Data Nodes with Failover section is particularly relevant. I also highly recommend checking out the free training that MarkLogic offers.
We can have passive read-only asynchronous real-time sync-up for Azure SQL database, for disaster recovery.
But our requirement is to have real-time sync-up between both active read-write databases to provide low latency to customers in different locations of the world.
For example:
I am running an e-commerce website. When I update data on one of the database servers, the other connected databases that are in sync with it should receive the update.
Users around the world will connect to their nearest data center for low latency. If someone buys something or posts a review, it should be updated in all the other databases. In this way we need active-active database sync.
We explored multiple options for this, but did not find anything relevant.
Can anyone please guide me on how to achieve this?
SQL Server has Peer-to-Peer Transactional Replication, but you need to ensure in the application that conflicting changes are not introduced on multiple nodes.
SQL Server also has Merge Replication, which allows updates at any subscriber, and supports custom conflict resolution.
These are both available on SQL Server VMs. Limited replication options are available on Azure SQL Database Managed Instance. Azure SQL Database also has Data Sync.
Azure Cosmos DB also supports Multi-Master.
In either case multi-master introduces significant cost/complexity. Often it's better to just have a single writable master with regional readable replicas. In that configuration the application needs to connect to the global master for writing, but can read from a local replica. For this pattern you can simply use Failover Groups.
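As a rough sketch of the single-writer pattern described above, the application can pick an endpoint per operation: writes always go to the global master, reads go to the replica in the caller's region. The class name `Topology`, the region keys, and the host names are all hypothetical, and the sketch does not open any real database connection.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    primary: str                                   # the single writable endpoint
    replicas: dict = field(default_factory=dict)   # region -> read-only endpoint

    def endpoint_for(self, region: str, is_write: bool) -> str:
        """Return the endpoint a request from `region` should use."""
        if is_write:
            return self.primary
        # Fall back to the primary when the region has no local replica.
        return self.replicas.get(region, self.primary)


# Hypothetical endpoints, for illustration only.
topology = Topology(
    primary="shop-primary.example.net",
    replicas={
        "europe": "shop-replica-eu.example.net",
        "asia": "shop-replica-asia.example.net",
    },
)

print(topology.endpoint_for("europe", is_write=False))  # local read replica
print(topology.endpoint_for("europe", is_write=True))   # global master
```

The trade-off is that reads from a replica may lag slightly behind the master, since the replication is asynchronous.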
I can't find any specific documentation that explains the difference between a primary node and a non-primary node, and how they are used. Can somebody shed light on it? Thanks.
If you compare Service Fabric to other Orchestration Tools like Kubernetes, you will notice a small difference on how clusters are defined.
Kubernetes uses the concept of a Master to host the cluster management services, and Minions to host your application services (containers). Until version 1.1 it was not possible to run containers on the masters, the idea being that masters should be isolated to avoid conflicts with the containers running on them, such as those consuming too much memory, disk, CPU, and so on.
On Service Fabric this is a bit different. When you define a NodeType as Primary, it means that this NodeType will be responsible for hosting the Service Fabric management services (the services needed to control cluster health, orchestration and so on).
When you deploy a cluster via the Azure Portal, depending on the durability tier (Bronze, Silver, Gold) you choose, it will require a certain number of nodes on the Primary NodeType to keep the cluster management services healthy. For production workloads, 5 nodes is the minimum recommended size for the Primary NodeType, or for a non-primary NodeType with stateful workloads on it. The minimum supported VM SKU is Standard D1 or Standard D1_V2.
There is a catch for the Primary NodeType: changing the VMSS SKU (size) is not supported. You can do it at your own risk, but it is a recipe for disaster, because the risk of losing the management services is too high.
For a non-primary NodeType, there is no difference other than those mentioned above. Every NodeType has a VMSS and a LoadBalancer (with a domain) on which you can configure access rules. Every NodeType has a limit of 100 nodes.
Compared to Kubernetes, SF does not add any constraints to prevent you from deploying your services alongside the management services on primary nodes. Every node is part of a pool of resources (including the primary nodes), so the default behaviour is to deploy applications on every available node, no matter the NodeType.
When you plan bigger clusters (100+ nodes), it is important to take that into account and isolate your Primary NodeType from your workloads, to remove the pressure on your management service nodes.
Having multiple node types can be useful in these situations:
You want to run services exposed to the internet & services not exposed. The first set would run on a node type (VMSS) attached to the Load Balancer and the second on a scale set that isn't.
You need to run services for certain customers on premium hardware and trials on cheaper hardware. The first set would run on nodes with lots of CPU, lots of RAM. The second on lower SKU's.
You want to build a cluster that exceeds the max node count that one VMSS can hold.
Or you need to add scale sets on the fly, to support huge growth.
And: The primary nodes run your system services, the secondaries don't.
There is not much of a difference. Nodes of different node types all share the same characteristics of a Service Fabric Cluster. They all participate in load balancing etc.
Except for one thing: system services run on the nodes of the primary node type only (source):
Primary node type is where the system services run, so the VM SKU you choose for it, must take into account the overall peak load you plan to place into the cluster. Here is an analogy to illustrate what I mean here - Think of the primary node type as your "Lungs", it is what provides oxygen to your brain, and so if the brain does not get enough oxygen, your body suffers.
An important purpose of node types is to constrain service placement to specific node types. For example, you can have several node types, one using VMs with higher CPU capacity and one focused on the amount of memory. Then you can place memory-hungry services on one node type and CPU-intensive services on the other.
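To make the idea of constraining placement concrete, here is a toy model in Python. It is not the Service Fabric API: the node list, the node type names `CpuOptimized` and `MemoryOptimized`, and the helper `eligible_nodes` are all made up for illustration. In a real cluster the equivalent rule would be declared as a placement constraint on the service.

```python
# Hypothetical cluster: two node types with different hardware profiles.
NODES = [
    {"name": "cpu-0", "node_type": "CpuOptimized"},
    {"name": "cpu-1", "node_type": "CpuOptimized"},
    {"name": "mem-0", "node_type": "MemoryOptimized"},
    {"name": "mem-1", "node_type": "MemoryOptimized"},
]


def eligible_nodes(nodes, required_node_type):
    """Return the names of the nodes a service restricted to one node type may run on."""
    return [n["name"] for n in nodes if n["node_type"] == required_node_type]


# A memory-hungry service is only allowed on the memory-focused node type.
print(eligible_nodes(NODES, "MemoryOptimized"))
# A CPU-intensive service is only allowed on the CPU-focused node type.
print(eligible_nodes(NODES, "CpuOptimized"))
```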
We are working on an application that processes Excel files and produces output. Availability is not a big requirement.
Can we turn the VM sets off during night and turn them on again in the morning? Will this kind of setup work with service fabric? If so, is there a way to schedule it?
Thank you all for replying. I got a chance to talk to a Microsoft Azure rep and have documented the conversation here for the community's sake.
Response for initial question
A Service Fabric cluster must maintain a minimum number of Primary node types in order for the system services to maintain a quorum and ensure health of the cluster. You can see more about the reliability level and instance count at https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-cluster-capacity/. As such, stopping all of the VMs will cause the Service Fabric cluster to go into quorum loss. Frequently it is possible to bring the nodes back up and Service Fabric will automatically recover from this quorum loss, however this is not guaranteed and the cluster may never be able to recover.
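The quorum argument above comes down to simple arithmetic. The sketch below assumes a plain majority quorum (more than half of the replicas must be up); the function name `has_quorum` is made up for this example and is not part of any Service Fabric API.

```python
def has_quorum(total_replicas: int, replicas_up: int) -> bool:
    """True when a strict majority of the replicas is still running."""
    majority = total_replicas // 2 + 1
    return replicas_up >= majority


# With 5 system-service replicas, a majority is 3.
print(has_quorum(5, 3))  # two nodes down: still a majority
print(has_quorum(5, 2))  # three nodes down: quorum loss
print(has_quorum(5, 0))  # all VMs stopped for the night: quorum loss
```

Stopping every VM always ends in the last case, which is why recovery after a full shutdown is not guaranteed.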
However, if you do not need to save state in your cluster then it may be easier to just delete and recreate the entire cluster (the entire Azure resource group) every day. Creating a new cluster from scratch by deploying a new resource group generally takes less than half an hour, and this can be automated by using PowerShell to deploy an ARM template. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-via-arm/ shows how to set up the ARM template and deploy it using PowerShell. You can additionally use a fixed domain name or static IP address so that clients don’t have to be reconfigured to connect to the cluster. If you need to maintain other resources such as the storage account, then you could also configure the ARM template to delete only the VM Scale Set and the SF Cluster resource while keeping the network, load balancer, storage accounts, etc.
Q)Is there a better way to stop/start the VMs rather than directly from the scale set?
If you want to stop the VMs in order to save cost, then starting/stopping the VMs directly from the scale set is the only option.
Q) Can we do a primary set with cheapest VMs we can find and add a secondary set with powerful VMs that we can turn on and off?
Yes, it is definitely possible to create two node types – a Primary that is small/cheap, and a ‘Worker’ that is a larger size – and set placement constraints on your application to only deploy to those larger size VMs. However, if your Service Fabric service is storing state then you will still run into a similar problem that once you lose quorum (below 3 replicas/nodes) of your worker VM then there is no guarantee that your SF service itself will come back with all of the state maintained. In this case your cluster itself would still be fine since the Primary nodes are running, but your service’s state may be in an unknown replication state.
I think you have a few options:
Instead of storing state within Service Fabric’s reliable collections, store your state externally in something like Azure Storage or SQL Azure. You can optionally use something like Redis cache or Service Fabric’s reliable collections to maintain a faster read-cache; just make sure all writes are persisted to an external store. This way you can freely delete and recreate your cluster at any time you want.
Use the Service Fabric backup/restore in order to maintain your state, and delete the entire resource group or cluster overnight and then recreate it and restore state in the morning. The backup/restore duration will depend entirely on how much data you are storing and where you export the backup.
Utilize something such as Azure Batch. Service Fabric is not really designed to be a temporary high capacity compute platform that can be started and stopped regularly, so if this is your goal you may want to look at an HPC platform such as Azure Batch which offers native capabilities to quickly burst up compute capacity.
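The first option, an external store with a fast read-cache in front of it, is essentially a write-through cache. Below is a minimal sketch under stated assumptions: a plain dictionary stands in for the durable external store (Azure Storage or SQL Azure in the text), and the class name `WriteThroughStore` is hypothetical.

```python
class WriteThroughStore:
    """Read-cache in front of a durable store; every write hits the durable store first."""

    def __init__(self, external_store):
        self._external = external_store  # stand-in for the durable external store
        self._cache = {}                 # lost whenever the cluster is recreated

    def write(self, key, value):
        self._external[key] = value      # persist before caching
        self._cache[key] = value

    def read(self, key):
        if key not in self._cache:
            # Cache miss: load from the durable store and remember it.
            self._cache[key] = self._external[key]
        return self._cache[key]


durable = {}  # survives across "cluster" lifetimes in this sketch

day_one = WriteThroughStore(durable)
day_one.write("report-42", "processed")

# The cluster is deleted overnight and recreated: the cache is empty again,
# but the data is still available from the external store.
day_two = WriteThroughStore(durable)
print(day_two.read("report-42"))
```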
No. You would have to delete the cluster and recreate the cluster and deploy the application in the morning.
Turning off the cluster is, as Todd said, not an option. However you can scale down the number of VM's in the cluster.
During the day you would run the number of VM's required. At night you can scale down to the minimum of 5. Check this page on how to scale VM sets: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-scale-up-down/
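The scale-down schedule can be expressed as a small function that decides the target instance count from the time of day. This only computes the number; actually applying it to the scale set is left out. The working hours (07:00 to 19:00) and the function name `target_instance_count` are assumptions for illustration, while the night-time minimum of 5 comes from the text above.

```python
from datetime import time


def target_instance_count(now, day_count, night_minimum=5,
                          day_start=time(7, 0), day_end=time(19, 0)):
    """Return how many VMs the scale set should run at the given time of day."""
    if day_count < night_minimum:
        raise ValueError("day_count must not be below the night-time minimum")
    if day_start <= now < day_end:
        return day_count
    return night_minimum


print(target_instance_count(time(12, 0), day_count=12))  # daytime capacity
print(target_instance_count(time(23, 0), day_count=12))  # scaled down for the night
```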
For development purposes, you can create a Dev/Test Lab Service Fabric cluster which you can start and stop at will.
I have also been able to start and stop SF clusters on Azure by starting and stopping the VM scale sets associated with these clusters. But upon restart all your applications (and with them their state) are gone and must be redeployed.
I tried digging through MSDN but could not find a concrete statement about which load balancing method is best.
Could someone please shed some light on which of the options below is best for the given scenario:
Performance
Failover
Round Robin.
Scenario:
x Web Roles hosted on Large VMs in a single data center.
Requirement:
must be 100% up 24x7.
Thank you.
First: Do you really want to offer a 100% uptime SLA to your customers, when Azure itself doesn't offer 100% in its SLAs?
That said: Traffic Manager only load-balances your compute, not your storage. So if you're trying to increase uptime by having a set of backup compute nodes running in another data center, you need to think about data access speed and cost:
With round robin, you'll now have distributed traffic across multiple data centers, guaranteed, and constantly. And if your data is in a single data center (which is a good idea to have data in a single System of Record, unless you have replication logic all taken care of), some of your users are going to see increased latency as the nodes separated from your data are going to be requesting data across many miles (potentially between continents). Plus, data egress has a $$$ cost to it.
With performance, your users are directed toward the data center which offers them the lowest latency. Again, this now means traffic across multiple data centers, with the same issues as round robin.
With failover, you now have all traffic going to one data center, with another designated as your failover data center (so it's for High Availability). In the event you have an outage in the primary data center, you'd now have a failover data center to rely on. This may help justify the added latency and cost, as you'd only experience this latency+cost when your primary app location becomes unavailable for some reason.
So: If you're going for the high availability route, to help approach the 100% availability mark, I'm guessing you'd be best off with the failover model.
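The three methods compared above differ only in how an endpoint is picked. The sketch below models that choice in Python; it does not call Traffic Manager. The endpoint names, priorities and latencies are invented, and the latency values are assumed to be those seen by one particular user.

```python
from itertools import cycle

# Hypothetical endpoints; latency is as seen by one particular user.
ENDPOINTS = [
    {"name": "dc-primary", "priority": 1, "latency_ms": 120, "healthy": True},
    {"name": "dc-backup", "priority": 2, "latency_ms": 40, "healthy": True},
]


def healthy(endpoints):
    """Return the endpoints that are currently up, or fail if there are none."""
    up = [e for e in endpoints if e["healthy"]]
    if not up:
        raise RuntimeError("no healthy endpoint available")
    return up


def pick_failover(endpoints):
    """Failover: always the highest-priority endpoint that is still healthy."""
    return min(healthy(endpoints), key=lambda e: e["priority"])["name"]


def pick_performance(endpoints):
    """Performance: the healthy endpoint with the lowest latency for this user."""
    return min(healthy(endpoints), key=lambda e: e["latency_ms"])["name"]


def round_robin(endpoints):
    """Round robin: rotate through all healthy endpoints in turn."""
    return cycle(e["name"] for e in healthy(endpoints))


print(pick_failover(ENDPOINTS))
print(pick_performance(ENDPOINTS))
rotation = round_robin(ENDPOINTS)
print([next(rotation) for _ in range(3)])
```

Only the failover method keeps all traffic in one data center while it is healthy, which is the property the answer relies on.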
Traffic Manager comes into the picture only when your application is deployed across multiple cloud services, within the same data center or in different data centers. If your application is hosted in a single cloud service (with multiple instances, of course), then the instances are load balanced using a round robin pattern. This is the default load balancing pattern and comes at no extra charge.
You can read more about traffic manager here: https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-overview/
In my opinion there is no single best load balancing method in Azure Traffic Manager. Each of them has its own advantages, and the right choice depends on the requirements of the application. The most common scenario is to use the performance load balancing option. But as Gaurav said, you will have to host your application on more than one cloud service. If you wish to implement performance load balancing, then here is a link to get you started - http://sanganakauthority.blogspot.com/2014/06/performance-load-balancing-using-azure.html
I need to make sure the availability of my database is high. The SQL Azure documentation does not make that clear.
Is there a way to run multiple servers under SQL Azure (so that one takes over if another fails)? Beyond that, is there something equivalent to increasing memory on the DB server to speed up database processing?
Read the High Availability section in the introduction to Azure SQL, and then read Business Continuity in Windows Azure SQL Database. To summarize:
Data durability and fault tolerance is enhanced by maintaining
multiple copies of all data in different physical nodes located across
fully independent physical sub-systems such as server racks and
network routers. At any one time, Windows Azure SQL Database keeps
three replicas of data running—one primary replica and two secondary
replicas.
Right now there is no way to specify the hardware configuration for SQL Azure databases. It's totally out of your control, and from a SaaS perspective that makes sense. The backend management services are responsible for making sure you get the best performance possible.
If you need dedicated and reserved hardware for your SQL deployment, you may take a look at the IaaS offerings in Azure and start a VM with SQL Server installed. However, you need to make sure you know the main differences between an IaaS and a PaaS offering.
I do not know what your high availability requirements are, but you should look at the SLAs provided by Microsoft. SQL Database offers 99.9% monthly availability.
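To judge whether 99.9% is enough, it helps to convert the percentage into a downtime budget. The calculation below assumes a 30-day month; the function name `allowed_downtime_minutes` is made up for this example.

```python
def allowed_downtime_minutes(availability_percent, days_in_month=30):
    """Minutes per month a service may be down while still meeting the availability figure."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


# 99.9% over a 30-day month leaves roughly 43.2 minutes of downtime.
print(round(allowed_downtime_minutes(99.9), 1))
```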