I have an Azure Event Hubs namespace in WEST US with Geo-Recovery enabled to sync with EAST US. I don't want it to fail over to EAST US automatically if the primary region has an outage. Can automatic failover be disabled? I want to fail over manually after business confirmation.
Microsoft does not trigger an automatic failover when you set up Geo-disaster recovery for an event hub. You have to initiate the failover manually, either from the Azure portal or through the REST APIs. You can also automate the failover to fit your business scenario: a custom application monitors the resource and, when your business conditions are met, calls the Event Hubs REST endpoint to initiate the failover.
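As a rough sketch of the custom-application approach described above, the snippet below builds (but does not send) the management request that initiates a failover, and only does so after an explicit business confirmation. The URL shape follows the Azure Resource Manager `disasterRecoveryConfigs/{alias}/failover` operation for Event Hubs. The api-version value, the function names, and the placeholder IDs are assumptions for illustration, and the bearer token would have to be obtained separately.

```python
import urllib.request

ARM = "https://management.azure.com"
API_VERSION = "2017-04-01"  # assumption: verify the api-version you intend to use


def failover_url(subscription_id, resource_group, secondary_namespace, alias):
    """Build the ARM URL for the failover action (invoked on the SECONDARY namespace)."""
    return (
        f"{ARM}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.EventHub/namespaces/{secondary_namespace}"
        f"/disasterRecoveryConfigs/{alias}/failover"
        f"?api-version={API_VERSION}"
    )


def build_failover_request(url, bearer_token, business_confirmed):
    """Return a POST request only if the business has confirmed; otherwise None."""
    if not business_confirmed:
        return None  # nothing is prepared or sent without confirmation
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, so the confirmation step stays the only gate between monitoring and an actual failover.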
Related
I have an Azure Event Hubs in WEST US with Geo-Recovery enabled to sync with EAST US. As per the Microsoft article -
The feature enables instant continuity of operations with the same
configuration, but doesn't replicate the messages held in queues or
topic subscriptions or dead-letter queues.
In case of a manual failover, is there a way to copy the data that has not yet been consumed from the primary region to the secondary region? If not, is there a reason why the unread data cannot be copied over?
Event Hubs Geo-disaster recovery replicates only metadata, not the events held in the event hub. It is mainly useful when you don't want to update the connection string in your applications after a disaster: it saves you the time of creating a new namespace with all the event hubs and configuration and then rolling out the new connection string.
Our Azure API Management instance is provisioned in East US and our hot-standby region is West US.
I know that we can take a backup of the source APIM and restore it on the destination APIM. However, I want a hot standby running in parallel that serves no traffic until a DR situation arises.
How do I configure Azure API Management to support disaster recovery with automatic failover?
Would it impact the configured URLs/domains?
I'm working on an enterprise IoT application for which we created all resources in South Central US. Recently (9/4/18) I noticed that South Central US was down for many business hours.
Now I'm trying to find the best possible solution for high availability when a complete region is down.
We are using the following Azure resources.
EventHub (telematic data ingestion)
Azure Functions (EventHub, CosmosDB, ServiceBus Trigger)
Web App & WebJob (Schedule and continuous)
ServiceBus (Queue & Topic)
Application Insight (Application logs)
Storage Account (EventHub checkpointing and other data)
Cosmos DB
VSTS (CI/CD)
For Cosmos DB I know the solution. What should I do for the other resources?
I don't see any way to create a multi-region EventHub or ServiceBus cluster.
There's no cluster arrangement for ServiceBus and EventHub, but you can set up a failover flow for both.
Please refer to these articles on MS Docs:
Azure Event Hubs Geo-disaster recovery
Best practices for insulating applications against Service Bus outages and disasters
Let me know if that helps!
Azure provides Availability Zones and Geo Disaster Recovery support for both Service Bus and Event Hubs.
Here is the link for Availability Zones for Service Bus and Event Hubs
For Geo Disaster Recovery, look into Service Bus DR, Event Hubs DR
Do Logic Apps have some sort of built-in geo-replication like Azure Scheduler or Key Vault? I can't seem to find any information about it.
I have seen some implementations using API Management, but those are for Logic Apps that use HTTP triggers. In my case I'm using Service Bus triggers.
If there is no geo-replication, what would a disaster recovery implementation look like for my scenario?
I think you are asking three questions: how do I get a geo-redundant Logic Apps deployment, how do I get a geo-redundant Service Bus Messaging deployment, and how do I use them in combination?
I would start with the Service Bus Messaging side, as it is the foundation for the Logic Apps process. To have a geo-redundant Service Bus Messaging queue you have to use the Premium SKU; this article goes into detail on how it works: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-geo-dr
For the Logic Apps side, you would set up a Logic App in each region (primary and secondary) and point both at the alias for the Service Bus queue. You would keep the Logic App in the secondary region disabled and enable it only when the primary region's Logic App is not operational. This requires some endpoint-monitoring scripting that detects the failure, switches over to the secondary, and disables the primary.
Like you said, there are more automated options (Traffic Manager) when Logic Apps are triggered by HTTP traffic, but since you are reading queues the recovery is more complex.
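The monitoring script mentioned above could be organized roughly like this. It is a minimal sketch of the decision logic only: the class name, the threshold of consecutive failed probes, and the action labels are all hypothetical, and the script that runs it would still have to perform the actual health probe and the enable/disable calls.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Counts failed health probes of the primary Logic App and decides when to switch."""
    threshold: int = 3      # hypothetical: consecutive failures before switching
    failures: int = 0
    active: str = "primary"

    def record_probe(self, primary_healthy):
        """Return the list of actions to perform after this probe (often empty)."""
        if self.active != "primary":
            return []       # already switched; failing back is left as a manual step
        if primary_healthy:
            self.failures = 0   # one good probe resets the streak
            return []
        self.failures += 1
        if self.failures < self.threshold:
            return []       # not enough evidence yet, avoid flapping
        self.active = "secondary"
        return ["disable primary", "enable secondary"]
```

Requiring several consecutive failures is a design choice to avoid switching on a single transient probe error. With both Logic Apps reading the same queue alias, keeping only one of them enabled at a time is what prevents double processing.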
I'm currently building a hybrid-cloud solution that needs to write messages to a queue for later processing. It is absolutely imperative that the queue is highly available (99.999+% uptime).
My options are to read/write messages to a local ZeroMQ high availability pair, or an Azure Service Bus. I would prefer to go the Azure Service Bus route, but can't find any documentation regarding high availability configuration for Azure Service Bus.
Has anyone had success setting up Azure Service Bus for high availability? I understand that the SLA for a single instance of any Azure service cannot be changed. I'm thinking more along the lines of the failover capabilities of Azure Web Apps.
The main thing you can do to get better availability out of a service than its SLA promises is to make sure you handle retry logic. The key is how long an outage lasts, and tuning the retry backoff to handle edge cases. Some use linear or exponential backoffs to wait progressively longer for the service to come back up.
Also, you can have more than one Service Bus namespace in different regions for geo-redundancy, and either load-balance messages across the two or use one as a hot backup. This can get you around regional outages and keep your service up when one data center is not meeting its local SLA.
You can find the SLA for Azure Service Bus here: legal/sla/service-bus/v1_0/
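A retry with exponential backoff can be sketched as follows. This is an illustrative outline, not tied to any Service Bus SDK: `send` is a hypothetical callable standing in for whatever client call publishes the message, and the base delay, cap, and jitter range are arbitrary assumptions to tune.

```python
import random
import time


def backoff_delays(retries, base=0.5, cap=30.0):
    """Exponential delays: base, 2*base, 4*base, ... each capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def send_with_retry(send, message, retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call send(message); on failure wait with backoff and try again.

    Makes at most retries + 1 attempts and re-raises the last error.
    """
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return send(message)
        except Exception:
            # A real client should catch only errors it knows to be transient.
            if attempt == retries:
                raise
            # Jitter spreads out retries from many clients hitting the same outage.
            sleep(delays[attempt] * random.uniform(0.5, 1.0))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. The cap bounds how long a single wait can get during a long outage.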
For Service Bus Relays, we guarantee that at least 99.9% of the time,
properly configured applications will be able to establish a
connection to a deployed Relay. For Service Bus Queues and Topics, we
guarantee that at least 99.9% of the time, properly configured
applications will be able to send or receive messages or perform other
operations on a deployed Queue or Topic. For Service Bus Basic and
Standard Notification Hub tiers, we guarantee that at least 99.9% of
the time, properly configured applications will be able to send
notifications or perform registration management operations with
respect to a Notification Hub. For Event Hubs Basic and Standard
tiers, we guarantee that at least 99.9% of the time, properly
configured applications will be able to send or receive messages or
perform other operations on the Event Hub.
We've had Service Bus Relay up and running for 5+ years and have had one outage. It was an outage at the specific data center the relay was provisioned in, and it affected many services. After that we added redundancy by provisioning a secondary Service Bus Relay namespace in a different data center location. The reconfigured code checks connectivity on every connection and swaps the primary and secondary connections when needed. We treat them as equals, so once we "fail over", that namespace becomes the primary.
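The swap-on-failure idea can be illustrated with a small sketch. This is not the poster's actual code: the class, the use of `ConnectionError` as the failure signal, and the two callables standing in for namespace connections are assumptions made for the example.

```python
class FailoverSender:
    """Sends through the current primary; on a connection failure, swaps roles.

    Both namespaces are treated as equals: whichever one worked last stays primary.
    """

    def __init__(self, primary, secondary):
        # Each argument is a callable that sends a message via one namespace.
        self.primary = primary
        self.secondary = secondary

    def send(self, message):
        try:
            return self.primary(message)
        except ConnectionError:
            # Swap roles so later sends go straight to the namespace that is up.
            self.primary, self.secondary = self.secondary, self.primary
            # If this also fails, the error propagates to the caller.
            return self.primary(message)
```

Because the roles are swapped rather than temporarily bypassed, there is no automatic failback: traffic stays on the surviving namespace until it fails in turn.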
Service Bus now supports Geo-disaster recovery and Geo-replication at the namespace level.
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-geo-dr